{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Evolution of NBA Player Salary Determinants Over the Last 30 Years" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a longtime NBA fan, I have long been curious about how NBA player contracts are determined and which factors matter most for the size of the contract a player receives. In the modern NBA in particular, 3-point shooting has become much more common, and most of the current highest-paid players are excellent 3-point shooters. For this project, I look at how the determinants of a player's contract have evolved over the last 30 years, with a focus on 3-point shooting." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Requirement already satisfied: chart-studio in /opt/conda/lib/python3.8/site-packages (1.1.0)\r\n", "Requirement already satisfied: requests in /opt/conda/lib/python3.8/site-packages (from chart-studio) (2.25.0)\r\n", "Requirement already satisfied: plotly in /opt/conda/lib/python3.8/site-packages (from chart-studio) (4.14.1)\r\n", "Requirement already satisfied: retrying>=1.3.3 in /opt/conda/lib/python3.8/site-packages (from chart-studio) (1.3.3)\r\n", "Requirement already satisfied: six in /opt/conda/lib/python3.8/site-packages (from chart-studio) (1.15.0)\r\n", "Requirement already satisfied: retrying>=1.3.3 in /opt/conda/lib/python3.8/site-packages (from chart-studio) (1.3.3)\r\n", "Requirement already satisfied: six in /opt/conda/lib/python3.8/site-packages (from chart-studio) (1.15.0)\r\n", "Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.8/site-packages (from requests->chart-studio) (2020.12.5)\r\n", "Requirement already satisfied: urllib3<1.27,>=1.21.1 in /opt/conda/lib/python3.8/site-packages (from requests->chart-studio) (1.25.11)\r\n", "Requirement already satisfied: 
idna<3,>=2.5 in /opt/conda/lib/python3.8/site-packages (from requests->chart-studio) (2.10)\r\n", "Requirement already satisfied: chardet<4,>=3.0.2 in /opt/conda/lib/python3.8/site-packages (from requests->chart-studio) (3.0.4)\r\n", "Requirement already satisfied: six in /opt/conda/lib/python3.8/site-packages (from chart-studio) (1.15.0)\r\n" ] } ], "source": [ "# import packages and set themes\n", "! pip install chart-studio\n", "\n", "import numpy as np\n", "import pandas as pd\n", "import qeds\n", "import requests\n", "\n", "import plotly as pt\n", "import plotly.express as px\n", "from chart_studio.plotly import plot, iplot as py\n", "import plotly.graph_objects as go\n", "from plotly.offline import iplot, init_notebook_mode\n", "\n", "import seaborn as sns\n", "import matplotlib.colors as mplc\n", "import matplotlib.pyplot as plt\n", "\n", "from sklearn import (\n", " linear_model, metrics, neural_network, pipeline, model_selection\n", ")\n", "\n", "%matplotlib inline\n", "# activate plot theme\n", "qeds.themes.mpl_style();\n", "colors = qeds.themes.COLOR_CYCLE" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first dataset contains all basketball statistics for all NBA players through each season." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " seas_id season player_id player birth_year hof pos \\\n", "0 28943 2021 4219 Aaron Gordon NaN False PF \n", "1 28944 2021 4219 Aaron Gordon NaN False PF \n", "2 28945 2021 4219 Aaron Gordon NaN False PF \n", "3 28946 2021 4582 Aaron Holiday NaN False PG \n", "4 28947 2021 4805 Aaron Nesmith NaN False SF \n", "... ... ... ... ... ... ... ... \n", "29607 200 1947 157 Walt Miller NaN False F \n", "29608 201 1947 158 Warren Fenley NaN False F \n", "29609 202 1947 159 Wilbert Kautz NaN False G-F \n", "29610 203 1947 160 Woody Grimshaw NaN False G \n", "29611 204 1947 161 Wyndol Gray NaN False G-F \n", "\n", " age experience lg ... ft_percent orb_per_game drb_per_game \\\n", "0 25.0 7 NBA ... 0.646 1.5 4.5 \n", "1 25.0 7 NBA ... 0.629 1.6 5.1 \n", "2 25.0 7 NBA ... 0.727 1.3 3.3 \n", "3 24.0 3 NBA ... 0.783 0.2 1.1 \n", "4 21.0 1 NBA ... 0.688 0.5 1.6 \n", "... ... ... ... ... ... ... ... \n", "29607 31.0 1 BAA ... 0.500 NaN NaN \n", "29608 24.0 1 BAA ... 0.511 NaN NaN \n", "29609 31.0 1 BAA ... 0.534 NaN NaN \n", "29610 27.0 1 BAA ... 0.477 NaN NaN \n", "29611 24.0 1 BAA ... 0.581 NaN NaN \n", "\n", " trb_per_game ast_per_game stl_per_game blk_per_game tov_per_game \\\n", "0 6.0 3.7 0.7 0.8 2.2 \n", "1 6.6 4.2 0.6 0.8 2.7 \n", "2 4.5 2.5 0.8 0.6 1.1 \n", "3 1.2 1.6 0.6 0.2 0.9 \n", "4 2.1 0.3 0.3 0.2 0.5 \n", "... ... ... ... ... ... \n", "29607 NaN 0.5 NaN NaN NaN \n", "29608 NaN 0.5 NaN NaN NaN \n", "29609 NaN 0.7 NaN NaN NaN \n", "29610 NaN 0.0 NaN NaN NaN \n", "29611 NaN 0.9 NaN NaN NaN \n", "\n", " pf_per_game pts_per_game \n", "0 1.9 13.6 \n", "1 2.0 14.6 \n", "2 1.6 11.5 \n", "3 1.6 7.3 \n", "4 1.4 3.3 \n", "... ... ... 
\n", "29607 1.3 1.9 \n", "29608 1.8 2.6 \n", "29609 2.3 5.1 \n", "29610 1.2 2.9 \n", "29611 1.9 6.4 \n", "\n", "[29612 rows x 36 columns]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_csv(\"nba_player_data.csv\")\n", "\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This doesn't include any player salaries or salary cap information, so I will need to combine a few more datasets to add in that information." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Year Salary Cap Adjusted\n", "0 1985 $3,600,000 $8,557,797 \n", "1 1986 $4,233,000 $9,873,144 \n", "2 1987 $4,945,000 $11,128,422 \n", "3 1988 $6,164,000 $13,325,270 \n", "4 1989 $7,232,000 $14,916,364 " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "salary_cap_df = pd.read_csv(\"nba_historical_salary_cap.csv\")\n", "salary_cap_df.head()" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " YearEnd Team Player Salary BelowMin \\\n", "0 2017 Atlanta Hawks Dwight Howard 23,180,275 NaN \n", "1 2017 Atlanta Hawks Paul Millsap 20,072,033 NaN \n", "2 2017 Atlanta Hawks Kent Bazemore 15,730,338 NaN \n", "3 2017 Atlanta Hawks Tiago Splitter 8,550,000 NaN \n", "4 2017 Atlanta Hawks Kyle Korver 5,239,437 NaN \n", "... ... ... ... ... ... \n", "12710 1991 Washington Bullets Harvey Grant 475,000 NaN \n", "12711 1991 Washington Bullets Byron Irvin 375,000 NaN \n", "12712 1991 Washington Bullets A.J. English 275,000 NaN \n", "12713 1991 Washington Bullets Greg Foster 275,000 NaN \n", "12714 1991 Washington Bullets Haywoode Workman 120,000 NaN \n", "\n", " Unnamed: 5 \n", "0 1 \n", "1 2 \n", "2 0 \n", "3 0 \n", "4 0 \n", "... ... \n", "12710 0 \n", "12711 0 \n", "12712 0 \n", "12713 0 \n", "12714 0 \n", "\n", "[12715 rows x 6 columns]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "big_salary_data = pd.read_csv('player_salary_history.csv')\n", "big_salary_data" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "salary2018 = pd.read_csv('salary2018.csv')\n", "salary2019 = pd.read_csv('salary2019.csv')\n", "salary2020 = pd.read_csv('salary2020.csv')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It looks like we have a lot of data cleaning and merging ahead of us to combine these datasets, so let's get started." 
] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Year object\n", "Salary Cap object\n", "dtype: object" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Data cleaning: drop the inflation-adjusted column and strip formatting characters\n", "\n", "clean_cap = salary_cap_df.drop('Adjusted', axis=1)\n", "clean_cap['Salary Cap'] = clean_cap['Salary Cap'].str.replace(',', '', regex=False)\n", "# regex=False treats '$' as a literal character rather than a regex end-of-string anchor\n", "clean_cap['Salary Cap'] = clean_cap['Salary Cap'].str.replace('$', '', regex=False)\n", "clean_cap['Year'] = clean_cap['Year'].str.replace(\"'\", '', regex=False)\n", "\n", "clean_cap.dtypes" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Year Salary Cap\n", "0 1985 3600000\n", "1 1986 4233000\n", "2 1987 4945000\n", "3 1988 6164000\n", "4 1989 7232000\n", "5 1990 9802000\n", "6 1991 11871000\n", "7 1992 12500000\n", "8 1993 14000000\n", "9 1994 15175000" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "clean_cap['Year'] = pd.to_numeric(clean_cap['Year'])\n", "clean_cap['Salary Cap'] = pd.to_numeric(clean_cap['Salary Cap'])\n", "clean_cap.head(10)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Year Player Salary\n", "0 2017 Dwight Howard 23180275 \n", "1 2017 Paul Millsap 20072033 \n", "2 2017 Kent Bazemore 15730338 \n", "3 2017 Tiago Splitter 8550000 \n", "4 2017 Kyle Korver 5239437 \n", "... ... ... ...\n", "12710 1991 Harvey Grant 475000 \n", "12711 1991 Byron Irvin 375000 \n", "12712 1991 A.J. English 275000 \n", "12713 1991 Greg Foster 275000 \n", "12714 1991 Haywoode Workman 120000 \n", "\n", "[12715 rows x 3 columns]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "clean_bigsalary = big_salary_data.drop(['Team', 'BelowMin', 'Unnamed: 5'], axis=1)\n", "cleaned = clean_bigsalary.rename(columns = {'YearEnd': 'Year', ' Salary ': 'Salary'})\n", "cleaned['Salary'] = cleaned['Salary'].str.replace(',', '')\n", "cleaned" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Year int64\n", "Player object\n", "Salary int64\n", "dtype: object" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cleaned['Salary'] = cleaned['Salary'].str.replace('Unknown', '0')\n", "cleaned['Salary'] = pd.to_numeric(cleaned['Salary'])\n", "cleaned.dtypes" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " Player Salary Year\n", "0 Stephen Curry 34682550 2018\n", "1 LeBron James 33285709 2018\n", "2 Paul Millsap 30769231 2018\n", "3 Gordon Hayward 29727900 2018\n", "4 Blake Griffin 29512900 2018\n", ".. ... ... ...\n", "581 Andre Ingram 46079 2018\n", "582 Trey McKinney-Jones 46079 2018\n", "583 Aaron Jackson 46079 2018\n", "584 Jameel Warney 46079 2018\n", "585 Marcus Thornton II 46079 2018\n", "\n", "[586 rows x 3 columns]" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Clean the 2018 salaries: drop the index column, strip ',' and '$', and convert to numbers\n", "clean_salary2018 = salary2018.drop('Unnamed: 0', axis=1)\n", "clean_salary2018['Salary'] = clean_salary2018['Salary'].str.replace(',', '', regex=False)\n", "clean_salary2018['Salary'] = clean_salary2018['Salary'].str.replace('$', '', regex=False)\n", "clean_salary2018_v2 = clean_salary2018.rename(columns = {'Season': 'Year'})\n", "clean_salary2018_v2['Salary'] = pd.to_numeric(clean_salary2018_v2['Salary'])\n", "clean_salary2018_v2" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "# Same cleaning steps for the 2019 salaries\n", "clean_salary2019 = salary2019.drop('Unnamed: 0', axis=1)\n", "clean_salary2019['Salary'] = clean_salary2019['Salary'].str.replace(',', '', regex=False)\n", "clean_salary2019['Salary'] = clean_salary2019['Salary'].str.replace('$', '', regex=False)\n", "clean_salary2019['Salary'] = pd.to_numeric(clean_salary2019['Salary'])" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "# Same cleaning steps for the 2020 salaries\n", "clean_salary2020 = salary2020.drop('Unnamed: 0', axis=1)\n", "clean_salary2020['Salary'] = clean_salary2020['Salary'].str.replace(',', '', regex=False)\n", "clean_salary2020['Salary'] = clean_salary2020['Salary'].str.replace('$', '', regex=False)\n", "clean_salary2020['Salary'] = pd.to_numeric(clean_salary2020['Salary'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that the four salary data frames and the salary cap data frame are cleaned and share the same column names, I can combine them." 
] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Year int64\n", "Salary Cap int64\n", "Player object\n", "Salary int64\n", "dtype: object\n" ] }, { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " Year Salary Cap Player Salary\n", "0 1991 11871000 Moses Malone 2406000\n", "1 1991 11871000 Dominique Wilkins 2065000\n", "2 1991 11871000 Jon Koncak 1550000\n", "3 1991 11871000 Doc Rivers 895000\n", "4 1991 11871000 Rumeal Robinson 800000\n", "... ... ... ... ...\n", "14384 2020 109140000 Jeremiah Martin 79568\n", "14385 2020 109140000 Tremont Waters 79568\n", "14386 2020 109140000 Tacko Fall 79568\n", "14387 2020 109140000 Charlie Brown 79568\n", "14388 2020 109140000 Malik Newman 65978\n", "\n", "[14389 rows x 4 columns]" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Stack the player salary data frames (2020, 2019, 2018, then 1991-2017)\n", "merge1 = pd.concat([clean_salary2018_v2, cleaned], axis=0)\n", "merge2 = pd.concat([clean_salary2019, merge1], axis=0)\n", "merge3 = pd.concat([clean_salary2020, merge2], axis=0)\n", "\n", "# Attach each season's salary cap; the default inner join keeps only years present in both frames\n", "merge4 = pd.merge(clean_cap, merge3, on='Year')\n", "\n", "print(merge4.dtypes)\n", "merge4" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For our purposes, most of the columns in the player statistics data won't be used in our calculations and can be dropped. Additionally, a lot of data from 1947-1979 is fragmented and includes players from the ABA and BAA, so I will limit the data to seasons from 1991 onward, which matches the first season in the salary data." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Year player_id Player age experience tm g \\\n", "0 2020 4219 Aaron Gordon 24.0 6 ORL 62.0 \n", "1 2020 4582 Aaron Holiday 23.0 2 IND 66.0 \n", "2 2020 4463 Abdel Nader 26.0 3 OKC 55.0 \n", "3 2020 4687 Adam Mokoka 21.0 1 CHI 11.0 \n", "4 2020 4688 Admiral Schofield 22.0 1 WAS 33.0 \n", "... ... ... ... ... ... ... ... \n", "15902 1991 2558 Winston Bennett 25.0 2 CLE 27.0 \n", "15903 1991 2401 Winston Garland 26.0 4 LAC 69.0 \n", "15904 1991 2278 Xavier McDaniel 27.0 6 TOT 81.0 \n", "15905 1991 2278 Xavier McDaniel 27.0 6 SEA 15.0 \n", "15906 1991 2278 Xavier McDaniel 27.0 6 PHO 66.0 \n", "\n", " mp_per_game x3p_per_game x3pa_per_game trb_per_game ast_per_game \\\n", "0 32.5 1.2 3.8 7.7 3.7 \n", "1 24.5 1.3 3.3 2.4 3.4 \n", "2 15.8 0.9 2.3 1.8 0.7 \n", "3 10.2 0.5 1.4 0.9 0.4 \n", "4 11.2 0.6 1.8 1.4 0.5 \n", "... ... ... ... ... ... \n", "15902 12.4 0.0 0.0 2.4 1.0 \n", "15903 24.7 0.1 0.4 2.9 4.6 \n", "15904 32.5 0.0 0.1 6.9 2.3 \n", "15905 35.3 0.0 0.2 5.4 2.5 \n", "15906 31.9 0.0 0.1 7.2 2.3 \n", "\n", " stl_per_game blk_per_game tov_per_game pf_per_game pts_per_game \\\n", "0 0.8 0.6 1.6 2.0 14.4 \n", "1 0.8 0.2 1.3 1.8 9.5 \n", "2 0.4 0.4 0.8 1.4 6.3 \n", "3 0.4 0.0 0.2 1.5 2.9 \n", "4 0.2 0.1 0.2 1.5 3.0 \n", "... ... ... ... ... ... \n", "15902 0.3 0.1 0.7 1.9 4.3 \n", "15903 1.4 0.1 1.7 2.7 8.2 \n", "15904 0.9 0.6 2.3 3.3 17.0 \n", "15905 1.7 0.3 2.7 3.3 21.8 \n", "15906 0.8 0.6 2.2 3.2 15.8 \n", "\n", " Salary Cap Salary \n", "0 109140000 19863636 \n", "1 109140000 2239200 \n", "2 109140000 1618520 \n", "3 109140000 79568 \n", "4 109140000 1000000 \n", "... ... ... 
\n", "15902 11871000 525000 \n", "15903 11871000 450000 \n", "15904 11871000 1400000 \n", "15905 11871000 1400000 \n", "15906 11871000 1400000 \n", "\n", "[15907 rows x 19 columns]" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Drop unused columns and combine player and salary data\n", "clean_data = df.drop(['seas_id', 'hof', 'lg', 'pos', 'birth_year', 'gs', 'fg_per_game', 'fga_per_game', 'fg_percent'], axis=1)\n", "clean_data2 = clean_data.drop(['x2p_per_game', 'x2pa_per_game', 'x2p_percent', 'x3p_percent', 'e_fg_percent', 'ft_per_game', 'fta_per_game', 'ft_percent', 'orb_per_game', 'drb_per_game'], axis=1)\n", "capitalized = clean_data2.rename(columns = {'season' : 'Year', 'player' : 'Player'})\n", "# Keep seasons from 1991 onward\n", "modern_data = capitalized.loc[capitalized[\"Year\"] > 1990]\n", "\n", "# Match statistics to salaries by season and player name\n", "merge5 = pd.merge(modern_data, merge4, on=['Year', 'Player'])\n", "merge5" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A few players have multiple entries in a season because they changed teams partway through it. To avoid counting a traded player more than once, I will combine those rows into a single entry per player per season. I will also add a column with the share of the team salary cap that a player's salary represents, and I will remove players who appeared in 10 or fewer games, averaged 5 or fewer minutes per game, or earned 2% or less of the salary cap, which serves here as a rough proxy for not signing a full-year contract. Salaries at or above 60% of the cap are dropped as well. This way, the data is not influenced by players who spent only a brief stint in the NBA or who played a very short season due to injury." 
] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<class 'pandas.core.frame.DataFrame'>\n", "Int64Index: 8585 entries, 0 to 12312\n", "Data columns (total 18 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 Year 8585 non-null int64 \n", " 1 Player 8585 non-null int64 \n", " 2 Age 8585 non-null float64\n", " 3 Experience 8585 non-null float64\n", " 4 Games 8585 non-null float64\n", " 5 Minutes 8585 non-null float64\n", " 6 3P Makes 8585 non-null float64\n", " 7 3P Attempts 8585 non-null float64\n", " 8 Rebounds 8585 non-null float64\n", " 9 Assists 8585 non-null float64\n", " 10 Steals 8585 non-null float64\n", " 11 Blocks 8585 non-null float64\n", " 12 Turnovers 8585 non-null float64\n", " 13 Fouls 8585 non-null float64\n", " 14 Points 8585 non-null float64\n", " 15 Salary Cap 8585 non-null float64\n", " 16 Salary 8585 non-null float64\n", " 17 Salary Percent 8585 non-null float64\n", "dtypes: float64(16), int64(2)\n", "memory usage: 1.2 MB\n" ] } ], "source": [ "# Combine the rows of players who played for multiple teams in one season\n", "# numeric_only=True averages only the numeric columns and skips text columns such as the team\n", "grouped = merge5.groupby(['Year', 'player_id']).mean(numeric_only=True)\n", "trade_df = grouped.reset_index()\n", "\n", "# Add a column with salary as a share of the salary cap\n", "salary_percent = pd.DataFrame(data = (trade_df['Salary'] / trade_df['Salary Cap']), columns = ['Salary Percent'])\n", "\n", "new_df = pd.concat([trade_df, salary_percent], axis=1)\n", "\n", "# Filter out unnecessary rows and rename columns\n", "trade_df_games = new_df.loc[new_df['g'] > 10]\n", "trade_df_mins = trade_df_games.loc[trade_df_games['mp_per_game'] > 5]\n", "trade_df_adjusted = trade_df_mins[trade_df_mins['Salary Percent'] > 0.02]\n", "clean_df1 = trade_df_adjusted[trade_df_adjusted['Salary Percent'] < 0.6]\n", "clean_df = clean_df1.rename(columns = {'player_id': 'Player', 'age': 'Age', 'experience': 'Experience', \n", " 'g': 'Games', 'mp_per_game': 'Minutes','x3p_per_game' : 
'3P Makes', \n", " 'x3pa_per_game': '3P Attempts', 'trb_per_game': 'Rebounds', \n", " 'ast_per_game': 'Assists', 'stl_per_game': 'Steals', \n", " 'blk_per_game': 'Blocks', 'tov_per_game': 'Turnovers', \n", " 'pf_per_game': 'Fouls', 'pts_per_game': 'Points'})\n", "# Check for non-number columns\n", "clean_df.info()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can finally get to some visualizations. First, let's look at how average 3-point attempts and makes per game have changed over the past 30 seasons (1991-2020) in the NBA." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ " \n", " " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.plotly.v1+json": { "config": { "plotlyServerURL": "https://plot.ly" }, "data": [ { "line": { "color": "orange", "width": 2 }, "mode": "lines+markers", "name": "3P Attempts", "type": "scatter", "x": [ 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020 ], "y": [ 0.7164887307236055, 0.7684272300469481, 0.8828605200945624, 0.9780092592592587, 1.5706293706293715, 1.6653846153846152, 1.6982779827798284, 1.2766666666666664, 1.3191812865497081, 1.370666666666667, 1.3526984126984134, 1.495906432748538, 1.4265046296296295, 1.4861409796893676, 1.5941580756013751, 1.595108077360637, 1.6823129251700695, 1.7840206185567018, 1.8098173515981748, 1.7829908675799102, 1.752551020408165, 1.8323497267759574, 1.9588996763754059, 2.044491525423729, 2.215824915824917, 2.352133794694348, 2.8233397190293736, 2.9542592592592585, 3.1987203495630454, 3.454098360655737 ] }, { "line": { "color": "red", "width": 2 }, "mode": "lines+markers", "name": "3P Misses", "type": "scatter", "x": [ 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 
2018, 2019, 2020 ], "y": [ 0.48873072360616754, 0.5167840375586851, 0.589125295508274, 0.6559027777777774, 1.0069638694638705, 1.0645687645687651, 1.09258917589176, 0.8427380952380951, 0.8701754385964915, 0.8874444444444445, 0.8769312169312176, 0.9700584795321643, 0.9275462962962961, 0.9781660692951027, 1.0310423825887751, 1.0245733788395905, 1.081292517006804, 1.1488831615120287, 1.1444063926940655, 1.158418949771691, 1.12559523809524, 1.1920218579234985, 1.262351672060412, 1.3117231638418079, 1.4374859708193053, 1.5220876585928487, 1.8014687100893998, 1.8871008939974447, 2.062109862671659, 2.2185792349726765 ] }, { "line": { "color": "green", "width": 2 }, "mode": "lines+markers", "name": "3P Makes", "type": "scatter", "x": [ 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020 ], "y": [ 0.227758007117438, 0.25164319248826306, 0.2937352245862884, 0.32210648148148135, 0.5636655011655011, 0.6008158508158502, 0.6056888068880685, 0.43392857142857133, 0.4490058479532166, 0.4832222222222225, 0.4757671957671958, 0.5258479532163738, 0.4989583333333334, 0.5079749103942649, 0.5631156930126, 0.5705346985210465, 0.6010204081632654, 0.6351374570446731, 0.6654109589041093, 0.6245719178082192, 0.6269557823129248, 0.6403278688524591, 0.696548004314994, 0.732768361581921, 0.7783389450056115, 0.8300461361014994, 1.0218710089399738, 1.0671583652618137, 1.1366104868913862, 1.2355191256830604 ] } ], "layout": { "height": 750, "template": { "data": { "bar": [ { "error_x": { "color": "#2a3f5f" }, "error_y": { "color": "#2a3f5f" }, "marker": { "line": { "color": "#E5ECF6", "width": 0.5 } }, "type": "bar" } ], "barpolar": [ { "marker": { "line": { "color": "#E5ECF6", "width": 0.5 } }, "type": "barpolar" } ], "carpet": [ { "aaxis": { "endlinecolor": "#2a3f5f", "gridcolor": "white", "linecolor": "white", "minorgridcolor": "white", "startlinecolor": "#2a3f5f" }, 
"baxis": { "endlinecolor": "#2a3f5f", "gridcolor": "white", "linecolor": "white", "minorgridcolor": "white", "startlinecolor": "#2a3f5f" }, "type": "carpet" } ], "choropleth": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "choropleth" } ], "contour": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "contour" } ], "contourcarpet": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "contourcarpet" } ], "heatmap": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "heatmap" } ], "heatmapgl": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "heatmapgl" } ], "histogram": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "histogram" } ], "histogram2d": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 
0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "histogram2d" } ], "histogram2dcontour": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "histogram2dcontour" } ], "mesh3d": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "mesh3d" } ], "parcoords": [ { "line": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "parcoords" } ], "pie": [ { "automargin": true, "type": "pie" } ], "scatter": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatter" } ], "scatter3d": [ { "line": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatter3d" } ], "scattercarpet": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattercarpet" } ], "scattergeo": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattergeo" } ], "scattergl": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattergl" } ], "scattermapbox": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattermapbox" } ], "scatterpolar": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterpolar" } ], "scatterpolargl": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterpolargl" } ], "scatterternary": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterternary" } ], "surface": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 
0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "surface" } ], "table": [ { "cells": { "fill": { "color": "#EBF0F8" }, "line": { "color": "white" } }, "header": { "fill": { "color": "#C8D4E3" }, "line": { "color": "white" } }, "type": "table" } ] }, "layout": { "annotationdefaults": { "arrowcolor": "#2a3f5f", "arrowhead": 0, "arrowwidth": 1 }, "autotypenumbers": "strict", "coloraxis": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "colorscale": { "diverging": [ [ 0, "#8e0152" ], [ 0.1, "#c51b7d" ], [ 0.2, "#de77ae" ], [ 0.3, "#f1b6da" ], [ 0.4, "#fde0ef" ], [ 0.5, "#f7f7f7" ], [ 0.6, "#e6f5d0" ], [ 0.7, "#b8e186" ], [ 0.8, "#7fbc41" ], [ 0.9, "#4d9221" ], [ 1, "#276419" ] ], "sequential": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "sequentialminus": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ] }, "colorway": [ "#636efa", "#EF553B", "#00cc96", "#ab63fa", "#FFA15A", "#19d3f3", "#FF6692", "#B6E880", "#FF97FF", "#FECB52" ], "font": { "color": "#2a3f5f" }, "geo": { "bgcolor": "white", "lakecolor": "white", "landcolor": "#E5ECF6", "showlakes": true, "showland": true, "subunitcolor": "white" }, "hoverlabel": { "align": "left" }, "hovermode": "closest", "mapbox": { "style": "light" }, "paper_bgcolor": 
"white", "plot_bgcolor": "#E5ECF6", "polar": { "angularaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" }, "bgcolor": "#E5ECF6", "radialaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" } }, "scene": { "xaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", "gridwidth": 2, "linecolor": "white", "showbackground": true, "ticks": "", "zerolinecolor": "white" }, "yaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", "gridwidth": 2, "linecolor": "white", "showbackground": true, "ticks": "", "zerolinecolor": "white" }, "zaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", "gridwidth": 2, "linecolor": "white", "showbackground": true, "ticks": "", "zerolinecolor": "white" } }, "shapedefaults": { "line": { "color": "#2a3f5f" } }, "ternary": { "aaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" }, "baxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" }, "bgcolor": "#E5ECF6", "caxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" } }, "title": { "x": 0.05 }, "xaxis": { "automargin": true, "gridcolor": "white", "linecolor": "white", "ticks": "", "title": { "standoff": 15 }, "zerolinecolor": "white", "zerolinewidth": 2 }, "yaxis": { "automargin": true, "gridcolor": "white", "linecolor": "white", "ticks": "", "title": { "standoff": 15 }, "zerolinecolor": "white", "zerolinewidth": 2 } } }, "title": { "text": "Average 3-point attempts and makes of NBA players by season", "x": 0.5, "xanchor": "center", "y": 0.9, "yanchor": "top" }, "xaxis": { "title": { "text": "Year" } }, "yaxis": { "title": { "text": "3P" } } } }, "text/html": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Generate yearly averages (numeric columns only, so any text columns are skipped)\n", "seasonal = clean_df.groupby('Year').mean(numeric_only=True)\n", "averages = seasonal.reset_index()\n", "# Misses are attempts minus makes\n", "averages['3P Misses'] = averages['3P Attempts'] - averages['3P Makes']\n", "\n", "# Create plot\n", "fig = go.Figure([\n", " go.Scatter(x = averages['Year'], y = averages['3P Attempts'],line=dict(color='orange', width=2),mode='lines+markers', name = \"3P Attempts\"),\n", " go.Scatter(x = averages['Year'], y = averages['3P Misses'],line=dict(color='red', width=2),mode='lines+markers', name = \"3P Misses\"),\n", " go.Scatter(x = averages['Year'], y = averages['3P Makes'],line=dict(color='green', width=2),mode='lines+markers', name = \"3P Makes\")])\n", "fig.update_layout(\n", " height = 750,\n", " title={\n", " 'text': \"Average 3-point attempts and makes of NBA players by season\",\n", " 'y':0.9,\n", " 'x':0.5,\n", " 'xanchor': 'center',\n", " 'yanchor': 'top'},\n", " xaxis_title=\"Year\",\n", " yaxis_title=\"3P\"\n", " \n", ")\n", "fig.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The eye test shows a very strong positive relationship between season and 3-pointers: average attempts per player rose from about 0.7 to about 3.5 per game and average makes from about 0.2 to about 1.2 between 1991 and 2020, roughly a fivefold increase. Growth is close to year-by-year, apart from a temporary spike in 1995-1997, when the league used a shorter 3-point line, followed by a drop in 1998.\n", "\n", "Let's see whether the results are similar when we restrict the data to elite players only (> 20 points per game)." 
] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "application/vnd.plotly.v1+json": { "config": { "plotlyServerURL": "https://plot.ly" }, "data": [ { "line": { "color": "orange", "width": 2 }, "mode": "lines+markers", "name": "3P Attempts", "type": "scatter", "x": [ 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020 ], "y": [ 1.591304347826087, 1.842857142857143, 1.8799999999999997, 2.097916666666667, 3.102666666666667, 2.515, 3.504166666666667, 1.9312500000000001, 1.361904761904762, 2.2950000000000004, 2.845833333333333, 3.221333333333334, 3.404166666666667, 3.310526315789473, 3.3507246376811595, 3.4296296296296305, 3.326388888888889, 3.276666666666666, 3.1346153846153846, 2.988888888888889, 2.8183333333333325, 2.857142857142857, 3.936363636363637, 3.738888888888889, 3.961111111111111, 4.823809523809523, 5.366666666666666, 5.57361111111111, 6.093103448275862, 6.192473118279569 ] }, { "line": { "color": "red", "width": 2 }, "mode": "lines+markers", "name": "3P Misses", "type": "scatter", "x": [ 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020 ], "y": [ 1.0797101449275364, 1.204761904761905, 1.2149999999999996, 1.385416666666667, 2.002666666666667, 1.58, 2.2333333333333334, 1.3, 0.9547619047619047, 1.4900000000000002, 1.8333333333333328, 2.073333333333334, 2.2236111111111114, 2.23157894736842, 2.1811594202898554, 2.174074074074075, 2.1416666666666666, 2.1166666666666654, 1.9846153846153844, 1.9814814814814816, 1.816666666666666, 1.8928571428571428, 2.436363636363637, 2.344444444444444, 2.4999999999999996, 3.0047619047619034, 3.3612903225806448, 3.5069444444444438, 3.875862068965517, 3.9043010752688163 ] }, { "line": { "color": "green", "width": 2 }, "mode": 
"lines+markers", "name": "3P Makes", "type": "scatter", "x": [ 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020 ], "y": [ 0.5115942028985507, 0.638095238095238, 0.665, 0.7125, 1.1, 0.9350000000000002, 1.2708333333333333, 0.6312500000000001, 0.4071428571428572, 0.805, 1.0125000000000004, 1.148, 1.1805555555555556, 1.0789473684210529, 1.1695652173913043, 1.2555555555555558, 1.1847222222222222, 1.1600000000000004, 1.1500000000000001, 1.0074074074074073, 1.0016666666666665, 0.9642857142857144, 1.5, 1.3944444444444446, 1.4611111111111112, 1.8190476190476192, 2.0053763440860215, 2.0666666666666664, 2.217241379310345, 2.2881720430107526 ] } ], "layout": { "height": 750, "template": { "data": { "bar": [ { "error_x": { "color": "#2a3f5f" }, "error_y": { "color": "#2a3f5f" }, "marker": { "line": { "color": "#E5ECF6", "width": 0.5 } }, "type": "bar" } ], "barpolar": [ { "marker": { "line": { "color": "#E5ECF6", "width": 0.5 } }, "type": "barpolar" } ], "carpet": [ { "aaxis": { "endlinecolor": "#2a3f5f", "gridcolor": "white", "linecolor": "white", "minorgridcolor": "white", "startlinecolor": "#2a3f5f" }, "baxis": { "endlinecolor": "#2a3f5f", "gridcolor": "white", "linecolor": "white", "minorgridcolor": "white", "startlinecolor": "#2a3f5f" }, "type": "carpet" } ], "choropleth": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "choropleth" } ], "contour": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "contour" } ], "contourcarpet": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": 
"contourcarpet" } ], "heatmap": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "heatmap" } ], "heatmapgl": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "heatmapgl" } ], "histogram": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "histogram" } ], "histogram2d": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "histogram2d" } ], "histogram2dcontour": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "histogram2dcontour" } ], "mesh3d": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "mesh3d" } ], "parcoords": [ { "line": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, 
"type": "parcoords" } ], "pie": [ { "automargin": true, "type": "pie" } ], "scatter": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatter" } ], "scatter3d": [ { "line": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatter3d" } ], "scattercarpet": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattercarpet" } ], "scattergeo": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattergeo" } ], "scattergl": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattergl" } ], "scattermapbox": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattermapbox" } ], "scatterpolar": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterpolar" } ], "scatterpolargl": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterpolargl" } ], "scatterternary": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterternary" } ], "surface": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "surface" } ], "table": [ { "cells": { "fill": { "color": "#EBF0F8" }, "line": { "color": "white" } }, "header": { "fill": { "color": "#C8D4E3" }, "line": { "color": "white" } }, "type": "table" } ] }, "layout": { "annotationdefaults": { "arrowcolor": "#2a3f5f", "arrowhead": 0, "arrowwidth": 1 }, "autotypenumbers": "strict", "coloraxis": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "colorscale": { "diverging": [ [ 0, "#8e0152" ], [ 0.1, "#c51b7d" ], [ 0.2, "#de77ae" ], [ 
0.3, "#f1b6da" ], [ 0.4, "#fde0ef" ], [ 0.5, "#f7f7f7" ], [ 0.6, "#e6f5d0" ], [ 0.7, "#b8e186" ], [ 0.8, "#7fbc41" ], [ 0.9, "#4d9221" ], [ 1, "#276419" ] ], "sequential": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "sequentialminus": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ] }, "colorway": [ "#636efa", "#EF553B", "#00cc96", "#ab63fa", "#FFA15A", "#19d3f3", "#FF6692", "#B6E880", "#FF97FF", "#FECB52" ], "font": { "color": "#2a3f5f" }, "geo": { "bgcolor": "white", "lakecolor": "white", "landcolor": "#E5ECF6", "showlakes": true, "showland": true, "subunitcolor": "white" }, "hoverlabel": { "align": "left" }, "hovermode": "closest", "mapbox": { "style": "light" }, "paper_bgcolor": "white", "plot_bgcolor": "#E5ECF6", "polar": { "angularaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" }, "bgcolor": "#E5ECF6", "radialaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" } }, "scene": { "xaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", "gridwidth": 2, "linecolor": "white", "showbackground": true, "ticks": "", "zerolinecolor": "white" }, "yaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", "gridwidth": 2, "linecolor": "white", "showbackground": true, "ticks": "", "zerolinecolor": "white" }, "zaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", "gridwidth": 2, "linecolor": "white", "showbackground": true, "ticks": "", "zerolinecolor": "white" } }, "shapedefaults": { "line": { 
"color": "#2a3f5f" } }, "ternary": { "aaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" }, "baxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" }, "bgcolor": "#E5ECF6", "caxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" } }, "title": { "x": 0.05 }, "xaxis": { "automargin": true, "gridcolor": "white", "linecolor": "white", "ticks": "", "title": { "standoff": 15 }, "zerolinecolor": "white", "zerolinewidth": 2 }, "yaxis": { "automargin": true, "gridcolor": "white", "linecolor": "white", "ticks": "", "title": { "standoff": 15 }, "zerolinecolor": "white", "zerolinewidth": 2 } } }, "title": { "text": "Average 3-point attempts and makes of elite NBA players by season", "x": 0.5, "xanchor": "center", "y": 0.9, "yanchor": "top" }, "xaxis": { "title": { "text": "Year" } }, "yaxis": { "title": { "text": "3P" } } } }, "text/html": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Create a dataset of elite players\n", "elite_players = clean_df.loc[clean_df[\"Points\"] > 20]\n", "\n", "# Generate yearly averages for elite players (numeric columns only)\n", "e_seasonal = elite_players.groupby('Year').mean(numeric_only=True)\n", "e_averages = e_seasonal.reset_index()\n", "# Misses are attempts minus makes\n", "e_averages['3P Misses'] = e_averages['3P Attempts'] - e_averages['3P Makes']\n", "\n", "# Create plot\n", "fig = go.Figure([\n", " go.Scatter(x = e_averages['Year'], y = e_averages['3P Attempts'],line=dict(color='orange', width=2),mode='lines+markers', name = \"3P Attempts\"),\n", " go.Scatter(x = e_averages['Year'], y = e_averages['3P Misses'],line=dict(color='red', width=2),mode='lines+markers', name = \"3P Misses\"),\n", " go.Scatter(x = e_averages['Year'], y = e_averages['3P Makes'],line=dict(color='green', width=2),mode='lines+markers', name = \"3P Makes\")])\n", "fig.update_layout(\n", " height = 750,\n", " title={\n", " 'text': \"Average 3-point attempts and makes of elite NBA players by season\",\n", " 'y':0.9,\n", " 'x':0.5,\n", " 'xanchor': 'center',\n", " 'yanchor': 'top'},\n", " xaxis_title=\"Year\",\n", " yaxis_title=\"3P\"\n", " \n", ")\n", "fig.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The 3-point attempts and makes for elite players follow a similar trend, with a strong positive correlation with season, but at around double the attempts and makes per game in each year, likely explained by the higher usage rate of elite players.\n", "\n", "It is worth noting that elite players' 3-point attempts and makes per game have approximately quadrupled since 1991 (attempts from about 1.6 to 6.2, makes from about 0.5 to 2.3)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's take a look at how the salary cap has changed during the same period." 
] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "application/vnd.plotly.v1+json": { "config": { "plotlyServerURL": "https://plot.ly" }, "data": [ { "line": { "color": "green", "width": 2 }, "mode": "lines+markers", "name": "Cap", "type": "scatter", "x": [ 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020 ], "y": [ 11871000, 12500000, 14000000, 15175000, 15964000, 23000000, 24363000, 26900000, 30000000, 34000000, 35500000, 42500000, 40271000, 43840000, 43870000, 49500000, 53135000, 55630000, 58680000, 57700000, 58044000, 58044000, 58044000, 58679000, 63065000, 70000000, 94143000, 99093000, 101869000, 109140000 ] } ], "layout": { "height": 750, "template": { "data": { "bar": [ { "error_x": { "color": "#2a3f5f" }, "error_y": { "color": "#2a3f5f" }, "marker": { "line": { "color": "#E5ECF6", "width": 0.5 } }, "type": "bar" } ], "barpolar": [ { "marker": { "line": { "color": "#E5ECF6", "width": 0.5 } }, "type": "barpolar" } ], "carpet": [ { "aaxis": { "endlinecolor": "#2a3f5f", "gridcolor": "white", "linecolor": "white", "minorgridcolor": "white", "startlinecolor": "#2a3f5f" }, "baxis": { "endlinecolor": "#2a3f5f", "gridcolor": "white", "linecolor": "white", "minorgridcolor": "white", "startlinecolor": "#2a3f5f" }, "type": "carpet" } ], "choropleth": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "choropleth" } ], "contour": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "contour" } ], "contourcarpet": [ { "colorbar": { "outlinewidth": 0, "ticks": 
"" }, "type": "contourcarpet" } ], "heatmap": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "heatmap" } ], "heatmapgl": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "heatmapgl" } ], "histogram": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "histogram" } ], "histogram2d": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "histogram2d" } ], "histogram2dcontour": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "histogram2dcontour" } ], "mesh3d": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "mesh3d" } ], "parcoords": [ { "line": { "colorbar": { "outlinewidth": 0, 
"ticks": "" } }, "type": "parcoords" } ], "pie": [ { "automargin": true, "type": "pie" } ], "scatter": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatter" } ], "scatter3d": [ { "line": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatter3d" } ], "scattercarpet": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattercarpet" } ], "scattergeo": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattergeo" } ], "scattergl": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattergl" } ], "scattermapbox": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattermapbox" } ], "scatterpolar": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterpolar" } ], "scatterpolargl": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterpolargl" } ], "scatterternary": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterternary" } ], "surface": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "surface" } ], "table": [ { "cells": { "fill": { "color": "#EBF0F8" }, "line": { "color": "white" } }, "header": { "fill": { "color": "#C8D4E3" }, "line": { "color": "white" } }, "type": "table" } ] }, "layout": { "annotationdefaults": { "arrowcolor": "#2a3f5f", "arrowhead": 0, "arrowwidth": 1 }, "autotypenumbers": "strict", "coloraxis": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "colorscale": { "diverging": [ [ 0, "#8e0152" ], [ 0.1, "#c51b7d" ], [ 
0.2, "#de77ae" ], [ 0.3, "#f1b6da" ], [ 0.4, "#fde0ef" ], [ 0.5, "#f7f7f7" ], [ 0.6, "#e6f5d0" ], [ 0.7, "#b8e186" ], [ 0.8, "#7fbc41" ], [ 0.9, "#4d9221" ], [ 1, "#276419" ] ], "sequential": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "sequentialminus": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ] }, "colorway": [ "#636efa", "#EF553B", "#00cc96", "#ab63fa", "#FFA15A", "#19d3f3", "#FF6692", "#B6E880", "#FF97FF", "#FECB52" ], "font": { "color": "#2a3f5f" }, "geo": { "bgcolor": "white", "lakecolor": "white", "landcolor": "#E5ECF6", "showlakes": true, "showland": true, "subunitcolor": "white" }, "hoverlabel": { "align": "left" }, "hovermode": "closest", "mapbox": { "style": "light" }, "paper_bgcolor": "white", "plot_bgcolor": "#E5ECF6", "polar": { "angularaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" }, "bgcolor": "#E5ECF6", "radialaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" } }, "scene": { "xaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", "gridwidth": 2, "linecolor": "white", "showbackground": true, "ticks": "", "zerolinecolor": "white" }, "yaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", "gridwidth": 2, "linecolor": "white", "showbackground": true, "ticks": "", "zerolinecolor": "white" }, "zaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", "gridwidth": 2, "linecolor": "white", "showbackground": true, "ticks": "", "zerolinecolor": "white" } }, 
"shapedefaults": { "line": { "color": "#2a3f5f" } }, "ternary": { "aaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" }, "baxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" }, "bgcolor": "#E5ECF6", "caxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" } }, "title": { "x": 0.05 }, "xaxis": { "automargin": true, "gridcolor": "white", "linecolor": "white", "ticks": "", "title": { "standoff": 15 }, "zerolinecolor": "white", "zerolinewidth": 2 }, "yaxis": { "automargin": true, "gridcolor": "white", "linecolor": "white", "ticks": "", "title": { "standoff": 15 }, "zerolinecolor": "white", "zerolinewidth": 2 } } }, "title": { "text": "NBA Team Salary Cap by Year (Nominal Dollars)", "x": 0.5, "xanchor": "center", "y": 0.9, "yanchor": "top" }, "xaxis": { "title": { "text": "Year" } }, "yaxis": { "title": { "text": "Dollars" } } } }, "text/html": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Create plot\n", "fig = go.Figure([\n", " go.Scatter(x = averages['Year'], y = averages['Salary Cap'],line=dict(color='green', width=2),mode='lines+markers', name = \"Cap\")])\n", "fig.update_layout(\n", " height = 750,\n", " title={\n", " 'text': \"NBA Team Salary Cap by Year (Nominal Dollars)\",\n", " 'y':0.9,\n", " 'x':0.5,\n", " 'xanchor': 'center',\n", " 'yanchor': 'top'},\n", " xaxis_title=\"Year\",\n", " yaxis_title=\"Dollars\"\n", " \n", ")\n", "fig.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The salary cap has increased tenfold in the last 30 years, likely due to inflation and the increase in popularity of the NBA. To adjust for varying salary caps, I will use the percentage of the team salary cap that a player earns.\n", "\n", "Let's do a preliminary analysis of the correlation between the variables we would like to look at. First we will split our data into two equal time frames: 1991-2005 and 2006-2020." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Salary Percent3P Makes
Salary Percent1.0000000.167274
3P Makes0.1672741.000000
\n", "
" ], "text/plain": [ " Salary Percent 3P Makes\n", "Salary Percent 1.000000 0.167274\n", "3P Makes 0.167274 1.000000" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "older_df = clean_df[clean_df['Year'] < 2006]\n", "newer_df = clean_df[clean_df['Year'] > 2005]\n", "\n", "# Combined dataframe 3P correlation\n", "clean_df[['Salary Percent', '3P Makes']].corr()" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Salary Percent3P Makes
Salary Percent1.0000000.143388
3P Makes0.1433881.000000
\n", "
" ], "text/plain": [ " Salary Percent 3P Makes\n", "Salary Percent 1.000000 0.143388\n", "3P Makes 0.143388 1.000000" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Older data 3P correlation\n", "older_df[['Salary Percent', '3P Makes']].corr()" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Salary Percent3P Makes
Salary Percent1.0000000.208503
3P Makes0.2085031.000000
\n", "
" ], "text/plain": [ " Salary Percent 3P Makes\n", "Salary Percent 1.000000 0.208503\n", "3P Makes 0.208503 1.000000" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Newer data 3P correlation\n", "newer_df[['Salary Percent', '3P Makes']].corr()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is a promising start. There is positive correlation between salary percent and 3-points per game in both time frames, and about a 0.06 increase from the older to newer data.\n", "\n", "Let's start with a simple linear regression on our filtered dataframe." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Simple Linear Regression" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Simple linear model: Salary Percent = 0.0907 + 0.0193 3P Makes\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from sklearn import linear_model\n", "\n", "\n", "X = clean_df.copy()\n", "y = clean_df['Salary Percent']\n", "\n", "# construct the model instance\n", "simple_lr_model = linear_model.LinearRegression()\n", "\n", "# fit the model\n", "simple_lr_model.fit(X[[\"3P Makes\"]], y)\n", "\n", "# print the coefficients\n", "beta_0 = simple_lr_model.intercept_\n", "beta_1 = simple_lr_model.coef_[0]\n", "\n", "print(f\"Simple linear model: Salary Percent = {beta_0:.4f} + {beta_1:.4f} 3P Makes\")\n", "\n", "sns.lmplot(\n", " data=clean_df, x=\"3P Makes\", y=\"Salary Percent\", height=6,\n", " scatter_kws=dict(s=5, alpha=0.5),\n", " line_kws={'color': 'orange'}\n", ");" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "While here we can see positive correlation between 3-points made and salary, the data is not a great fit, and we are still looking at our time period as a whole.\n", "\n", "The same regression will be run with our data split between older and newer data to see if beta_1 increases as 3-pointers become more common in the modern era." ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Old simple linear model: Salary Percent = 0.0958 + 0.0197 3P Makes\n", "Mean squared error is 0.006662360293839116\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "X = older_df.copy()\n", "y = older_df['Salary Percent']\n", "\n", "oldsimple_lr_model = linear_model.LinearRegression()\n", "\n", "oldsimple_lr_model.fit(X[[\"3P Makes\"]], y)\n", "\n", "oldbeta_0 = oldsimple_lr_model.intercept_\n", "oldbeta_1 = oldsimple_lr_model.coef_[0]\n", "\n", "print(f\"Old simple linear model: Salary Percent = {oldbeta_0:.4f} + {oldbeta_1:.4f} 3P Makes\")\n", "\n", "sns.lmplot(\n", " data=older_df, x=\"3P Makes\", y=\"Salary Percent\", height=6,\n", " scatter_kws=dict(s=5, alpha=0.5),\n", " line_kws={'color': 'orange'}\n", ");\n", "\n", "old_mse = metrics.mean_squared_error(y, oldsimple_lr_model.predict(X[['3P Makes']]))\n", "print('Mean squared error is', old_mse) " ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "New simple linear model: Salary Percent = 0.0833 + 0.0219 3P Makes\n", "Mean squared error is 0.006293510029486172\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "X = newer_df.copy()\n", "y = newer_df['Salary Percent']\n", "\n", "newsimple_lr_model = linear_model.LinearRegression()\n", "\n", "newsimple_lr_model.fit(X[[\"3P Makes\"]], y)\n", "\n", "newbeta_0 = newsimple_lr_model.intercept_\n", "newbeta_1 = newsimple_lr_model.coef_[0]\n", "\n", "print(f\"New simple linear model: Salary Percent = {newbeta_0:.4f} + {newbeta_1:.4f} 3P Makes\")\n", "\n", "sns.lmplot(\n", " data=newer_df, x=\"3P Makes\", y=\"Salary Percent\", height=8,\n", " scatter_kws=dict(s=5, alpha=0.5),\n", " line_kws={'color': 'orange'}\n", ");\n", "\n", "new_mse = metrics.mean_squared_error(y, newsimple_lr_model.predict(X[['3P Makes']]))\n", "print('Mean squared error is', new_mse)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Unfortunately the data is still not a great fit for this simple linear regression. However, it looks like our beta_1 is about 12% higher in our newer data compared to the older data. That is a good sign. Let's see how they fare when we include our other variables as well.\n", "\n", "Before we generate some multivariate linear regressions, it is a good idea to check the correlation between each of our predictor variables, in order to eliminate variables that show signs of multicollinearity." ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plt.subplots(figsize=(10,10))\n", "sns.heatmap(clean_df.corr(), linewidths = 2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I will remove all salary information as we are using salary percentage as our dependent variable. I'll also remove Player, as player_id is an arbitrary number, and Year, as it is already factored into our two datasets. Finally, I'll remove Age for its high correlation with Experience, and 3P Attempts for its high correlation with 3P Makes." ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "newindep_variables = newer_df.drop(['Salary', 'Salary Cap', 'Salary Percent', 'Year', 'Player', 'Age', '3P Attempts', 'Points'], axis=1)\n", "oldindep_variables = older_df.drop(['Salary', 'Salary Cap', 'Salary Percent', 'Year', 'Player', 'Age', '3P Attempts', 'Points'], axis=1)\n", "plt.subplots(figsize=(10,5))\n", "sns.heatmap(newindep_variables.corr(), linewidths = 2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Multiple Linear Regression" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that the predictor variables have been filtered, I will run a multiple linear regression on each of the old and new data." ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Old Intercept: -0.01083027981862214\n", "Experience : 0.006356229149073229\n", "Games : -0.00035442506394580287\n", "Minutes Played : 0.0017468426427601413\n", "3P Made : 0.010412184611248551\n", "Rebounds : 0.006054133896862581\n", "Assists : 0.002317022298807209\n", "Steals : -0.01885485384063002\n", "Blocks : 0.02910799228369434\n", "Turnovers : 0.026474773398563668\n", "Fouls : -0.009810582336634172\n" ] } ], "source": [ "X = oldindep_variables.copy()\n", "y = older_df['Salary Percent']\n", "\n", "categories = ['Experience', 'Games', 'Minutes Played', '3P Made', \n", " 'Rebounds', 'Assists', 'Steals', 'Blocks', 'Turnovers', 'Fouls']\n", "\n", "oldmulti_regr = linear_model.LinearRegression()\n", "oldmulti_regr.fit(X, y)\n", "\n", "pairs = zip(categories, oldmulti_regr.coef_)\n", "\n", "print('Old Intercept:', oldmulti_regr.intercept_)\n", "for (category, coef) in pairs:\n", " print(category, ': ', coef)" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "New Intercept: -0.03617121782411255\n", "Experience : 0.007216528291439452\n", 
"Games : -0.0004742823454764643\n", "Minutes Played : 0.0028959302451106697\n", "3P Made : 0.004759857038476971\n", "Rebounds : 0.007801379347796547\n", "Assists : 0.00036986302927898517\n", "Steals : -0.010678000581346946\n", "Blocks : 0.010584607766033583\n", "Turnovers : 0.02724818053296039\n", "Fouls : -0.01403652911810434\n" ] } ], "source": [ "X = newindep_variables.copy()\n", "y = newer_df['Salary Percent']\n", "\n", "newmulti_regr = linear_model.LinearRegression()\n", "newmulti_regr.fit(X, y)\n", "\n", "pairs = zip(categories, newmulti_regr.coef_)\n", "\n", "print('New Intercept:', newmulti_regr.intercept_)\n", "for (category, coef) in pairs:\n", " print(category, ': ', coef)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Interestingly, turnovers (a negative stat) have a positive coefficient. This likely can be explained by the high turnover count of elite players, as only top players handle the ball enough to generate a high turnover count.\n", "\n", "3-points made are positive in both regressions, which is a good sign, however the coefficient is smaller in our newer data.\n", "\n", "Let's compare our multivariate linear regressions with lasso regressions." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Lasso Regression" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
lassolinreg
Experience0.0050690.006356
Games-0.000397-0.000354
Minutes0.0034910.001747
3P Makes0.0000000.010412
Rebounds0.0061000.006054
Assists0.0000000.002317
Steals-0.000000-0.018855
Blocks0.0000000.029108
Turnovers0.0000000.026475
Fouls-0.000000-0.009811
\n", "
" ], "text/plain": [ " lasso linreg\n", "Experience 0.005069 0.006356\n", "Games -0.000397 -0.000354\n", "Minutes 0.003491 0.001747\n", "3P Makes 0.000000 0.010412\n", "Rebounds 0.006100 0.006054\n", "Assists 0.000000 0.002317\n", "Steals -0.000000 -0.018855\n", "Blocks 0.000000 0.029108\n", "Turnovers 0.000000 0.026475\n", "Fouls -0.000000 -0.009811" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X = oldindep_variables.copy()\n", "y = older_df['Salary Percent']\n", "\n", "oldlasso_model = linear_model.Lasso(alpha = 0.01)\n", "oldlasso_model.fit(X, y)\n", "\n", "oldlasso_coefs = pd.Series(dict(zip(list(X), oldlasso_model.coef_)))\n", "oldlr_coefs = pd.Series(dict(zip(list(X), oldmulti_regr.coef_)))\n", "oldlasso_vs_linreg = pd.DataFrame(dict(lasso=oldlasso_coefs, linreg=oldlr_coefs))\n", "oldlasso_vs_linreg" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
lassolinreg
Experience0.0066470.007217
Games-0.000560-0.000474
Minutes0.0044960.002896
3P Makes0.0000000.004760
Rebounds0.0055500.007801
Assists0.0008120.000370
Steals-0.000000-0.010678
Blocks0.0000000.010585
Turnovers0.0000000.027248
Fouls-0.000000-0.014037
\n", "
" ], "text/plain": [ " lasso linreg\n", "Experience 0.006647 0.007217\n", "Games -0.000560 -0.000474\n", "Minutes 0.004496 0.002896\n", "3P Makes 0.000000 0.004760\n", "Rebounds 0.005550 0.007801\n", "Assists 0.000812 0.000370\n", "Steals -0.000000 -0.010678\n", "Blocks 0.000000 0.010585\n", "Turnovers 0.000000 0.027248\n", "Fouls -0.000000 -0.014037" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X = newindep_variables.copy()\n", "y = newer_df['Salary Percent']\n", "\n", "newlasso_model = linear_model.Lasso(alpha = 0.01)\n", "newlasso_model.fit(X, y)\n", "\n", "newlasso_coefs = pd.Series(dict(zip(list(X), newlasso_model.coef_)))\n", "newlr_coefs = pd.Series(dict(zip(list(X), newmulti_regr.coef_)))\n", "newlasso_vs_linreg = pd.DataFrame(dict(lasso=newlasso_coefs, linreg=newlr_coefs))\n", "newlasso_vs_linreg" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Even with a low alpha regularization parameter of 0.01, the '3P Makes' column is thrown out in both the older and newer data. This is due to either a lack of prediction power in '3P Makes', or still too much multicollinearity between predictor variables.\n", "\n", "Let's generate training and testing data to see how the mean squared error compares in our linear and lasso regressions." 
] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Old linear MSE train: 0.0034287730803641498 , MSE test: 0.010252713402191525\n" ] } ], "source": [ "# Compute MSE values for all 4 regressions\n", "# X and y still hold the newer-era data assigned in the previous cell\n", "\n", "n_train = 75 # the first 75 rows are used for training, the rest for testing\n", "X_train = X.iloc[:n_train, :]\n", "X_test = X.iloc[n_train:, :]\n", "y_train = y.iloc[:n_train]\n", "y_test = y.iloc[n_train:]\n", "\n", "oldmulti_regr.fit(X_train, y_train)\n", "print('Old linear MSE train:', metrics.mean_squared_error(y_train, oldmulti_regr.predict(X_train)), ', '\n", " 'MSE test:', metrics.mean_squared_error(y_test, oldmulti_regr.predict(X_test)))" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Old lasso MSE train: 0.004165405010664216 , MSE test: 0.007019333696622415\n" ] } ], "source": [ "oldlasso_model.fit(X_train, y_train)\n", "print('Old lasso MSE train:', metrics.mean_squared_error(y_train, oldlasso_model.predict(X_train)), ', '\n", " 'MSE test:', metrics.mean_squared_error(y_test, oldlasso_model.predict(X_test)))" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "New linear MSE train: 0.0034287730803641498 , MSE test: 0.010252713402191525\n" ] } ], "source": [ "newmulti_regr.fit(X_train, y_train)\n", "print('New linear MSE train:', metrics.mean_squared_error(y_train, newmulti_regr.predict(X_train)), ', '\n", " 'MSE test:', metrics.mean_squared_error(y_test, newmulti_regr.predict(X_test)))" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "New lasso MSE train: 0.004165405010664216 , MSE test: 0.007019333696622415\n" ] } ], "source": [ "newlasso_model.fit(X_train, y_train)\n", "print('New lasso MSE train:', metrics.mean_squared_error(y_train, newlasso_model.predict(X_train)), ', '\n", "
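'MSE test:', metrics.mean_squared_error(y_test, newlasso_model.predict(X_test)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Test MSE is lower for the lasso models (0.0070 versus 0.0103), which suggests they generalize better to held-out data than the unregularized linear models. Note that the old and new results are identical: all four models were refit on the same 75-row training split of `X` and `y`, which hold the newer data at this point, so these numbers do not compare the two eras." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Conclusion" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "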
\n", "\n", "I had a few interesting takeaways from this project.\n", "\n", "1. The increase in 3-point shooting over time is evident: both attempts and makes have risen almost every year since 1990.\n", "\n", "2. A stronger correlation and a higher beta_1 coefficient in the newer data of the simple linear regression point towards 3-point shooting carrying more weight in player contracts.\n", "\n", "3. The relationship between 3-point shooting and player salary is hard to determine in the multiple linear and lasso regressions, likely due in part to the multicollinearity of predictor variables.\n", "\n", "\n", "While I enjoyed analyzing NBA data on this project, I will have to continue to experiment with different predictor variables and time frames to see if more evidence of the relationship between 3-point shooting and player salary exists." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.6" } }, "nbformat": 4, "nbformat_minor": 4 }