{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# FBref Player Stats Data Engineering\n", "##### Notebook to engineer raw data from [FBref](https://fbref.com/en/) via [StatsBomb](https://statsbomb.com/) \n", "\n", "### By [Edd Webster](https://www.twitter.com/eddwebster)\n", "Notebook first written: 26/12/2020
\n", "Notebook last updated: 01/08/2021\n", "\n", "![title](../../img/fbref-logo-banner.png)\n", "\n", "![title](../../img/stats-bomb-logo.png)\n", "\n", "Click [here](#section5) to jump straight to the Summary section and skip the [Project Brief](#section2), [Data Sources](#section3), and [Data Engineering](#section4) sections. Or click [here](#section6) to jump straight to the Next Steps section." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "___\n", "\n", "\n", "\n", "## Introduction\n", "This notebook scrapes player statistics data from [FBref](https://fbref.com/en/) via [StatsBomb](https://statsbomb.com/), using [pandas](http://pandas.pydata.org/) for data manipulation through DataFrames and [Beautiful Soup](https://pypi.org/project/beautifulsoup4/) for web scraping.\n", "\n", "For more information about this notebook and the author, I am available through all the following channels:\n", "* [eddwebster.com](https://www.eddwebster.com/);\n", "* edd.j.webster@gmail.com;\n", "* [@eddwebster](https://www.twitter.com/eddwebster);\n", "* [linkedin.com/in/eddwebster](https://www.linkedin.com/in/eddwebster/);\n", "* [github/eddwebster](https://github.com/eddwebster/);\n", "* [public.tableau.com/profile/edd.webster](https://public.tableau.com/profile/edd.webster);\n", "* [kaggle.com/eddwebster](https://www.kaggle.com/eddwebster); and\n", "* [hackerrank.com/eddwebster](https://www.hackerrank.com/eddwebster).\n", "\n", "![title](../../img/fifa21eddwebsterbanner.png)\n", "\n", "The accompanying GitHub repository for this notebook can be found [here](https://github.com/eddwebster/football_analytics) and a static version of this notebook can be found [here](https://nbviewer.jupyter.org/github/eddwebster/football_analytics/blob/master/notebooks/A%29%20Web%20Scraping/FBref%20Web%20Scraping%20and%20Parsing.ipynb)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "___\n", "\n", "\n", "\n", "## Notebook Contents\n", "1. 
[Notebook Dependencies](#section1)
\n", "2. [Project Brief](#section2)
\n", "3. [Data Sources](#section3)
\n", " 1. [Introduction](#section3.1)
\n", " 2. [Outfield Players](#section3.2)
\n", " 1. [Data Dictionary](#section3.2.1)
\n", " 2. [Creating the DataFrame](#section3.2.2)
\n", " 3. [Initial Data Handling](#section3.2.3)
\n", " 4. [Export the Raw DataFrame](#section3.2.4)
\n", " 3. [Goalkeepers](#section3.3)
\n", " 1. [Data Dictionary](#section3.3.1)
\n", " 2. [Creating the DataFrame](#section3.3.2)
\n", " 3. [Initial Data Handling](#section3.3.3)
\n", " 4. [Export the Raw DataFrame](#section3.3.4)
\n", "4. [Data Engineering](#section4)
\n", " 1. [Outfielder Players](#section4.1)
\n", " 2. [Goalkeepers](#section4.2)
\n", " 3. [Outfielder Players and Goalkeepers Combined](#section4.3)
\n", "5. [Summary](#section5)
\n", "6. [Next Steps](#section6)
\n", "7. [References](#section7)
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "___\n", "\n", "\n", "\n", "## 1. Notebook Dependencies\n", "\n", "This notebook was written using [Python 3](https://docs.python.org/3.7/) and requires the following libraries:\n", "* [`Jupyter notebooks`](https://jupyter.org/) for this notebook environment with which this project is presented;\n", "* [`NumPy`](http://www.numpy.org/) for multidimensional array computing;\n", "* [`pandas`](http://pandas.pydata.org/) for data analysis and manipulation; and\n", "* [`matplotlib`](https://matplotlib.org/contents.html?v=20200411155018) for data visualisations.\n", "\n", "All packages used for this notebook except for BeautifulSoup can be obtained by downloading and installing the [Conda](https://anaconda.org/anaconda/conda) distribution, available on all platforms (Windows, Linux and Mac OSX). Step-by-step guides on how to install Anaconda can be found for Windows [here](https://medium.com/@GalarnykMichael/install-python-on-windows-anaconda-c63c7c3d1444) and Mac [here](https://medium.com/@GalarnykMichael/install-python-on-mac-anaconda-ccd9f2014072), as well as in the Anaconda documentation itself [here](https://docs.anaconda.com/anaconda/install/)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Import Libraries and Modules" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Setup Complete\n" ] } ], "source": [ "# Python ≥3.5 (ideally)\n", "import platform\n", "import sys, getopt\n", "assert sys.version_info >= (3, 5)\n", "import csv\n", "\n", "# Import Dependencies\n", "%matplotlib inline\n", "\n", "# Math Operations\n", "import numpy as np\n", "from math import pi\n", "\n", "# Datetime\n", "import datetime\n", "from datetime import date\n", "import time\n", "\n", "# Data Preprocessing\n", "import pandas as pd\n", "#import pandas_profiling as pp\n", "import os\n", "import re\n", "import random\n", "from io import BytesIO\n", "from pathlib import Path\n", "\n", "# Reading directories\n", "import glob\n", "import os\n", "\n", "# Working with JSON\n", "import json\n", "from pandas.io.json import json_normalize\n", "\n", "# Web Scraping\n", "import requests\n", "from bs4 import BeautifulSoup\n", "import re\n", "\n", "# Fuzzy Matching - Record Linkage\n", "import recordlinkage\n", "import jellyfish\n", "import numexpr as ne\n", "\n", "# Data Visualisation\n", "import matplotlib as mpl\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "plt.style.use('seaborn-whitegrid')\n", "import missingno as msno\n", "\n", "# Progress Bar\n", "from tqdm import tqdm\n", "\n", "# Display in Jupyter\n", "from IPython.display import Image, YouTubeVideo\n", "from IPython.core.display import HTML\n", "\n", "# Ignore Warnings\n", "import warnings\n", "warnings.filterwarnings(action=\"ignore\", message=\"^internal gelsd\")\n", "\n", "print('Setup Complete')" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Python: 3.7.6\n", "NumPy: 1.20.3\n", "pandas: 1.3.2\n", "matplotlib: 3.4.2\n" ] } ], "source": [ "# Python / 
module versions used here for reference\n", "print('Python: {}'.format(platform.python_version()))\n", "print('NumPy: {}'.format(np.__version__))\n", "print('pandas: {}'.format(pd.__version__))\n", "print('matplotlib: {}'.format(mpl.__version__))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Define Filepaths" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "scrolled": true }, "outputs": [], "source": [ "# Set up initial paths to subfolders\n", "base_dir = os.path.join('..', '..')\n", "data_dir = os.path.join(base_dir, 'data')\n", "data_dir_fbref = os.path.join(base_dir, 'data', 'fbref')\n", "img_dir = os.path.join(base_dir, 'img')\n", "fig_dir = os.path.join(base_dir, 'img', 'fig')\n", "video_dir = os.path.join(base_dir, 'video')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Defined Variables" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# Defined Variables\n", "\n", "## Define today's date as a DDMMYYYY string\n", "today = datetime.datetime.now().strftime('%d%m%Y')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Defined Dictionaries" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# Defined Dictionaries\n", "\n", "## Define league names and their IDs\n", "dict_league_ids = {'Premier-League': '9',\n", " 'Ligue-1': '13',\n", " 'Bundesliga': '20',\n", " 'Serie-A': '11',\n", " 'La-Liga': '12',\n", " 'Major-League-Soccer': '22',\n", " 'Big-5-European-Leagues': 'Big5'\n", " }\n", "\n", "## Define league names and cleaned league names\n", "dict_league_names = {'eng Premier League': 'Premier League',\n", " 'fr Ligue 1': 'Ligue 1',\n", " 'de Bundesliga': 'Bundesliga',\n", " 'it Serie A': 'Serie A',\n", " 'es La Liga': 'La Liga'\n", " }\n", "\n", "## Define positions and their grouped position names\n", "dict_positions_grouped = {'DF': 'Defender',\n", " 'DF,FW': 'Defender',\n", " 'DF,GK': 'Defender',\n", " 
'DF,MF': 'Defender',\n", " 'FW': 'Forward',\n", " 'FW,DF': 'Forward',\n", " 'FW,MF': 'Forward',\n", " 'GK': 'Goalkeeper',\n", " 'GK,FW': 'Goalkeeper',\n", " 'MF': 'Midfielder',\n", " 'MF,DF': 'Midfielder',\n", " 'MF,FW': 'Midfielder',\n", " 'MF,GK': 'Midfielder',\n", " }" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Defined Lists" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# Defined Lists\n", "\n", "## Define list of long names for 'Big 5' European Leagues and MLS\n", "lst_league_names_long = ['Premier-League', 'Ligue-1', 'Bundesliga', 'Serie-A', 'La-Liga', 'Major-League-Soccer', 'Big-5-European-Leagues']\n", "\n", "## Define seasons to scrape\n", "lst_seasons = ['2017-2018', '2018-2019', '2019-2020', '2020-2021', '2021-2022']\n", "\n", "## Define list of folders\n", "lst_folders = ['raw', 'engineered', 'reference']\n", "\n", "## Define list of folders\n", "lst_data_types = ['goalkeeper', 'outfield', 'team']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create Directory Structure\n", "Create folders and subfolders for data, if not already created." 
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# Make the data directory structure (os.makedirs with exist_ok=True is safe to re-run)\n", "for folder in lst_folders:\n", "    for data_type in lst_data_types:\n", "        # <folder>/<data type>/archive\n", "        os.makedirs(os.path.join(data_dir_fbref, folder, data_type, 'archive'), exist_ok=True)\n", "        for league in lst_league_names_long:\n", "            for season in lst_seasons:\n", "                # <folder>/<data type>/<league>/<season>/archive\n", "                os.makedirs(os.path.join(data_dir_fbref, folder, data_type, league, season, 'archive'), exist_ok=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Notebook Settings" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "# Display all columns of pandas DataFrames\n", "pd.set_option('display.max_columns', None)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "\n", "\n", "## 2. Project Brief\n", "This Jupyter notebook is part of a series of notebooks that scrape, parse, engineer, unify, and then model the data, culminating in an Expected Transfer (xTransfer) player performance vs. valuation model. 
This model aims to identify under- and over-performing players by comparing their on-the-pitch output against their transfer fees and wages.\n", "\n", "This particular notebook is one of several data engineering notebooks. It takes scraped data from [FBref](https://fbref.com/en/), provided by [StatsBomb](https://statsbomb.com/), and engineers it using [pandas](http://pandas.pydata.org/) through the manipulation of DataFrames.\n", "\n", "This notebook, along with the other notebooks in this project workflow, is shown in the following diagram:\n", "\n", "![roadmap](../../img/football_analytics_data_roadmap.png)\n", "\n", "Links to these notebooks in the [`football_analytics`](https://github.com/eddwebster/football_analytics) GitHub repository are listed below:\n", "* [Webscraping](https://github.com/eddwebster/football_analytics/tree/master/notebooks/1_data_scraping)\n", " + [FBref Player Stats Webscraping](https://github.com/eddwebster/football_analytics/blob/master/notebooks/1_data_scraping/FBref%20Player%20Stats%20Web%20Scraping.ipynb)\n", " + [TransferMarkt Player Bio and Status Webscraping](https://github.com/eddwebster/football_analytics/blob/master/notebooks/1_data_scraping/TransferMarkt%20Player%20Bio%20and%20Status%20Web%20Scraping.ipynb)\n", " + [TransferMarkt Player Valuation Webscraping](https://github.com/eddwebster/football_analytics/blob/master/notebooks/1_data_scraping/TransferMarkt%20Player%20Valuation%20Web%20Scraping.ipynb)\n", " + [TransferMarkt Player Recorded Transfer Fees Webscraping](https://github.com/eddwebster/football_analytics/blob/master/notebooks/1_data_scraping/TransferMarkt%20Player%20Recorded%20Transfer%20Fees%20Webscraping.ipynb)\n", " + [Capology Player Salary Webscraping](https://github.com/eddwebster/football_analytics/blob/master/notebooks/1_data_scraping/Capology%20Player%20Salary%20Web%20Scraping.ipynb)\n", " + [FBref Team Stats 
Webscraping](https://github.com/eddwebster/football_analytics/blob/master/notebooks/1_data_scraping/FBref%20Team%20Stats%20Web%20Scraping.ipynb)\n", "* [Data Parsing](https://github.com/eddwebster/football_analytics/tree/master/notebooks/2_data_parsing)\n", " + [ELO Team Ratings Data Parsing](https://github.com/eddwebster/football_analytics/blob/master/notebooks/2_data_parsing/ELO%20Team%20Ratings%20Data%20Parsing.ipynb)\n", "* [Data Engineering](https://github.com/eddwebster/football_analytics/tree/master/notebooks/3_data_engineering)\n", " + [FBref Player Stats Data Engineering](https://github.com/eddwebster/football_analytics/blob/master/notebooks/3_data_engineering/FBref%20Player%20Stats%20Data%20Engineering.ipynb)\n", " + [TransferMarkt Player Bio and Status Data Engineering](https://github.com/eddwebster/football_analytics/blob/master/notebooks/3_data_engineering/TransferMarkt%20Player%20Bio%20and%20Status%20Data%20Engineering.ipynb)\n", " + [TransferMarkt Player Valuation Data Engineering](https://github.com/eddwebster/football_analytics/blob/master/notebooks/3_data_engineering/TransferMarkt%20Player%20Valuation%20Data%20Engineering.ipynb)\n", " + [TransferMarkt Player Recorded Transfer Fees Data Engineering](https://github.com/eddwebster/football_analytics/blob/master/notebooks/3_data_engineering/TransferMarkt%20Player%20Recorded%20Transfer%20Fees%20Data%20Engineering.ipynb)\n", " + [Capology Player Salary Data Engineering](https://github.com/eddwebster/football_analytics/blob/master/notebooks/3_data_engineering/Capology%20Player%20Salary%20Data%20Engineering.ipynb)\n", " + [FBref Team Stats Data Engineering](https://github.com/eddwebster/football_analytics/blob/master/notebooks/3_data_engineering/FBref%20Team%20Stats%20Data%20Engineering.ipynb)\n", " + [ELO Team Ratings Data Parsing](https://github.com/eddwebster/football_analytics/blob/master/notebooks/3_data_engineering/ELO%20Team%20Ratings%20Data%20Parsing.ipynb)\n", " + [TransferMarkt Team Recorded 
Transfer Fee Data Engineering](https://github.com/eddwebster/football_analytics/blob/master/notebooks/3_data_engineering/TransferMarkt%20Team%20Recorded%20Transfer%20Fee%20Data%20Engineering.ipynb) (aggregated from [TransferMarkt Player Recorded Transfer Fees notebook](https://github.com/eddwebster/football_analytics/blob/master/notebooks/3_data_engineering/TransferMarkt%20Player%20Recorded%20Transfer%20Fees%20Data%20Engineering.ipynb))\n", " + [Capology Team Salary Data Engineering](https://github.com/eddwebster/football_analytics/blob/master/notebooks/3_data_engineering/Capology%20Team%20Salary%20Data%20Engineering.ipynb) (aggregated from [Capology Player Salary notebook](https://github.com/eddwebster/football_analytics/blob/master/notebooks/3_data_engineering/Capology%20Player%20Salary%20Data%20Engineering.ipynb))\n", "* [Data Unification](https://github.com/eddwebster/football_analytics/tree/master/notebooks/4_data_unification)\n", " + [Golden ID for Player Level Datasets](https://github.com/eddwebster/football_analytics/blob/master/notebooks/4_data_unification/Golden%20ID%20for%20Player%20Level%20Datasets.ipynb)\n", " + [Golden ID for Team Level Datasets](https://github.com/eddwebster/football_analytics/blob/master/notebooks/4_data_unification/Golden%20ID%20for%20Team%20Level%20Datasets.ipynb)\n", "* [Production Datasets](https://github.com/eddwebster/football_analytics/tree/master/notebooks/5_production_datasets)\n", " + [Player Performance/Market Value Dataset](https://github.com/eddwebster/football_analytics/tree/master/notebooks/5_production_datasets/Player%20Performance/Market%20Value%20Dataset.ipynb)\n", " + [Team Performance/Market Value Dataset](https://github.com/eddwebster/football_analytics/tree/master/notebooks/5_production_datasets/Team%20Performance/Market%20Value%20Dataset.ipynb)\n", "* [Expected Transfer (xTransfer) 
Modeling](https://github.com/eddwebster/football_analytics/tree/master/notebooks/6_data_analysis_and_projects/expected_transfer_modeling)\n", " + [Expected Transfer (xTransfer) Modeling](https://github.com/eddwebster/football_analytics/tree/master/notebooks/6_data_analysis_and_projects/expected_transfer_modeling/Expected%20Transfer%20%20Modeling.ipynb)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "\n", "\n", "## 3. Data Sources" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "### 3.1. Introduction\n", "The player data engineered in this notebook is provided separately for outfield players and goalkeepers. This Data Sources section is therefore split into two subsections, one for each dataset.\n", "\n", "We'll be using the [pandas](http://pandas.pydata.org/) library to import our data into this notebook as a DataFrame." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "### 3.2. Outfield Players" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "#### 3.2.1. 
Data Dictionary\n", "The raw dataset has one hundred and ninety-two features (columns) with the following definitions and data types:\n", "\n", "\n", "| Variable | Data Type | Description | Stats Type |\n", "|------|-----|-----|-----|\n", "| `Player` | object | **Player Name** | Standard Stats |\n", "| `Nation` | float64 | **Nationality of the player.** | Standard Stats |\n", "| `Pos` | float64 | **Position** | Standard Stats |\n", "| `Squad` | float64 | | Standard Stats |\n", "| `Comp` | float64 | | Standard Stats |\n", "| `Age` | float64 | **Current Age** | Standard Stats |\n", "| `Born` | float64 | **Year of Birth** | Standard Stats |\n", "| `MP` | float64 | **Matches Played** | Standard Stats |\n", "| `Starts` | float64 | **Games Started** | Standard Stats |\n", "| `Min` | float64 | **Minutes** | Standard Stats |\n", "| `90s` | float64 | **90s Played**. Minutes played divided by 90. | Standard Stats |\n", "| `Gls` | float64 | **Goals**. Goals scored or allowed | Standard Stats |\n", "| `Ast` | float64 | **Assists** | Standard Stats |\n", "| `G-PK` | float64 | **Non-Penalty Goals** | Standard Stats |\n", "| `PK` | float64 | **Penalty Kicks Made** | Standard Stats |\n", "| `PKatt` | float64 | **Penalty Kicks Attempted** | Standard Stats |\n", "| `CrdY` | float64 | **Yellow Cards** | Standard Stats |\n", "| `CrdR` | float64 | **Red Cards** | Standard Stats |\n", "| `G+A` | float64 | **Goals plus Assists per 90 minutes**. Minimum 30 minutes played per squad game to qualify as a leader | Standard Stats |\n", "| `G+A-PK` | float64 | **Goals plus Assists minus Penalty Kicks made per 90 minutes**. Minimum 30 minutes played per squad game to qualify as a leader | Standard Stats |\n", "| `xG` | float64 | **Expected Goals**. xG totals include penalty kicks, but do not include penalty shootouts. 
| Standard Stats |\n", "| `npxG` | float64 | **Non-Penalty Expected Goals per 90 minutes**. Minimum 30 minutes played per squad game to qualify as a leader | Standard Stats |\n", "| `xA` | float64 | **xG Assisted**. xG which follows a pass that assists a shot. | Standard Stats |\n", "| `npxG+xA` | float64 | **Non-Penalty Expected Goals plus xG Assisted per 90 minutes**. Minimum 30 minutes played per squad game to qualify as a leader | Standard Stats |\n", "| `xG+xA` | float64 | **Expected Goals plus Assist per 90 minutes**. xG totals include penalty kicks, but do not include penalty shootouts (unless otherwise noted). Minimum 30 minutes played per squad game to qualify as a leader | Standard Stats |\n", "| `Sh` | float64 | **Shots Total**. Does not include penalty kicks. | Shooting Stats |\n", "| `SoT` | float64 | **Shots on target**. Note: Shots on target do not include penalty kicks. | Shooting Stats |\n", "| `SoT%` | float64 | **Shots on target percentage**. Percentage of shots that are on target. Minimum .395 shots per squad game to qualify as a leader. Note: Shots on target do not include penalty kicks | Shooting Stats |\n", "| `Sh/90` | float64 | **Shots total per 90 minutes**. Minimum 30 minutes played per squad game to qualify as a leader | Shooting Stats |\n", "| `SoT/90` | float64 | **Shots on target per 90 minutes**. Minimum 30 minutes played per squad game to qualify as a leader. Note: Shots on target do not include penalty kicks | Shooting Stats |\n", "| `G/Sh` | float64 | **Goals per shot**. Minimum .395 shots per squad game to qualify as a leader. | Shooting Stats |\n", "| `G/SoT` | float64 | **Goals per shot on target**. Minimum .111 shots on target per squad game to qualify as a leader. Note: Shots on target do not include penalty kicks. | Shooting Stats |\n", "| `Dist` | float64 | **Average distance, in yards, from goal of all shots taken**. Minimum .395 shots per squad game to qualify as a leader. Does not include penalty kicks. 
| Shooting Stats |\n", "| `FK` | float64 | **Shots from free kicks**. | Shooting Stats |\n", "| `npxG/Sh` | float64 | **Non-Penalty Expected Goals per shot**. Minimum .395 shots per squad game to qualify as a leader. | Shooting Stats |\n", "| `G-xG` | float64 | **Goals minus Expected Goals**. xG totals include penalty kicks, but do not include penalty shootouts (unless otherwise noted). | Shooting Stats |\n", "| `np:G-xG` | float64 | **Non-Penalty Goals minus Non-Penalty Expected Goals**. xG totals include penalty kicks, but do not include penalty shootouts (unless otherwise noted). | Shooting Stats |\n", "| `Cmp` | float64 | **Passes Completed**. | Passing Stats |\n", "| `Att` | float64 | **Passes Attempted**. | Passing Stats |\n", "| `Cmp%` | float64 | **Pass Completion Percentage**. Minimum 30 minutes played per squad game to qualify as a leader | Passing Stats |\n", "| `TotDist` | float64 | **Total distance, in yards, that completed passes have traveled in any direction**. | Passing Stats |\n", "| `PrgDist` | float64 | **Progressive Distance**. Total distance, in yards, that completed passes have traveled towards the opponent's goal. Note: Passes away from opponent's goal are counted as zero progressive yards. | Passing Stats |\n", "| `A-xA` | float64 | **Assists minus xG Assisted**. | Passing Stats |\n", "| `KP` | float64 | **Passes that directly lead to a shot (assisted shots)**. | Passing Stats |\n", "| `1/3` | float64 | **Completed passes that enter the 1/3 of the pitch closest to the goal**. Not including set pieces. | Passing Stats |\n", "| `PPA` | float64 | **Completed passes into the 18-yard box**. Not including set pieces. | Passing Stats |\n", "| `CrsPA` | float64 | **Completed crosses into the 18-yard box**. Not including set pieces. | Passing Stats |\n", "| `Prod` | float64 | **Progressive Passes**. 
Completed passes that move the ball towards the opponent's goal at least 10 yards from its furthest point in the last six passes, or any completed pass into the penalty area. Excludes passes from the defending 40% of the pitch | Passing Stats |\n", "| `Live` | float64 | **Live-ball passes**. | Pass Type Stats |\n", "| `Dead` | float64 | **Dead-ball passes**. Includes free kicks, corner kicks, kick offs, throw-ins and goal kicks. | Pass Type Stats |\n", "| `TB` | float64 | **Completed pass sent between back defenders into open space**. | Pass Type Stats |\n", "| `Press` | float64 | **Passes made while under pressure from opponent**. | Pass Type Stats |\n", "| `Sw` | float64 | **Passes that travel more than 40 yards of the width of the pitch**. | Pass Type Stats |\n", "| `Crs` | float64 | **Crosses**. | Pass Type Stats |\n", "| `CK` | float64 | **Corner Kicks**. | Pass Type Stats |\n", "| `In` | float64 | **Inswinging Corner Kicks**. | Pass Type Stats |\n", "| `Out` | float64 | **Outswinging Corner Kicks**. | Pass Type Stats |\n", "| `Str` | float64 | **Straight Corner Kicks**. | Pass Type Stats |\n", "| `Ground` | float64 | **Ground passes**. | Pass Type Stats |\n", "| `Low` | float64 | **Passes that leave the ground, but stay below shoulder-level**. | Pass Type Stats |\n", "| `High` | float64 | **Passes that are above shoulder-level at the peak height**. | Pass Type Stats |\n", "| `Left` | float64 | **Passes attempted using left foot**. | Pass Type Stats |\n", "| `Right` | float64 | **Passes attempted using right foot**. | Pass Type Stats |\n", "| `Head` | float64 | **Passes attempted using head**. | Pass Type Stats |\n", "| `TI` | float64 | **Throw-Ins taken**. | Pass Type Stats |\n", "| `Other` | float64 | **Passes attempted using body parts other than the player's head or feet.** | Passing Stats |\n", "| `Off` | float64 | **Offsides**. | Passing Stats |\n", "| `Int` | float64 | **Intercepted**. 
| Passing Stats |\n", "| `Blocks` | float64 | **Blocked by the opponent who was standing in the path**. | Passing Stats |\n", "| `SCA` | float64 | **Shot-Creating Actions**. The two offensive actions directly leading to a shot, such as passes, dribbles and drawing fouls. Note: A single player can receive credit for multiple actions and the shot-taker can also receive credit. | Goal and Shot Creation Stats |\n", "| `SCA90` | float64 | **Shot-Creating Actions per 90 minutes**. Minimum 30 minutes played per squad game to qualify as a leader | Goal and Shot Creation Stats |\n", "| `PassLive` | float64 | **Completed live-ball passes that lead to a shot attempt**. | Goal and Shot Creation Stats |\n", "| `PassDead` | float64 | **Completed dead-ball passes that lead to a shot attempt**. Includes free kicks, corner kicks, kick offs, throw-ins and goal kicks. | Goal and Shot Creation Stats |\n", "| `Drib` | float64 | **Successful dribbles that lead to a shot attempt**. | Goal and Shot Creation Stats |\n", "| `Fld` | float64 | **Fouls drawn that lead to a shot attempt**. | Goal and Shot Creation Stats |\n", "| `Def` | float64 | **Defensive actions that lead to a shot attempt**. | Goal and Shot Creation Stats |\n", "| `GCA` | float64 | **Goal-Creating Actions**. The two offensive actions directly leading to a goal, such as passes, dribbles and drawing fouls. Note: A single player can receive credit for multiple actions and the shot-taker can also receive credit. | Goal and Shot Creation Stats |\n", "| `GCA90` | float64 | **Goal-Creating Actions per 90 minutes**. Minimum 30 minutes played per squad game to qualify as a leader. | Goal and Shot Creation Stats |\n", "| `Tkl` | float64 | **Number of players tackled**. | Defensive Action Stats |\n", "| `TklW` | float64 | **Tackles in which the tackler's team won possession of the ball**. | Defensive Action Stats |\n", "| `Def 3rd` | float64 | **Tackles in defensive 1/3**. 
| Defensive Action Stats |\n", "| `Mid 3rd` | float64 | **Tackles in middle 1/3**. | Defensive Action Stats |\n", "| `Att 3rd` | float64 | **Tackles in attacking 1/3**. | Defensive Action Stats |\n", "| `Tkl%` | float64 | **Percentage of dribblers tackled**. Dribblers tackled divided by dribblers tackled plus times dribbled past. Minimum .625 dribblers contested per squad game to qualify as a leader. | Defensive Action Stats |\n", "| `Past` | float64 | **Number of times dribbled past by an opposing player**. | Defensive Action Stats |\n", "| `Succ` | float64 | **Number of times the squad gained possession within five seconds of applying pressure**. | Defensive Action Stats |\n", "| `%` | float64 | **Successful Pressure Percentage**. Percentage of time the squad gained possession within five seconds of applying pressure. Minimum 6.44 pressures per squad game to qualify as a leader | Defensive Action Stats |\n", "| `ShSv` | float64 | **Number of times blocking a shot that was on target, by standing in its path**. | Defensive Action Stats |\n", "| `Pass` | float64 | **Number of times blocking a pass by standing in its path**. | Defensive Action Stats |\n", "| `Tkl+Int` | float64 | **Number of players tackled plus number of interceptions**. | Defensive Action Stats |\n", "| `Clr` | float64 | **Clearances**. | Defensive Action Stats |\n", "| `Err` | float64 | **Mistakes leading to an opponent's shot**. | Defensive Action Stats |\n", "| `Touches` | float64 | **Number of times a player touched the ball**. Note: Receiving a pass, then dribbling, then sending a pass counts as one touch. | Possession Stats |\n", "| `Def Pen` | float64 | **Touches in defensive penalty area**. | Possession Stats |\n", "| `Att Pen` | float64 | **Touches in attacking penalty area**. | Possession Stats |\n", "| `Succ%` | float64 | **Percentage of Dribbles Completed Successfully**. 
Minimum .5 dribbles per squad game to qualify as a leader | Possession Stats |\n", "| `#Pl` | float64 | **Number of Players Dribbled Past**. | Possession Stats |\n", "| `Megs` | float64 | **Number of times a player dribbled the ball through an opposing player's legs**. | Possession Stats |\n", "| `Carries` | float64 | **Number of times the player controlled the ball with their feet**. | Possession Stats |\n", "| `CPA` | float64 | **Carries into the 18-yard box**. | Possession Stats |\n", "| `Mis` | float64 | **Number of times a player failed when attempting to gain control of a ball**. | Possession Stats |\n", "| `Dis` | float64 | **Number of times a player loses control of the ball after being tackled by an opposing player**. Does not include attempted dribbles. | Possession Stats |\n", "| `Targ` | float64 | **Number of times a player was the target of an attempted pass**. | Possession Stats |\n", "| `Rec` | float64 | **Number of times a player successfully received a pass**. | Possession Stats |\n", "| `Rec%` | float64 | **Passes Received Percentage**. Percentage of time a player successfully received a pass. Minimum 30 minutes played per squad game to qualify as a leader. | Possession Stats |\n", "| `Mn/MP` | float64 | **Minutes Per Match Played**. | Playing Time Stats |\n", "| `Min%` | float64 | **Percentage of Minutes Played**. Percentage of team's total minutes in which player was on the pitch. Player minutes played divided by team total minutes played. Minimum 30 minutes played per squad game to qualify as a leader. | Playing Time Stats |\n", "| `Mn/Start` | float64 | **Minutes Per Match Started**. Minimum 30 minutes played per squad game to qualify as a leader. | Playing Time Stats |\n", "| `Subs` | float64 | **Games as sub**. Game or games player did not start, so as a substitute. | Playing Time Stats |\n", "| `Mn/Sub` | float64 | **Minutes Per Substitution**. Minimum 30 minutes played per squad game to qualify as a leader. 
| Playing Time Stats |\n", "| `unSub` | float64 | **Games as an unused substitute**. | Playing Time Stats |\n", "| `PPM` | float64 | **Points per Match**. Average number of points earned by the team from matches in which the player appeared. Minimum 30 minutes played per squad game to qualify as a leader. | Playing Time Stats |\n", "| `onG` | float64 | **Goals scored by team while on pitch**. | Playing Time Stats |\n", "| `onGA` | float64 | **Goals allowed by team while on pitch**. | Playing Time Stats |\n", "| `+/-` | float64 | **Plus/Minus**. Goals scored minus goals allowed by the team while the player was on the pitch. | Playing Time Stats |\n", "| `+/-90` | float64 | **Plus/Minus per 90 Minutes**. Goals scored minus goals allowed by the team while the player was on the pitch per 90 minutes played. Minimum 30 minutes played per squad game to qualify as a leader. | Playing Time Stats |\n", "| `On-Off` | float64 | **Plus/Minus Net per 90 Minutes**. Net goals per 90 minutes by the team while the player was on the pitch minus net goals allowed per 90 minutes by the team while the player was off the pitch. Minimum 30 minutes played per squad game to qualify as a leader. | Playing Time Stats |\n", "| `onxG` | float64 | **Expected goals by team while on pitch**. xG totals include penalty kicks, but do not include penalty shootouts (unless otherwise noted). | Playing Time Stats |\n", "| `onxGA` | float64 | **Expected goals allowed by team while on pitch**. xG totals include penalty kicks, but do not include penalty shootouts (unless otherwise noted). | Playing Time Stats |\n", "| `xG+/-` | float64 | **xG Plus/Minus**. Expected goals scored minus expected goals allowed by the team while the player was on the pitch. xG totals include penalty kicks, but do not include penalty shootouts (unless otherwise noted). | Playing Time Stats |\n", "| `xG+/-90` | float64 | **xG Plus/Minus per 90 Minutes**. 
Expected goals scored minus expected goals allowed by the team while the player was on the pitch per 90 minutes played. xG totals include penalty kicks, but do not include penalty shootouts (unless otherwise noted). | Playing Time Stats |\n", "| `2CrdY` | float64 | **Second Yellow Card**. | Miscellaneous Stats |\n", "| `Fls` | float64 | **Fouls Committed**. | Miscellaneous Stats |\n", "| `PKwon` | float64 | **Penalty Kicks Won**. | Miscellaneous Stats |\n", "| `PKcon` | float64 | **Penalty Kicks Conceded**. | Miscellaneous Stats |\n", "| `OG` | float64 | **Own Goals**. | Miscellaneous Stats |\n", "| `Recov` | float64 | **Number of loose balls recovered**. | Miscellaneous Stats |\n", "| `Won` | float64 | **Aerials won**. | Miscellaneous Stats |\n", "| `Lost` | float64 | **Aerials lost**. | Miscellaneous Stats |\n", "| `Won%` | float64 | **Percentage of aerials won**. Minimum .97 aerial duels per squad game to qualify as a leader. | Miscellaneous Stats |\n", "| `League Name` | object | **Competition name**. | |\n", "| `League ID` | object | **Competition ID**, as per FBref. | |\n", "| `Season` | object | **Season**. | |\n", "| `Team Name` | object | **Club name**. | |\n", "| `Team Country` | object | **Country of the team/league**. | |\n", "| `Nationality Code` | object | **Three-letter nationality code**. | |\n", "| `Nationality Cleaned` | object | **Tidied nationality**. | |\n", "| `Primary Pos` | object | **Primary position of the player**. | |\n", "| `Position Grouped` | object | **Primary position grouped** - Goalkeeper, Defender, Midfielder, Forward. | |\n", "| `Outfielder Goalkeeper` | object | **Whether the player's position is goalkeeper or outfielder**. | |\n", "| `GA` | float64 | **Goals Against**. | Goalkeeping Stats |\n", "| `GA90` | float64 | **Goals Against per 90 minutes**. | Goalkeeping Stats |\n", "| `SoTA` | float64 | **Shots on Target Against**. 
| Goalkeeping Stats |\n", "| `Saves` | float64 | **Saves**. | Goalkeeping Stats |\n", "| `Save%` | float64 | **Save Percentage**. (Shots on Target Against - Goals Against) / Shots on Target Against. Note that not all shots on target are stopped by the keeper; many will be stopped by defenders. | Goalkeeping Stats |\n", "| `W` | float64 | **Wins**. | Goalkeeping Stats |\n", "| `D` | float64 | **Draws**. | Goalkeeping Stats |\n", "| `L` | float64 | **Losses**. | Goalkeeping Stats |\n", "| `CS` | float64 | **Clean Sheets**. Full matches by goalkeeper where no goals are allowed. | Goalkeeping Stats |\n", "| `CS%` | float64 | **Clean Sheet Percentage**. Percentage of matches that result in clean sheets. | Goalkeeping Stats |\n", "| `PKA` | float64 | **Penalty Kicks Allowed**. | Goalkeeping Stats |\n", "| `PKsv` | float64 | **Penalty Kicks Saved**. | Goalkeeping Stats |\n", "| `PKm` | float64 | **Penalty Kicks Missed**. | Goalkeeping Stats |\n", "| `PSxG` | float64 | **Post-Shot Expected Goals**. PSxG is expected goals based on how likely the goalkeeper is to save the shot. xG totals include penalty kicks, but do not include penalty shootouts (unless otherwise noted). | Advanced Goalkeeping Stats |\n", "| `PSxG/SoT` | float64 | **Post-Shot Expected Goals per Shot on Target**. Not including penalty kicks. PSxG is expected goals based on how likely the goalkeeper is to save the shot. Higher numbers indicate that shots on target faced are more difficult to stop and more likely to score. | Advanced Goalkeeping Stats |\n", "| `PSxG+/-` | float64 | **Post-Shot Expected Goals minus Goals Allowed**. Positive numbers suggest better luck or an above average ability to stop shots. PSxG is expected goals based on how likely the goalkeeper is to save the shot. Note: Does not include own goals. xG totals include penalty kicks, but do not include penalty shootouts (unless otherwise noted). 
| Advanced Goalkeeping Stats |\n", "| `/90` | float64 | **Post-Shot Expected Goals minus Goals Allowed per 90 minutes**. Positive numbers suggest better luck or an above average ability to stop shots. PSxG is expected goals based on how likely the goalkeeper is to save the shot. Note: Does not include own goals. xG totals include penalty kicks, but do not include penalty shootouts (unless otherwise noted). | Advanced Goalkeeping Stats |\n", "| `Thr` | float64 | **Throws Attempted**. | Advanced Goalkeeping Stats |\n", "| `Launch%` | float64 | **Percentage of Passes that were Launched**. Not including goal kicks. Passes longer than 40 yards | Advanced Goalkeeping Stats |\n", "| `AvgLen` | float64 | **Average length of passes, in yards**. Not including goal kicks | Advanced Goalkeeping Stats |\n", "| `Opp` | float64 | **Opponent's attempted crosses into penalty area** | Advanced Goalkeeping Stats |\n", "| `Stp` | float64 | **Number of crosses into penalty area which were successfully stopped by the goalkeeper** | Advanced Goalkeeping Stats |\n", "| `Stp%` | float64 | **Percentage of crosses into penalty area which were successfully stopped by the goalkeeper** | Advanced Goalkeeping Stats |\n", "| `#OPA` | float64 | **# of defensive actions outside of penalty area** | Advanced Goalkeeping Stats |\n", "| `#OPA/90` | float64 | **Defensive actions outside of penalty area per 90 minutes** | Advanced Goalkeeping Stats |\n", "| `AvgDist` | float64 | **Average distance from goal (in yards) of all defensive actions** | Advanced Goalkeeping Stats |\n", "\n", "\n", "
\n", "\n", "The features will be cleaned and converted, and additional features will be created, in the [Data Engineering](#section4) section (Section 4)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "#### 3.2.2. Import CSV files as pandas DataFrames" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/opt/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3441: DtypeWarning: Columns (5) have mixed types.Specify dtype option on import or set low_memory=False.\n", " exec(code_obj, self.user_global_ns, self.user_ns)\n" ] } ], "source": [ "# Import the CSV file as a pandas DataFrame\n", "df_fbref_outfield_raw = pd.read_csv(data_dir_fbref + '/raw/outfield/fbref_outfield_player_stats_combined_latest.csv')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "#### 3.2.3. Preliminary Data Handling" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "##### 3.2.3.1. Summary Report\n", "The initial step of the data handling and Exploratory Data Analysis (EDA) is to create a quick summary report of the dataset using [pandas Profiling Report](https://github.com/pandas-profiling/pandas-profiling)." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "# Summary of the data using pandas Profiling Report\n", "#pp.ProfileReport(df_fbref_outfield_raw)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "##### 3.2.3.2. Further Inspection\n", "The following commands provide a more bespoke summary of the dataset. 
Some of the commands cover content already included in the [pandas Profiling](https://github.com/pandas-profiling/pandas-profiling) summary above, but use the standard [pandas](https://pandas.pydata.org/) functions and methods that most people will be more familiar with.\n", "\n", "First, check the quality of the dataset by looking at the first and last rows using the [head()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.head.html) and [tail()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.tail.html) methods." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Player Nation Pos Squad Comp Age \\\n", "0 Aaron Cresswell eng ENG DF West Ham eng Premier League 27 \n", "1 Aaron Hunt de GER MF,FW Hamburger SV de Bundesliga 30 \n", "2 Aaron Lennon eng ENG MF Burnley eng Premier League 30 \n", "3 Aaron Lennon eng ENG FW,MF Everton eng Premier League 30 \n", "4 Aaron Mooy au AUS MF Huddersfield eng Premier League 26 \n", "\n", " Born MP Starts Min 90s Gls Ast G-PK PK PKatt CrdY CrdR \\\n", "0 1989.0 36 35 3069 34.1 1 3 1 0 0 7 0 \n", "1 1986.0 28 26 2081 23.1 3 2 2 1 1 1 0 \n", "2 1987.0 14 13 1118 12.4 0 2 0 0 0 2 0 \n", "3 1987.0 15 9 793 8.8 0 0 0 0 0 0 0 \n", "4 1990.0 36 34 3067 34.1 4 3 3 1 1 4 0 \n", "\n", " Gls.1 Ast.1 G+A G-PK.1 G+A-PK xG npxG xA npxG+xA xG.1 xA.1 \\\n", "0 0.03 0.09 0.12 0.03 0.12 0.8 0.8 2.8 3.6 0.02 0.08 \n", "1 0.13 0.09 0.22 0.09 0.17 2.8 2.1 5.6 7.6 0.12 0.23 \n", "2 0.00 0.16 0.16 0.00 0.16 0.6 0.6 1.4 2.0 0.05 0.11 \n", "3 0.00 0.00 0.00 0.00 0.00 0.3 0.3 0.5 0.8 0.04 0.05 \n", "4 0.12 0.09 0.21 0.09 0.18 2.6 1.8 3.1 4.9 0.08 0.09 \n", "\n", " xG+xA npxG.1 npxG+xA.1 Matches Sh SoT SoT% Sh/90 SoT/90 G/Sh \\\n", "0 0.10 0.02 0.10 Matches 21.0 6 28.6 0.62 0.18 0.05 \n", "1 0.35 0.09 0.32 Matches 27.0 6 22.2 1.17 0.26 0.07 \n", "2 0.16 0.05 0.16 Matches 10.0 4 40.0 0.81 0.32 0.00 \n", "3 0.09 0.04 0.09 Matches 4.0 1 25.0 0.45 0.11 0.00 \n", "4 0.17 0.05 0.14 Matches 28.0 6 21.4 0.82 0.18 0.11 \n", "\n", " G/SoT Dist FK npxG/Sh G-xG np:G-xG Cmp Att Cmp% TotDist \\\n", "0 0.17 28.1 8.0 0.04 0.2 0.2 1224.0 1708.0 71.7 23519.0 \n", "1 0.33 23.4 10.0 0.08 0.2 -0.1 883.0 1229.0 71.8 16889.0 \n", "2 0.00 16.6 0.0 0.06 -0.6 -0.6 204.0 294.0 69.4 3223.0 \n", "3 0.00 14.8 0.0 0.08 -0.3 -0.3 152.0 214.0 71.0 2286.0 \n", "4 0.50 22.0 3.0 0.06 1.4 1.2 1561.0 2067.0 75.5 27911.0 \n", "\n", " PrgDist Cmp.1 Att.1 Cmp%.1 Cmp.2 Att.2 Cmp%.2 Cmp.3 Att.3 Cmp%.3 \\\n", "0 10212.0 560.0 623.0 89.9 472.0 587.0 80.4 183.0 449.0 40.8 \n", "1 5315.0 406.0 480.0 84.6 292.0 376.0 77.7 165.0 303.0 54.5 
\n", "2 887.0 116.0 142.0 81.7 68.0 92.0 73.9 17.0 34.0 50.0 \n", "3 672.0 92.0 115.0 80.0 53.0 69.0 76.8 5.0 13.0 38.5 \n", "4 7921.0 783.0 876.0 89.4 540.0 678.0 79.6 196.0 397.0 49.4 \n", "\n", " A-xA KP 1/3 PPA CrsPA Prog Live Dead TB Press Sw \\\n", "0 0.2 35.0 117.0 21.0 14.0 96.0 1343.0 365.0 1.0 222.0 83.0 \n", "1 -3.6 65.0 83.0 31.0 5.0 97.0 977.0 252.0 11.0 245.0 67.0 \n", "2 0.6 8.0 11.0 13.0 5.0 22.0 289.0 5.0 0.0 61.0 5.0 \n", "3 -0.5 5.0 9.0 3.0 2.0 17.0 199.0 15.0 0.0 49.0 2.0 \n", "4 -0.1 48.0 167.0 27.0 9.0 163.0 1897.0 170.0 1.0 422.0 100.0 \n", "\n", " Crs CK In Out Str Ground Low High Left Right Head \\\n", "0 93.0 67.0 35.0 15.0 9.0 893.0 293.0 522.0 1329.0 78.0 59.0 \n", "1 66.0 123.0 35.0 41.0 14.0 672.0 236.0 321.0 999.0 137.0 42.0 \n", "2 19.0 0.0 0.0 0.0 0.0 193.0 51.0 50.0 27.0 250.0 7.0 \n", "3 8.0 0.0 0.0 0.0 0.0 129.0 47.0 38.0 29.0 159.0 10.0 \n", "4 85.0 77.0 35.0 21.0 5.0 1293.0 283.0 491.0 507.0 1444.0 77.0 \n", "\n", " TI Other Off Out.1 Int Blocks SCA SCA90 PassLive PassDead \\\n", "0 210.0 5.0 15.0 44.0 39.0 52.0 62.0 1.82 35.0 21.0 \n", "1 23.0 9.0 5.0 29.0 29.0 49.0 102.0 4.25 54.0 43.0 \n", "2 4.0 3.0 0.0 9.0 8.0 30.0 18.0 1.45 12.0 0.0 \n", "3 14.0 0.0 1.0 3.0 7.0 15.0 16.0 1.82 11.0 0.0 \n", "4 5.0 4.0 6.0 38.0 60.0 60.0 73.0 2.14 54.0 16.0 \n", "\n", " Drib Fld Def GCA GCA90 PassLive.1 PassDead.1 Drib.1 Sh.1 Fld.1 \\\n", "0 1.0 3.0 0.0 9.0 0.26 6.0 3.0 0.0 0.0 0.0 \n", "1 1.0 2.0 1.0 6.0 0.25 5.0 1.0 0.0 0.0 0.0 \n", "2 1.0 1.0 0.0 3.0 0.24 2.0 0.0 0.0 1.0 0.0 \n", "3 1.0 2.0 0.0 4.0 0.45 2.0 0.0 0.0 1.0 1.0 \n", "4 0.0 1.0 2.0 5.0 0.15 4.0 1.0 0.0 0.0 0.0 \n", "\n", " Def.1 Tkl TklW Def 3rd Mid 3rd Att 3rd Tkl.1 Tkl% Past Succ \\\n", "0 0.0 38.0 18.0 15.0 18.0 5.0 17.0 53.1 15.0 115.0 \n", "1 0.0 30.0 22.0 12.0 16.0 2.0 5.0 13.5 32.0 135.0 \n", "2 0.0 18.0 10.0 6.0 11.0 1.0 4.0 19.0 17.0 61.0 \n", "3 0.0 18.0 10.0 9.0 7.0 2.0 5.0 25.0 15.0 38.0 \n", "4 0.0 105.0 55.0 38.0 54.0 13.0 32.0 44.4 40.0 193.0 \n", "\n", " % Def 
3rd.1 Mid 3rd.1 Att 3rd.1 ShSv Pass Tkl+Int Clr Err \\\n", "0 32.1 181.0 123.0 54.0 0.0 38.0 90.0 133.0 0.0 \n", "1 27.9 102.0 261.0 121.0 0.0 28.0 44.0 21.0 0.0 \n", "2 26.3 74.0 102.0 56.0 0.0 24.0 31.0 9.0 0.0 \n", "3 19.3 49.0 102.0 46.0 0.0 18.0 25.0 9.0 0.0 \n", "4 29.5 192.0 355.0 107.0 2.0 52.0 151.0 70.0 0.0 \n", "\n", " Touches Def Pen Att Pen Succ% #Pl Megs Carries CPA Mis Dis \\\n", "0 2050.0 125.0 17.0 33.3 7.0 0.0 1071.0 2.0 18.0 19.0 \n", "1 1475.0 28.0 68.0 58.3 23.0 4.0 892.0 7.0 45.0 42.0 \n", "2 424.0 19.0 36.0 48.0 12.0 2.0 290.0 12.0 9.0 25.0 \n", "3 322.0 7.0 22.0 35.0 8.0 1.0 186.0 8.0 9.0 17.0 \n", "4 2496.0 65.0 32.0 53.2 26.0 0.0 1543.0 6.0 33.0 60.0 \n", "\n", " Targ Rec Rec% Prog.1 Mn/MP Min% Mn/Start Compl Subs Mn/Sub \\\n", "0 1171.0 1094.0 93.4 31.0 85 89.7 NaN 30.0 1 NaN \n", "1 1176.0 893.0 75.9 178.0 74 68.0 NaN 14.0 2 NaN \n", "2 353.0 259.0 73.4 41.0 80 32.7 NaN 6.0 1 NaN \n", "3 288.0 195.0 67.7 33.0 53 23.2 NaN 2.0 6 NaN \n", "4 1710.0 1540.0 90.1 85.0 85 89.7 NaN 29.0 2 NaN \n", "\n", " unSub PPM onG onGA +/- +/-90 On-Off onxG onxGA xG+/- xG+/-90 \\\n", "0 1 1.14 45.0 60.0 -15.0 -0.44 0.84 38.0 51.5 -13.5 -0.40 \n", "1 0 1.07 22.0 34.0 -12.0 -0.52 0.58 27.0 31.3 -4.3 -0.18 \n", "2 0 1.43 17.0 15.0 2.0 0.16 0.36 13.8 15.4 -1.5 -0.12 \n", "3 0 1.27 15.0 14.0 1.0 0.11 0.63 12.0 13.7 -1.6 -0.19 \n", "4 0 0.94 25.0 52.0 -27.0 -0.79 -0.03 28.7 49.8 -21.1 -0.62 \n", "\n", " On-Off.1 2CrdY Fls PKwon PKcon OG Recov Won Lost Won% \\\n", "0 1.09 0.0 20 0.0 0.0 0.0 277.0 70.0 57.0 55.1 \n", "1 0.94 0.0 27 0.0 0.0 0.0 213.0 22.0 37.0 37.3 \n", "2 0.49 0.0 12 0.0 0.0 0.0 80.0 7.0 15.0 31.8 \n", "3 0.13 0.0 9 2.0 0.0 0.0 50.0 6.0 12.0 33.3 \n", "4 -0.01 0.0 26 0.0 0.0 0.0 455.0 35.0 42.0 45.5 \n", "\n", " League Name League ID Season \n", "0 Big-5-European-Leagues Big5 2017-2018 \n", "1 Big-5-European-Leagues Big5 2017-2018 \n", "2 Big-5-European-Leagues Big5 2017-2018 \n", "3 Big-5-European-Leagues Big5 2017-2018 \n", "4 
Big-5-European-Leagues Big5 2017-2018 " ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Display the first five rows of the raw DataFrame, df_fbref_outfield_raw\n", "df_fbref_outfield_raw.head()" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " Player Nation Pos Squad Comp \\\n", "12748 Óscar de Marcos es ESP DF Athletic Club es La Liga \n", "12749 Ä°lkay Gündoğan de GER MF Manchester City eng Premier League \n", "12750 Łukasz Fabiański pl POL GK West Ham eng Premier League \n", "12751 Łukasz Skorupski pl POL GK Bologna it Serie A \n", "12752 Å ime Vrsaljko hr CRO DF Atlético Madrid es La Liga \n", "\n", " Age Born MP Starts Min 90s Gls Ast G-PK PK PKatt CrdY \\\n", "12748 32 1989.0 1 1 52 0.6 0 0 0 0 0 1 \n", "12749 30 1990.0 3 3 248 2.8 1 0 1 0 0 1 \n", "12750 36 1985.0 3 3 270 3.0 0 0 0 0 0 0 \n", "12751 30 1991.0 2 2 180 2.0 0 0 0 0 0 0 \n", "12752 29 1992.0 1 0 1 0.0 0 0 0 0 0 0 \n", "\n", " CrdR Gls.1 Ast.1 G+A G-PK.1 G+A-PK xG npxG xA npxG+xA \\\n", "12748 0 0.00 0.0 0.00 0.00 0.00 0.0 0.0 0.0 0.0 \n", "12749 0 0.36 0.0 0.36 0.36 0.36 0.5 0.5 0.7 1.2 \n", "12750 0 0.00 0.0 0.00 0.00 0.00 0.0 0.0 0.0 0.0 \n", "12751 0 0.00 0.0 0.00 0.00 0.00 0.0 0.0 0.0 0.0 \n", "12752 0 0.00 0.0 0.00 0.00 0.00 0.0 0.0 0.0 0.0 \n", "\n", " xG.1 xA.1 xG+xA npxG.1 npxG+xA.1 Matches Sh SoT SoT% Sh/90 \\\n", "12748 0.06 0.00 0.06 0.06 0.06 Matches 1.0 1 100.0 1.73 \n", "12749 0.18 0.27 0.45 0.18 0.45 Matches 5.0 1 20.0 1.81 \n", "12750 0.00 0.00 0.00 0.00 0.00 Matches 0.0 0 NaN 0.00 \n", "12751 0.00 0.00 0.00 0.00 0.00 Matches 0.0 0 NaN 0.00 \n", "12752 0.00 0.00 0.00 0.00 0.00 Matches 0.0 0 NaN 0.00 \n", "\n", " SoT/90 G/Sh G/SoT Dist FK npxG/Sh G-xG np:G-xG Cmp Att \\\n", "12748 1.73 0.0 0.0 20.1 0.0 0.03 0.0 0.0 21.0 32.0 \n", "12749 0.36 0.2 1.0 18.7 1.0 0.10 0.5 0.5 171.0 204.0 \n", "12750 0.00 NaN NaN NaN 0.0 NaN 0.0 0.0 43.0 65.0 \n", "12751 0.00 NaN NaN NaN 0.0 NaN 0.0 0.0 48.0 72.0 \n", "12752 0.00 NaN NaN NaN 0.0 NaN 0.0 0.0 1.0 1.0 \n", "\n", " Cmp% TotDist PrgDist Cmp.1 Att.1 Cmp%.1 Cmp.2 Att.2 Cmp%.2 \\\n", "12748 65.6 339.0 139.0 9.0 13.0 69.2 11.0 14.0 78.6 \n", "12749 83.8 2621.0 469.0 107.0 110.0 97.3 46.0 55.0 83.6 \n", "12750 66.2 1637.0 1015.0 3.0 3.0 100.0 14.0 14.0 100.0 \n", 
"12751 66.7 1200.0 760.0 13.0 13.0 100.0 16.0 16.0 100.0 \n", "12752 100.0 9.0 8.0 1.0 1.0 100.0 0.0 0.0 NaN \n", "\n", " Cmp.3 Att.3 Cmp%.3 A-xA KP 1/3 PPA CrsPA Prog Live Dead \\\n", "12748 1.0 3.0 33.3 0.0 0.0 1.0 0.0 0.0 3.0 25.0 7.0 \n", "12749 16.0 32.0 50.0 -0.7 4.0 14.0 5.0 0.0 11.0 182.0 22.0 \n", "12750 26.0 47.0 55.3 0.0 0.0 1.0 0.0 0.0 0.0 42.0 23.0 \n", "12751 15.0 39.0 38.5 0.0 0.0 0.0 0.0 0.0 0.0 45.0 27.0 \n", "12752 0.0 0.0 NaN 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 \n", "\n", " TB Press Sw Crs CK In Out Str Ground Low High Left \\\n", "12748 0.0 2.0 0.0 3.0 0.0 0.0 0.0 0.0 19.0 8.0 5.0 1.0 \n", "12749 1.0 12.0 5.0 7.0 16.0 9.0 1.0 1.0 165.0 13.0 26.0 57.0 \n", "12750 0.0 4.0 5.0 0.0 0.0 0.0 0.0 0.0 26.0 5.0 34.0 3.0 \n", "12751 0.0 12.0 1.0 0.0 0.0 0.0 0.0 0.0 37.0 5.0 30.0 7.0 \n", "12752 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 \n", "\n", " Right Head TI Other Off Out.1 Int Blocks SCA SCA90 PassLive \\\n", "12748 23.0 1.0 7.0 0.0 0.0 0.0 2.0 3.0 0.0 0.0 0.0 \n", "12749 140.0 6.0 0.0 0.0 1.0 3.0 3.0 4.0 8.0 2.9 6.0 \n", "12750 51.0 0.0 0.0 9.0 0.0 2.0 0.0 1.0 0.0 0.0 0.0 \n", "12751 57.0 1.0 0.0 7.0 0.0 6.0 0.0 1.0 0.0 0.0 0.0 \n", "12752 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", "\n", " PassDead Drib Fld Def GCA GCA90 PassLive.1 PassDead.1 Drib.1 \\\n", "12748 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", "12749 2.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", "12750 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", "12751 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", "12752 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", "\n", " Sh.1 Fld.1 Def.1 Tkl TklW Def 3rd Mid 3rd Att 3rd Tkl.1 Tkl% \\\n", "12748 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 NaN \n", "12749 0.0 0.0 0.0 5.0 3.0 0.0 5.0 0.0 2.0 66.7 \n", "12750 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 NaN \n", "12751 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 NaN \n", "12752 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 NaN \n", "\n", " Past Succ % Def 3rd.1 Mid 3rd.1 Att 3rd.1 ShSv Pass Tkl+Int \\\n", "12748 0.0 2.0 50.0 0.0 2.0 2.0 0.0 3.0 0.0 
\n", "12749 1.0 5.0 31.3 2.0 9.0 5.0 1.0 0.0 6.0 \n", "12750 0.0 0.0 NaN 0.0 0.0 0.0 0.0 0.0 0.0 \n", "12751 0.0 0.0 NaN 0.0 0.0 0.0 0.0 0.0 0.0 \n", "12752 0.0 0.0 NaN 0.0 0.0 0.0 0.0 0.0 0.0 \n", "\n", " Clr Err Touches Def Pen Att Pen Succ% #Pl Megs Carries CPA \\\n", "12748 2.0 0.0 38.0 1.0 1.0 NaN 0.0 0.0 17.0 0.0 \n", "12749 2.0 0.0 229.0 3.0 13.0 75.0 3.0 1.0 166.0 2.0 \n", "12750 0.0 0.0 67.0 61.0 0.0 NaN 0.0 0.0 30.0 0.0 \n", "12751 0.0 0.0 76.0 67.0 0.0 NaN 0.0 0.0 38.0 0.0 \n", "12752 1.0 0.0 2.0 1.0 0.0 NaN 0.0 0.0 0.0 0.0 \n", "\n", " Mis Dis Targ Rec Rec% Prog.1 Mn/MP Min% Mn/Start Compl \\\n", "12748 0.0 0.0 24.0 22.0 91.7 0.0 52 19.3 52.0 0.0 \n", "12749 1.0 5.0 201.0 183.0 91.0 19.0 83 91.9 83.0 2.0 \n", "12750 0.0 0.0 32.0 31.0 96.9 0.0 90 100.0 90.0 3.0 \n", "12751 0.0 0.0 34.0 34.0 100.0 0.0 90 100.0 90.0 2.0 \n", "12752 0.0 0.0 1.0 1.0 100.0 0.0 1 0.4 NaN 0.0 \n", "\n", " Subs Mn/Sub unSub PPM onG onGA +/- +/-90 On-Off onxG onxGA \\\n", "12748 0 NaN 1 1.00 0.0 0.0 0.0 0.00 -0.41 0.2 0.8 \n", "12749 0 NaN 0 2.00 8.0 1.0 7.0 2.54 -5.64 7.1 1.4 \n", "12750 0 NaN 0 2.33 10.0 5.0 5.0 1.67 NaN 6.0 2.9 \n", "12751 0 NaN 0 2.00 3.0 2.0 1.0 0.50 NaN 1.8 2.6 \n", "12752 1 1.0 2 3.00 0.0 0.0 0.0 0.00 -0.67 0.0 0.3 \n", "\n", " xG+/- xG+/-90 On-Off.1 2CrdY Fls PKwon PKcon OG Recov Won \\\n", "12748 -0.6 -0.97 -1.36 0.0 1 0.0 0.0 0.0 4.0 1.0 \n", "12749 5.7 2.07 -2.91 0.0 4 0.0 0.0 0.0 11.0 1.0 \n", "12750 3.1 1.03 NaN 0.0 0 0.0 0.0 0.0 6.0 0.0 \n", "12751 -0.8 -0.38 NaN 0.0 0 0.0 0.0 0.0 7.0 0.0 \n", "12752 -0.3 -26.51 -27.19 0.0 0 0.0 0.0 0.0 0.0 0.0 \n", "\n", " Lost Won% League Name League ID Season \n", "12748 0.0 100.0 Big-5-European-Leagues Big5 2021-2022 \n", "12749 1.0 50.0 Big-5-European-Leagues Big5 2021-2022 \n", "12750 0.0 NaN Big-5-European-Leagues Big5 2021-2022 \n", "12751 0.0 NaN Big-5-European-Leagues Big5 2021-2022 \n", "12752 0.0 NaN Big-5-European-Leagues Big5 2021-2022 " ] }, "execution_count": 12, "metadata": {}, "output_type": 
"execute_result" } ], "source": [ "# Display the last five rows of the raw DataFrame, df_fbref_outfield_raw\n", "df_fbref_outfield_raw.tail()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[shape](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dtypes.html) returns a tuple representing the dimensionality of the DataFrame." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(12753, 164)\n" ] } ], "source": [ "# Print the shape of the raw DataFrame, df_fbref_outfield_raw\n", "print(df_fbref_outfield_raw.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[columns](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.columns.html) returns the column labels of the DataFrame." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['Player', 'Nation', 'Pos', 'Squad', 'Comp', 'Age', 'Born', 'MP',\n", " 'Starts', 'Min',\n", " ...\n", " 'PKwon', 'PKcon', 'OG', 'Recov', 'Won', 'Lost', 'Won%', 'League Name',\n", " 'League ID', 'Season'],\n", " dtype='object', length=164)" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Features (column names) of the raw DataFrame, df_fbref_outfield_raw\n", "df_fbref_outfield_raw.columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The [dtypes](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dtypes.html) method returns the data types of each attribute in the DataFrame." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Player object\n", "Nation object\n", "Pos object\n", "Squad object\n", "Comp object\n", " ... 
\n", "Lost float64\n", "Won% float64\n", "League Name object\n", "League ID object\n", "Season object\n", "Length: 164, dtype: object" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Data types of the features of the raw DataFrame, df_fbref_outfield_raw\n", "df_fbref_outfield_raw.dtypes" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Player object\n", "Nation object\n", "Pos object\n", "Squad object\n", "Comp object\n", "Age object\n", "Born float64\n", "MP int64\n", "Starts int64\n", "Min int64\n", "90s float64\n", "Gls int64\n", "Ast int64\n", "G-PK int64\n", "PK int64\n", "PKatt int64\n", "CrdY int64\n", "CrdR int64\n", "Gls.1 float64\n", "Ast.1 float64\n", "G+A float64\n", "G-PK.1 float64\n", "G+A-PK float64\n", "xG float64\n", "npxG float64\n", "xA float64\n", "npxG+xA float64\n", "xG.1 float64\n", "xA.1 float64\n", "xG+xA float64\n", "npxG.1 float64\n", "npxG+xA.1 float64\n", "Matches object\n", "Sh float64\n", "SoT int64\n", "SoT% float64\n", "Sh/90 float64\n", "SoT/90 float64\n", "G/Sh float64\n", "G/SoT float64\n", "Dist float64\n", "FK float64\n", "npxG/Sh float64\n", "G-xG float64\n", "np:G-xG float64\n", "Cmp float64\n", "Att float64\n", "Cmp% float64\n", "TotDist float64\n", "PrgDist float64\n", "Cmp.1 float64\n", "Att.1 float64\n", "Cmp%.1 float64\n", "Cmp.2 float64\n", "Att.2 float64\n", "Cmp%.2 float64\n", "Cmp.3 float64\n", "Att.3 float64\n", "Cmp%.3 float64\n", "A-xA float64\n", "KP float64\n", "1/3 float64\n", "PPA float64\n", "CrsPA float64\n", "Prog float64\n", "Live float64\n", "Dead float64\n", "TB float64\n", "Press float64\n", "Sw float64\n", "Crs float64\n", "CK float64\n", "In float64\n", "Out float64\n", "Str float64\n", "Ground float64\n", "Low float64\n", "High float64\n", "Left float64\n", "Right float64\n", "Head float64\n", "TI float64\n", "Other float64\n", "Off float64\n", "Out.1 float64\n", "Int 
float64\n", "Blocks float64\n", "SCA float64\n", "SCA90 float64\n", "PassLive float64\n", "PassDead float64\n", "Drib float64\n", "Fld float64\n", "Def float64\n", "GCA float64\n", "GCA90 float64\n", "PassLive.1 float64\n", "PassDead.1 float64\n", "Drib.1 float64\n", "Sh.1 float64\n", "Fld.1 float64\n", "Def.1 float64\n", "Tkl float64\n", "TklW float64\n", "Def 3rd float64\n", "Mid 3rd float64\n", "Att 3rd float64\n", "Tkl.1 float64\n", "Tkl% float64\n", "Past float64\n", "Succ float64\n", "% float64\n", "Def 3rd.1 float64\n", "Mid 3rd.1 float64\n", "Att 3rd.1 float64\n", "ShSv float64\n", "Pass float64\n", "Tkl+Int float64\n", "Clr float64\n", "Err float64\n", "Touches float64\n", "Def Pen float64\n", "Att Pen float64\n", "Succ% float64\n", "#Pl float64\n", "Megs float64\n", "Carries float64\n", "CPA float64\n", "Mis float64\n", "Dis float64\n", "Targ float64\n", "Rec float64\n", "Rec% float64\n", "Prog.1 float64\n", "Mn/MP int64\n", "Min% float64\n", "Mn/Start float64\n", "Compl float64\n", "Subs int64\n", "Mn/Sub float64\n", "unSub int64\n", "PPM float64\n", "onG float64\n", "onGA float64\n", "+/- float64\n", "+/-90 float64\n", "On-Off float64\n", "onxG float64\n", "onxGA float64\n", "xG+/- float64\n", "xG+/-90 float64\n", "On-Off.1 float64\n", "2CrdY float64\n", "Fls int64\n", "PKwon float64\n", "PKcon float64\n", "OG float64\n", "Recov float64\n", "Won float64\n", "Lost float64\n", "Won% float64\n", "League Name object\n", "League ID object\n", "Season object\n", "dtype: object\n" ] } ], "source": [ "# Display the data types of all 164 columns\n", "with pd.option_context('display.max_rows', None, 'display.max_columns', None):\n", " print(df_fbref_outfield_raw.dtypes)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The [info](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.info.html) method gives a quick description of the data, in particular the total number of rows, and each attribute’s type and number of 
non-null values." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<class 'pandas.core.frame.DataFrame'>\n", "RangeIndex: 12753 entries, 0 to 12752\n", "Columns: 164 entries, Player to Season\n", "dtypes: float64(139), int64(15), object(10)\n", "memory usage: 16.0+ MB\n" ] } ], "source": [ "# Info for the raw DataFrame, df_fbref_outfield_raw\n", "df_fbref_outfield_raw.info()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The [describe](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.describe.html) method shows summary statistics (count, mean, standard deviation, minimum, quartiles, and maximum) for each numerical column in the DataFrame." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", 
" \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", 
" \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
BornMPStartsMin90sGlsAstG-PKPKPKattCrdYCrdRGls.1Ast.1G+AG-PK.1G+A-PKxGnpxGxAnpxG+xAxG.1xA.1xG+xAnpxG.1npxG+xA.1ShSoTSoT%Sh/90SoT/90G/ShG/SoTDistFKnpxG/ShG-xGnp:G-xGCmpAttCmp%TotDistPrgDistCmp.1Att.1Cmp%.1Cmp.2Att.2Cmp%.2Cmp.3Att.3Cmp%.3A-xAKP1/3PPACrsPAProgLiveDeadTBPressSwCrsCKInOutStrGroundLowHighLeftRightHeadTIOtherOffOut.1IntBlocksSCASCA90PassLivePassDeadDribFldDefGCAGCA90PassLive.1PassDead.1Drib.1Sh.1Fld.1Def.1TklTklWDef 3rdMid 3rdAtt 3rdTkl.1Tkl%PastSucc%Def 3rd.1Mid 3rd.1Att 3rd.1ShSvPassTkl+IntClrErrTouchesDef PenAtt PenSucc%#PlMegsCarriesCPAMisDisTargRecRec%Prog.1Mn/MPMin%Mn/StartComplSubsMn/SubunSubPPMonGonGA+/-+/-90On-OffonxGonxGAxG+/-xG+/-90On-Off.12CrdYFlsPKwonPKconOGRecovWonLostWon%
count
mean1993.15989616.39598512.6776441138.25154912.6475261.5460681.0628091.3958280.1502390.1917202.3140440.1207560.1144230.0832270.1976450.1056210.1888411.5548911.4078741.0350452.4441280.1333570.0848760.2183360.1246800.20966614.2400754.74531529.9909061.2581550.3976060.0807470.26748917.4379350.5923220.090355-0.006689-0.010009447.062961564.93060176.7992198853.3835773025.276888177.886560204.14295886.103027192.683938224.68645082.64061869.117601116.40116258.9367590.02893710.36897533.6212919.1742822.50761538.398807509.41576455.5148371.02284587.25160917.11516714.0813595.6430371.5000791.4986650.351704369.48634076.785053118.659209160.727822335.23834224.60111523.6665887.7033291.94583110.68676410.45360313.98060922.4448891.86937916.0525202.0427851.4042241.3239130.4883812.5394100.1974931.7229550.1651750.1807190.2114930.2035640.05550319.75145212.3393229.8305867.5756792.3451886.60378432.81019112.59067449.96263128.18444758.92612780.01805638.6078660.08031113.98170830.74242426.4933270.295729699.20058172.80585625.78104962.02965912.5898890.662977449.7871724.32760214.32399113.159601528.246585447.05864383.77468644.82289261.46781142.11313480.8991208.7494413.71834120.9954555.0994281.3190017.50137317.4502940.051079-0.171789-0.13261616.84014016.8166430.024258-0.087383-0.0648230.05135514.5245820.1460750.1795920.046486103.55597417.55471817.55424745.919470
std4.63100812.09316711.458054999.25204611.1027913.1816501.8921832.7939460.7177910.8460672.6846960.3892410.3515890.4065130.5505260.3432050.5436772.8308732.4511411.5919003.6903150.3489000.2929610.4690960.3420510.46290420.1687267.77081522.6117722.6263800.9739850.1247510.2742635.9900282.1226420.0649271.0777131.059075470.932048567.69576711.4840169707.4651013787.970490191.574344214.74619010.802994220.313132245.61485113.53518889.168169151.03509118.7593350.91822214.63718342.69733113.3026894.61568345.879388519.11361186.5265552.09835886.77604222.47762822.76135517.4041325.6432265.7826731.474825398.14664183.320006140.651885278.205977415.42773429.19829657.80869424.0821912.62450211.56005811.81630016.31588428.8185542.72360419.9253335.7401922.8665002.4184190.9143583.9411420.5377572.7233730.6316900.6013500.5543580.5740430.24900822.19238514.17781312.2595099.2089983.1326598.11059122.57460314.59470451.22831812.38094666.28849390.44531553.6525030.32590415.22481534.27890842.6229350.684454671.437057173.29709138.52237223.89562417.7842121.407206449.5798628.41546919.38497317.299691513.065768450.64941916.02357767.89533625.38439631.2913419.9716469.9970754.34627211.6681036.5364770.7238517.62897115.81922012.3146452.9563833.31730915.99185315.0095349.1556391.7595721.9861650.23081615.1398190.4768900.4725860.223919103.79894725.84046523.10351822.725612
\n", "
" ], "text/plain": [ " Born MP Starts Min 90s \\\n", "count 12752.000000 12753.000000 12753.000000 12753.000000 12753.000000 \n", "mean 1993.159896 16.395985 12.677644 1138.251549 12.647526 \n", "std 4.631008 12.093167 11.458054 999.252046 11.102791 \n", "min 1977.000000 1.000000 0.000000 1.000000 0.000000 \n", "25% 1990.000000 3.000000 2.000000 200.000000 2.200000 \n", "50% 1993.000000 16.000000 10.000000 903.000000 10.000000 \n", "75% 1997.000000 27.000000 22.000000 1948.000000 21.600000 \n", "max 2005.000000 38.000000 38.000000 3420.000000 38.000000 \n", "\n", " Gls Ast G-PK PK PKatt \\\n", "count 12753.000000 12753.000000 12753.000000 12753.000000 12753.000000 \n", "mean 1.546068 1.062809 1.395828 0.150239 0.191720 \n", "std 3.181650 1.892183 2.793946 0.717791 0.846067 \n", "min 0.000000 0.000000 -1.000000 0.000000 0.000000 \n", "25% 0.000000 0.000000 0.000000 0.000000 0.000000 \n", "50% 0.000000 0.000000 0.000000 0.000000 0.000000 \n", "75% 2.000000 1.000000 2.000000 0.000000 0.000000 \n", "max 41.000000 21.000000 33.000000 14.000000 15.000000 \n", "\n", " CrdY CrdR Gls.1 Ast.1 G+A \\\n", "count 12753.000000 12753.000000 12753.000000 12753.000000 12753.000000 \n", "mean 2.314044 0.120756 0.114423 0.083227 0.197645 \n", "std 2.684696 0.389241 0.351589 0.406513 0.550526 \n", "min 0.000000 0.000000 0.000000 0.000000 0.000000 \n", "25% 0.000000 0.000000 0.000000 0.000000 0.000000 \n", "50% 1.000000 0.000000 0.000000 0.000000 0.060000 \n", "75% 4.000000 0.000000 0.130000 0.100000 0.270000 \n", "max 17.000000 5.000000 22.500000 30.000000 30.000000 \n", "\n", " G-PK.1 G+A-PK xG npxG xA \\\n", "count 12753.000000 12753.000000 12738.000000 12738.000000 12738.000000 \n", "mean 0.105621 0.188841 1.554891 1.407874 1.035045 \n", "std 0.343205 0.543677 2.830873 2.451141 1.591900 \n", "min -0.280000 -0.280000 0.000000 0.000000 0.000000 \n", "25% 0.000000 0.000000 0.000000 0.000000 0.000000 \n", "50% 0.000000 0.060000 0.500000 0.500000 0.400000 \n", "75% 0.120000 0.260000 
1.700000 1.600000 1.400000 \n", "max 22.500000 30.000000 31.600000 25.800000 18.400000 \n", "\n", " npxG+xA xG.1 xA.1 xG+xA npxG.1 \\\n", "count 12738.000000 12737.000000 12737.000000 12737.000000 12737.000000 \n", "mean 2.444128 0.133357 0.084876 0.218336 0.124680 \n", "std 3.690315 0.348900 0.292961 0.469096 0.342051 \n", "min 0.000000 0.000000 0.000000 0.000000 0.000000 \n", "25% 0.100000 0.010000 0.000000 0.040000 0.010000 \n", "50% 1.000000 0.050000 0.050000 0.120000 0.050000 \n", "75% 3.100000 0.170000 0.120000 0.300000 0.160000 \n", "max 35.900000 24.040000 25.400000 25.400000 24.040000 \n", "\n", " npxG+xA.1 Sh SoT SoT% Sh/90 \\\n", "count 12737.000000 12746.000000 12753.000000 10138.000000 12746.000000 \n", "mean 0.209666 14.240075 4.745315 29.990906 1.258155 \n", "std 0.462904 20.168726 7.770815 22.611772 2.626380 \n", "min 0.000000 0.000000 0.000000 0.000000 0.000000 \n", "25% 0.040000 1.000000 0.000000 15.400000 0.230000 \n", "50% 0.120000 6.000000 2.000000 30.000000 0.790000 \n", "75% 0.290000 19.000000 6.000000 40.900000 1.800000 \n", "max 25.400000 196.000000 92.000000 100.000000 180.000000 \n", "\n", " SoT/90 G/Sh G/SoT Dist FK \\\n", "count 12753.000000 10138.000000 8225.000000 10133.000000 12738.000000 \n", "mean 0.397606 0.080747 0.267489 17.437935 0.592322 \n", "std 0.973985 0.124751 0.274263 5.990028 2.122642 \n", "min 0.000000 -0.330000 -1.000000 1.000000 0.000000 \n", "25% 0.000000 0.000000 0.000000 13.100000 0.000000 \n", "50% 0.160000 0.040000 0.240000 17.300000 0.000000 \n", "75% 0.560000 0.130000 0.410000 21.300000 0.000000 \n", "max 45.000000 1.000000 1.000000 74.300000 47.000000 \n", "\n", " npxG/Sh G-xG np:G-xG Cmp Att \\\n", "count 10133.000000 12738.000000 12738.000000 12738.000000 12738.000000 \n", "mean 0.090355 -0.006689 -0.010009 447.062961 564.930601 \n", "std 0.064927 1.077713 1.059075 470.932048 567.695767 \n", "min 0.010000 -7.700000 -6.900000 0.000000 0.000000 \n", "25% 0.050000 -0.400000 -0.400000 65.000000 86.000000 \n", 
"50% 0.080000 0.000000 0.000000 283.000000 372.000000 \n", "75% 0.120000 0.100000 0.100000 705.000000 922.000000 \n", "max 0.880000 12.600000 12.300000 2864.000000 3229.000000 \n", "\n", " Cmp% TotDist PrgDist Cmp.1 Att.1 \\\n", "count 12671.000000 12738.000000 12738.000000 12738.000000 12738.000000 \n", "mean 76.799219 8853.383577 3025.276888 177.886560 204.142958 \n", "std 11.484016 9707.465101 3787.970490 191.574344 214.746190 \n", "min 0.000000 0.000000 0.000000 0.000000 0.000000 \n", "25% 71.100000 1222.250000 330.000000 25.000000 29.000000 \n", "50% 77.900000 5018.500000 1448.000000 113.000000 133.000000 \n", "75% 84.200000 13973.750000 4478.000000 276.000000 321.000000 \n", "max 100.000000 65376.000000 32451.000000 1638.000000 1745.000000 \n", "\n", " Cmp%.1 Cmp.2 Att.2 Cmp%.2 Cmp.3 \\\n", "count 12519.000000 12738.000000 12738.000000 12497.000000 12738.000000 \n", "mean 86.103027 192.683938 224.686450 82.640618 69.117601 \n", "std 10.802994 220.313132 245.614851 13.535188 89.168169 \n", "min 0.000000 0.000000 0.000000 0.000000 0.000000 \n", "25% 82.000000 25.000000 31.000000 76.200000 7.000000 \n", "50% 87.500000 107.000000 134.000000 84.600000 31.000000 \n", "75% 92.000000 296.000000 351.000000 91.700000 99.000000 \n", "max 100.000000 1536.000000 1619.000000 100.000000 664.000000 \n", "\n", " Att.3 Cmp%.3 A-xA KP 1/3 \\\n", "count 12738.000000 12106.000000 12738.000000 12738.000000 12738.000000 \n", "mean 116.401162 58.936759 0.028937 10.368975 33.621291 \n", "std 151.035091 18.759335 0.918222 14.637183 42.697331 \n", "min 0.000000 0.000000 -6.200000 0.000000 0.000000 \n", "25% 13.000000 48.300000 -0.300000 1.000000 3.000000 \n", "50% 54.000000 59.200000 0.000000 4.000000 17.000000 \n", "75% 167.000000 70.700000 0.200000 15.000000 49.000000 \n", "max 1226.000000 100.000000 8.300000 131.000000 442.000000 \n", "\n", " PPA CrsPA Prog Live Dead \\\n", "count 12738.000000 12738.000000 12738.000000 12738.000000 12738.000000 \n", "mean 9.174282 2.507615 38.398807 
509.415764 55.514837 \n", "std 13.302689 4.615683 45.879388 519.113611 86.526555 \n", "min 0.000000 0.000000 0.000000 0.000000 0.000000 \n", "25% 0.000000 0.000000 3.000000 76.000000 5.000000 \n", "50% 4.000000 1.000000 20.000000 338.500000 21.000000 \n", "75% 13.000000 3.000000 61.000000 805.750000 60.000000 \n", "max 145.000000 51.000000 323.000000 3153.000000 718.000000 \n", "\n", " TB Press Sw Crs CK \\\n", "count 12738.000000 12738.000000 12738.000000 12746.000000 12738.000000 \n", "mean 1.022845 87.251609 17.115167 14.081359 5.643037 \n", "std 2.098358 86.776042 22.477628 22.761355 17.404132 \n", "min 0.000000 0.000000 0.000000 0.000000 0.000000 \n", "25% 0.000000 12.000000 2.000000 0.000000 0.000000 \n", "50% 0.000000 61.000000 8.000000 4.000000 0.000000 \n", "75% 1.000000 141.000000 25.000000 18.000000 1.000000 \n", "max 48.000000 509.000000 227.000000 223.000000 190.000000 \n", "\n", " In Out Str Ground Low \\\n", "count 12738.000000 12738.000000 12738.000000 12738.000000 12738.000000 \n", "mean 1.500079 1.498665 0.351704 369.486340 76.785053 \n", "std 5.643226 5.782673 1.474825 398.146641 83.320006 \n", "min 0.000000 0.000000 0.000000 0.000000 0.000000 \n", "25% 0.000000 0.000000 0.000000 53.000000 11.000000 \n", "50% 0.000000 0.000000 0.000000 232.000000 49.000000 \n", "75% 0.000000 0.000000 0.000000 570.000000 118.000000 \n", "max 85.000000 78.000000 25.000000 2546.000000 625.000000 \n", "\n", " High Left Right Head TI \\\n", "count 12738.000000 12738.000000 12738.000000 12738.000000 12738.000000 \n", "mean 118.659209 160.727822 335.238342 24.601115 23.666588 \n", "std 140.651885 278.205977 415.427734 29.198296 57.808694 \n", "min 0.000000 0.000000 0.000000 0.000000 0.000000 \n", "25% 15.000000 13.000000 35.000000 2.000000 0.000000 \n", "50% 67.000000 57.000000 149.000000 13.000000 2.000000 \n", "75% 176.000000 157.000000 519.000000 38.000000 11.000000 \n", "max 1136.000000 2608.000000 2900.000000 205.000000 477.000000 \n", "\n", " Other Off Out.1 Int 
Blocks \\\n", "count 12738.000000 12738.000000 12738.000000 12738.000000 12738.000000 \n", "mean 7.703329 1.945831 10.686764 10.453603 13.980609 \n", "std 24.082191 2.624502 11.560058 11.816300 16.315884 \n", "min 0.000000 0.000000 0.000000 0.000000 0.000000 \n", "25% 0.000000 0.000000 1.000000 1.000000 2.000000 \n", "50% 2.000000 1.000000 7.000000 6.000000 8.000000 \n", "75% 5.000000 3.000000 17.000000 16.000000 21.000000 \n", "max 269.000000 34.000000 98.000000 140.000000 122.000000 \n", "\n", " SCA SCA90 PassLive PassDead Drib \\\n", "count 12738.000000 12737.000000 12738.000000 12738.000000 12738.000000 \n", "mean 22.444889 1.869379 16.052520 2.042785 1.404224 \n", "std 28.818554 2.723604 19.925333 5.740192 2.866500 \n", "min 0.000000 0.000000 0.000000 0.000000 0.000000 \n", "25% 2.000000 0.500000 2.000000 0.000000 0.000000 \n", "50% 11.000000 1.550000 8.000000 0.000000 0.000000 \n", "75% 33.000000 2.600000 24.000000 1.000000 2.000000 \n", "max 241.000000 90.000000 167.000000 68.000000 39.000000 \n", "\n", " Fld Def GCA GCA90 PassLive.1 \\\n", "count 12738.000000 12738.000000 12738.000000 12737.000000 12738.000000 \n", "mean 1.323913 0.488381 2.539410 0.197493 1.722955 \n", "std 2.418419 0.914358 3.941142 0.537757 2.723373 \n", "min 0.000000 0.000000 0.000000 0.000000 0.000000 \n", "25% 0.000000 0.000000 0.000000 0.000000 0.000000 \n", "50% 0.000000 0.000000 1.000000 0.080000 1.000000 \n", "75% 2.000000 1.000000 3.000000 0.280000 2.000000 \n", "max 32.000000 9.000000 43.000000 30.000000 28.000000 \n", "\n", " PassDead.1 Drib.1 Sh.1 Fld.1 Def.1 \\\n", "count 12738.000000 12738.000000 12738.000000 12738.000000 12738.000000 \n", "mean 0.165175 0.180719 0.211493 0.203564 0.055503 \n", "std 0.631690 0.601350 0.554358 0.574043 0.249008 \n", "min 0.000000 0.000000 0.000000 0.000000 0.000000 \n", "25% 0.000000 0.000000 0.000000 0.000000 0.000000 \n", "50% 0.000000 0.000000 0.000000 0.000000 0.000000 \n", "75% 0.000000 0.000000 0.000000 0.000000 0.000000 \n", "max 
9.000000 11.000000 6.000000 7.000000 3.000000 \n", "\n", " Tkl TklW Def 3rd Mid 3rd Att 3rd \\\n", "count 12738.000000 12746.000000 12738.000000 12738.000000 12738.000000 \n", "mean 19.751452 12.339322 9.830586 7.575679 2.345188 \n", "std 22.192385 14.177813 12.259509 9.208998 3.132659 \n", "min 0.000000 0.000000 0.000000 0.000000 0.000000 \n", "25% 2.000000 1.000000 1.000000 1.000000 0.000000 \n", "50% 11.000000 7.000000 4.000000 4.000000 1.000000 \n", "75% 31.000000 20.000000 16.000000 12.000000 4.000000 \n", "max 168.000000 114.000000 100.000000 79.000000 29.000000 \n", "\n", " Tkl.1 Tkl% Past Succ % \\\n", "count 12738.000000 11000.000000 12738.000000 12738.000000 12178.000000 \n", "mean 6.603784 32.810191 12.590674 49.962631 28.184447 \n", "std 8.110591 22.574603 14.594704 51.228318 12.380946 \n", "min 0.000000 0.000000 0.000000 0.000000 0.000000 \n", "25% 0.000000 19.000000 1.000000 6.000000 23.400000 \n", "50% 3.000000 33.300000 7.000000 34.000000 28.100000 \n", "75% 10.000000 45.500000 19.000000 82.000000 33.100000 \n", "max 61.000000 100.000000 126.000000 313.000000 100.000000 \n", "\n", " Def 3rd.1 Mid 3rd.1 Att 3rd.1 ShSv Pass \\\n", "count 12738.000000 12738.000000 12738.000000 12738.000000 12738.000000 \n", "mean 58.926127 80.018056 38.607866 0.080311 13.981708 \n", "std 66.288493 90.445315 53.652503 0.325904 15.224815 \n", "min 0.000000 0.000000 0.000000 0.000000 0.000000 \n", "25% 6.000000 8.000000 3.000000 0.000000 1.000000 \n", "50% 31.000000 48.000000 14.000000 0.000000 9.000000 \n", "75% 97.000000 122.000000 55.000000 0.000000 23.000000 \n", "max 469.000000 604.000000 579.000000 5.000000 96.000000 \n", "\n", " Tkl+Int Clr Err Touches Def Pen \\\n", "count 12738.000000 12738.000000 12738.000000 12738.000000 12738.000000 \n", "mean 30.742424 26.493327 0.295729 699.200581 72.805856 \n", "std 34.278908 42.622935 0.684454 671.437057 173.297091 \n", "min 0.000000 0.000000 0.000000 0.000000 0.000000 \n", "25% 3.000000 1.000000 0.000000 112.000000 
4.000000 \n", "50% 18.000000 8.000000 0.000000 501.000000 18.000000 \n", "75% 49.000000 32.000000 0.000000 1152.750000 64.000000 \n", "max 241.000000 368.000000 7.000000 3561.000000 1584.000000 \n", "\n", " Att Pen Succ% #Pl Megs Carries \\\n", "count 12738.000000 10479.000000 12738.000000 12738.000000 12738.000000 \n", "mean 25.781049 62.029659 12.589889 0.662977 449.787172 \n", "std 38.522372 23.895624 17.784212 1.407206 449.579862 \n", "min 0.000000 0.000000 0.000000 0.000000 0.000000 \n", "25% 2.000000 50.000000 1.000000 0.000000 68.000000 \n", "50% 11.000000 62.100000 5.000000 0.000000 314.000000 \n", "75% 33.000000 75.000000 17.000000 1.000000 717.000000 \n", "max 320.000000 100.000000 197.000000 21.000000 2546.000000 \n", "\n", " CPA Mis Dis Targ Rec \\\n", "count 12738.000000 12738.000000 12738.000000 12738.000000 12738.000000 \n", "mean 4.327602 14.323991 13.159601 528.246585 447.058643 \n", "std 8.415469 19.384973 17.299691 513.065768 450.649419 \n", "min 0.000000 0.000000 0.000000 0.000000 0.000000 \n", "25% 0.000000 1.000000 1.000000 85.000000 68.000000 \n", "50% 1.000000 6.000000 6.000000 381.000000 309.500000 \n", "75% 5.000000 20.000000 19.000000 846.000000 707.750000 \n", "max 109.000000 153.000000 172.000000 3096.000000 2833.000000 \n", "\n", " Rec% Prog.1 Mn/MP Min% Mn/Start \\\n", "count 12677.000000 12738.000000 12753.000000 12753.000000 8753.000000 \n", "mean 83.774686 44.822892 61.467811 42.113134 80.899120 \n", "std 16.023577 67.895336 25.384396 31.291341 9.971646 \n", "min 0.000000 0.000000 1.000000 0.000000 1.000000 \n", "25% 75.400000 2.000000 44.000000 12.500000 76.000000 \n", "50% 89.200000 15.000000 68.000000 38.900000 84.000000 \n", "75% 96.400000 58.000000 83.000000 68.200000 89.000000 \n", "max 100.000000 538.000000 93.000000 100.000000 113.000000 \n", "\n", " Compl Subs Mn/Sub unSub PPM \\\n", "count 11630.000000 12753.000000 7481.000000 12753.000000 12745.00000 \n", "mean 8.749441 3.718341 20.995455 5.099428 1.31900 \n", "std 
9.997075 4.346272 11.668103 6.536477 0.72385 \n", "min 0.000000 0.000000 0.000000 0.000000 0.00000 \n", "25% 1.000000 1.000000 14.000000 0.000000 0.90000 \n", "50% 4.000000 2.000000 20.000000 3.000000 1.26000 \n", "75% 15.000000 6.000000 26.000000 7.000000 1.75000 \n", "max 38.000000 30.000000 86.000000 37.000000 3.00000 \n", "\n", " onG onGA +/- +/-90 On-Off \\\n", "count 12745.000000 12745.000000 12745.000000 12745.000000 12262.000000 \n", "mean 17.501373 17.450294 0.051079 -0.171789 -0.132616 \n", "std 17.628971 15.819220 12.314645 2.956383 3.317309 \n", "min 0.000000 0.000000 -52.000000 -180.000000 -179.710000 \n", "25% 3.000000 3.000000 -5.000000 -0.680000 -0.630000 \n", "50% 12.000000 14.000000 -1.000000 -0.040000 -0.010000 \n", "75% 28.000000 29.000000 3.000000 0.470000 0.550000 \n", "max 101.000000 91.000000 76.000000 45.000000 45.340000 \n", "\n", " onxG onxGA xG+/- xG+/-90 On-Off.1 \\\n", "count 12738.000000 12738.000000 12738.000000 12737.000000 12254.000000 \n", "mean 16.840140 16.816643 0.024258 -0.087383 -0.064823 \n", "std 15.991853 15.009534 9.155639 1.759572 1.986165 \n", "min 0.000000 0.000000 -43.800000 -81.020000 -80.990000 \n", "25% 2.800000 3.000000 -3.800000 -0.510000 -0.400000 \n", "50% 12.400000 13.250000 -0.400000 -0.110000 -0.010000 \n", "75% 27.700000 27.800000 2.100000 0.380000 0.360000 \n", "max 89.100000 76.300000 60.500000 38.780000 39.290000 \n", "\n", " 2CrdY Fls PKwon PKcon OG \\\n", "count 12735.000000 12753.000000 12740.000000 12740.000000 12735.000000 \n", "mean 0.051355 14.524582 0.146075 0.179592 0.046486 \n", "std 0.230816 15.139819 0.476890 0.472586 0.223919 \n", "min 0.000000 0.000000 0.000000 0.000000 0.000000 \n", "25% 0.000000 2.000000 0.000000 0.000000 0.000000 \n", "50% 0.000000 10.000000 0.000000 0.000000 0.000000 \n", "75% 0.000000 23.000000 0.000000 0.000000 0.000000 \n", "max 2.000000 98.000000 6.000000 5.000000 4.000000 \n", "\n", " Recov Won Lost Won% \n", "count 12738.000000 12738.000000 12738.000000 
11063.000000 \n", "mean 103.555974 17.554718 17.554247 45.919470 \n", "std 103.798947 25.840465 23.103518 22.725612 \n", "min 0.000000 0.000000 0.000000 0.000000 \n", "25% 15.000000 1.000000 2.000000 32.800000 \n", "50% 70.000000 7.000000 10.000000 47.500000 \n", "75% 167.000000 23.000000 25.000000 60.000000 \n", "max 555.000000 280.000000 290.000000 100.000000 " ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Description of the raw DataFrame, df_fbref_outfield_raw, showing summary statistics for each numerical column in the DataFrame\n", "df_fbref_outfield_raw.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we will check how many missing values we have, i.e. the number of NULL values in the dataset, and in which features these missing values are located. This can be plotted nicely using the [missingno](https://pypi.org/project/missingno/) library (`pip install missingno`)." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Plot visualisation of the missing values for each feature of the raw DataFrame, df_fbref_outfield_raw\n", "msno.matrix(df_fbref_outfield_raw, figsize=(30, 7))" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Nation 1\n", "Pos 1\n", "Born 1\n", "xG 15\n", "npxG 15\n", " ... \n", "OG 18\n", "Recov 15\n", "Won 15\n", "Lost 15\n", "Won% 1690\n", "Length: 133, dtype: int64" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Counts of missing values per column, keeping only the columns with at least one missing value\n", "null_value_stats = df_fbref_outfield_raw.isnull().sum(axis=0)\n", "null_value_stats[null_value_stats != 0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The visualisation quickly shows that there are missing values in the dataset, but as this data is scraped, this is fine at this stage." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "### 3.3. Goalkeepers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3.3.1. Data Dictionary\n", "The raw dataset has one hundred and eighty-eight features (columns) with the following definitions and data types:\n", "\n", "| Variable | Data Type | Description |\n", "|------|-----|-----|\n", "| `squad` | object | Squad name e.g. Arsenal |\n", "| `players_used` | float64 | Number of Players used in Games |\n", "| `possession` | float64 | Percentage of time with possession of the ball |\n", "\n", "\n", "
\n", "\n", "The features will be cleaned and converted, and additional features will be created, in the [Data Engineering](#section4) section (Section 4)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "#### 3.3.2. Import CSV files as pandas DataFrames" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "# Import the CSV file as a pandas DataFrame\n", "df_fbref_goalkeeper_raw = pd.read_csv(data_dir_fbref + '/raw/goalkeeper/fbref_goalkeeper_stats_combined_latest.csv')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "#### 3.3.3. Preliminary Data Handling" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "##### 3.3.3.1. Summary Report\n", "The initial step of the data handling and Exploratory Data Analysis (EDA) is to create a quick summary report of the dataset using [pandas Profiling Report](https://github.com/pandas-profiling/pandas-profiling)." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "# Summary of the data using pandas Profiling Report\n", "#pp.ProfileReport(df_fbref_goalkeeper_raw)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "##### 3.3.3.2. Further Inspection\n", "The following commands provide a more bespoke summary of the dataset. Some of the commands include content covered in the [pandas Profiling](https://github.com/pandas-profiling/pandas-profiling) summary above, but use the standard [pandas](https://pandas.pydata.org/) functions and methods that most people will be more familiar with.\n", "\n", "First, check the quality of the dataset by looking at the first and last rows using the [head()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.head.html) and [tail()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.tail.html) methods."
] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " Player Nation Pos Squad Comp Age Born MP \\\n", "0 Abdoulaye Diallo sn SEN GK Rennes fr Ligue 1 25 1992 3 \n", "1 Adrián es ESP GK West Ham eng Premier League 30 1987 19 \n", "2 Alban Lafont fr FRA GK Toulouse fr Ligue 1 18 1999 38 \n", "3 Albano Bizzarri ar ARG GK Udinese it Serie A 39 1977 32 \n", "4 Alberto Brignoli it ITA GK Benevento it Serie A 25 1991 13 \n", "\n", " Starts Min 90s GA GA90 SoTA Saves Save% W D L CS CS% \\\n", "0 3 270.0 3.0 5 1.67 16 12 75.0 0 2 1 0 0.0 \n", "1 19 1710.0 19.0 29 1.53 96 66 70.8 7 6 6 6 31.6 \n", "2 38 3420.0 38.0 54 1.42 161 113 70.2 9 10 19 12 31.6 \n", "3 32 2880.0 32.0 52 1.62 137 91 63.5 11 4 17 8 25.0 \n", "4 11 1126.0 12.5 31 2.48 71 42 59.2 1 2 10 2 18.2 \n", "\n", " PKatt PKA PKsv PKm Save%.1 Matches FK CK OG PSxG PSxG/SoT \\\n", "0 1 1 0 0 0.0 Matches 1 0 0 6.9 0.37 \n", "1 1 1 0 0 0.0 Matches 1 8 1 27.3 0.28 \n", "2 6 6 0 0 0.0 Matches 3 9 0 52.6 0.30 \n", "3 5 2 2 1 50.0 Matches 1 9 4 42.0 0.29 \n", "4 4 2 0 2 0.0 Matches 0 6 0 29.0 0.40 \n", "\n", " PSxG+/- /90 Cmp Att Cmp% Att.1 Thr Launch% AvgLen Att.2 \\\n", "0 1.9 0.64 9 32 28.1 53 10 37.7 36.2 12 \n", "1 -0.7 -0.04 159 470 33.8 387 47 78.0 54.1 186 \n", "2 -1.4 -0.04 239 615 38.9 793 115 55.0 44.3 206 \n", "3 -6.0 -0.19 262 612 42.8 774 135 52.5 41.1 271 \n", "4 -2.0 -0.16 92 266 34.6 374 61 51.3 43.2 103 \n", "\n", " Launch%.1 AvgLen.1 Opp Stp Stp% #OPA #OPA/90 AvgDist Gls Ast \\\n", "0 100.0 64.2 19 3 15.8 2 0.67 18.9 0 0 \n", "1 90.3 61.9 197 12 6.1 14 0.74 13.9 0 0 \n", "2 86.9 61.5 448 41 9.2 44 1.16 17.1 0 0 \n", "3 76.0 53.2 306 26 8.5 13 0.41 13.0 0 0 \n", "4 71.8 56.1 144 25 17.4 16 1.28 17.0 1 0 \n", "\n", " G-PK PK CrdY CrdR Gls.1 Ast.1 G+A G-PK.1 G+A-PK xG npxG xA \\\n", "0 0 0 0 0 0.00 0.0 0.00 0.00 0.00 0.0 0.0 0.0 \n", "1 0 0 2 0 0.00 0.0 0.00 0.00 0.00 0.0 0.0 0.0 \n", "2 0 0 4 0 0.00 0.0 0.00 0.00 0.00 0.1 0.1 0.0 \n", "3 0 0 0 1 0.00 0.0 0.00 0.00 0.00 0.0 0.0 0.0 \n", "4 1 0 2 0 0.08 0.0 0.08 0.08 0.08 0.1 0.1 0.0 
\n", "\n", " npxG+xA xG.1 xA.1 xG+xA npxG.1 npxG+xA.1 Mn/MP Min% Mn/Start \\\n", "0 0.0 0.00 0.0 0.00 0.00 0.00 90 7.9 NaN \n", "1 0.0 0.00 0.0 0.00 0.00 0.00 90 50.0 NaN \n", "2 0.1 0.00 0.0 0.00 0.00 0.00 90 100.0 NaN \n", "3 0.0 0.00 0.0 0.00 0.00 0.00 90 84.2 NaN \n", "4 0.1 0.01 0.0 0.01 0.01 0.01 87 32.9 NaN \n", "\n", " Compl Subs Mn/Sub unSub PPM onG onGA +/- +/-90 On-Off onxG \\\n", "0 3.0 0 NaN 31 0.67 4 5 -1 -0.33 -0.53 4.8 \n", "1 19.0 0 NaN 19 1.42 30 29 1 0.05 1.16 21.4 \n", "2 38.0 0 NaN 0 0.97 38 54 -16 -0.42 NaN 46.0 \n", "3 31.0 0 NaN 5 1.16 41 52 -11 -0.34 0.32 37.1 \n", "4 11.0 2 NaN 25 0.38 14 31 -17 -1.36 -0.02 13.3 \n", "\n", " onxGA xG+/- xG+/-90 On-Off.1 2CrdY Fls Fld Off Crs Int TklW \\\n", "0 5.1 -0.3 -0.08 0.03 0 0 0 0 0 0 0 \n", "1 30.3 -8.9 -0.47 0.08 0 0 3 0 0 0 0 \n", "2 53.3 -7.3 -0.19 NaN 0 3 7 0 0 0 0 \n", "3 42.6 -5.5 -0.17 0.11 0 2 3 0 0 0 0 \n", "4 25.5 -12.2 -0.97 -0.05 0 3 1 0 0 0 0 \n", "\n", " PKwon PKcon Recov Won Lost Won% League Name League ID \\\n", "0 0 0 13 0 0 NaN Big-5-European-Leagues Big5 \n", "1 0 0 116 0 0 NaN Big-5-European-Leagues Big5 \n", "2 0 1 236 0 0 NaN Big-5-European-Leagues Big5 \n", "3 0 0 186 0 0 NaN Big-5-European-Leagues Big5 \n", "4 0 1 59 0 0 NaN Big-5-European-Leagues Big5 \n", "\n", " Season \n", "0 2017-2018 \n", "1 2017-2018 \n", "2 2017-2018 \n", "3 2017-2018 \n", "4 2017-2018 " ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Display the first five rows of the raw DataFrame, df_fbref_goalkeeper_raw\n", "df_fbref_goalkeeper_raw.head()" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Player Nation Pos Squad Comp Age \\\n", "922 Yann Sommer ch SUI GK M'Gladbach de Bundesliga 32 \n", "923 Yassine Bounou ma MAR GK Sevilla es La Liga 30 \n", "924 Álex Remiro es ESP GK Real Sociedad es La Liga 26 \n", "925 Łukasz Fabiański pl POL GK West Ham eng Premier League 36 \n", "926 Łukasz Skorupski pl POL GK Bologna it Serie A 30 \n", "\n", " Born MP Starts Min 90s GA GA90 SoTA Saves Save% W D L CS \\\n", "922 1988 3 3 270.0 3.0 7 2.33 17 11 58.8 0 1 2 0 \n", "923 1991 2 2 180.0 2.0 1 0.50 4 3 75.0 1 1 0 1 \n", "924 1995 3 3 270.0 3.0 4 1.33 9 5 55.6 2 0 1 2 \n", "925 1985 3 3 270.0 3.0 5 1.67 6 1 16.7 2 1 0 0 \n", "926 1991 2 2 180.0 2.0 2 1.00 6 5 83.3 1 1 0 1 \n", "\n", " CS% PKatt PKA PKsv PKm Save%.1 Matches FK CK OG PSxG \\\n", "922 0.0 0 0 0 0 NaN Matches 0 1 1 7.3 \n", "923 50.0 0 0 0 0 NaN Matches 0 0 0 0.2 \n", "924 66.7 0 0 0 0 NaN Matches 0 0 0 3.5 \n", "925 0.0 0 0 0 0 NaN Matches 0 0 0 3.1 \n", "926 50.0 1 1 0 0 0.0 Matches 0 0 0 2.1 \n", "\n", " PSxG/SoT PSxG+/- /90 Cmp Att Cmp% Att.1 Thr Launch% AvgLen \\\n", "922 0.43 1.3 0.43 18 49 36.7 89 9 41.6 38.9 \n", "923 0.06 -0.8 -0.38 6 14 42.9 41 10 22.0 30.1 \n", "924 0.39 -0.5 -0.16 22 38 57.9 70 8 40.0 36.7 \n", "925 0.52 -1.9 -0.62 19 40 47.5 45 9 57.8 44.4 \n", "926 0.29 0.1 0.03 6 28 21.4 52 7 40.4 36.8 \n", "\n", " Att.2 Launch%.1 AvgLen.1 Opp Stp Stp% #OPA #OPA/90 AvgDist Gls \\\n", "922 25 48.0 41.6 21 2 9.5 2 0.67 13.0 0 \n", "923 14 35.7 38.6 14 0 0.0 4 2.00 19.9 0 \n", "924 18 55.6 47.9 21 3 14.3 2 0.67 18.8 0 \n", "925 20 70.0 48.3 25 2 8.0 0 0.00 13.6 0 \n", "926 20 35.0 34.4 16 1 6.3 0 0.00 11.4 0 \n", "\n", " Ast G-PK PK CrdY CrdR Gls.1 Ast.1 G+A G-PK.1 G+A-PK xG npxG \\\n", "922 0 0 0 0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", "923 0 0 0 0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", "924 0 0 0 0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", "925 0 0 0 0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", "926 0 0 0 0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", "\n", " xA npxG+xA xG.1 xA.1 xG+xA npxG.1 
npxG+xA.1 Mn/MP Min% \\\n", "922 0.0 0.0 0.0 0.0 0.0 0.0 0.0 90 100.0 \n", "923 0.0 0.0 0.0 0.0 0.0 0.0 0.0 90 66.7 \n", "924 0.0 0.0 0.0 0.0 0.0 0.0 0.0 90 100.0 \n", "925 0.0 0.0 0.0 0.0 0.0 0.0 0.0 90 100.0 \n", "926 0.0 0.0 0.0 0.0 0.0 0.0 0.0 90 100.0 \n", "\n", " Mn/Start Compl Subs Mn/Sub unSub PPM onG onGA +/- +/-90 \\\n", "922 90.0 3.0 0 NaN 0 0.33 2 7 -5 -1.67 \n", "923 90.0 2.0 0 NaN 0 2.00 2 1 1 0.50 \n", "924 90.0 3.0 0 NaN 0 2.00 4 4 0 0.00 \n", "925 90.0 3.0 0 NaN 0 2.33 10 5 5 1.67 \n", "926 90.0 2.0 0 NaN 0 2.00 3 2 1 0.50 \n", "\n", " On-Off onxG onxGA xG+/- xG+/-90 On-Off.1 2CrdY Fls Fld Off Crs \\\n", "922 NaN 5.9 6.6 -0.8 -0.25 NaN 0 0 0 0 0 \n", "923 -2.5 2.4 1.2 1.2 0.62 -2.59 0 0 0 0 0 \n", "924 NaN 3.8 3.4 0.4 0.14 NaN 0 0 1 0 0 \n", "925 NaN 6.0 2.9 3.1 1.03 NaN 0 0 0 0 0 \n", "926 NaN 1.8 2.6 -0.8 -0.38 NaN 0 0 0 0 0 \n", "\n", " Int TklW PKwon PKcon Recov Won Lost Won% League Name \\\n", "922 0 0 0 0 18 0 0 NaN Big-5-European-Leagues \n", "923 0 0 0 0 8 0 0 NaN Big-5-European-Leagues \n", "924 0 0 0 0 10 0 0 NaN Big-5-European-Leagues \n", "925 0 0 0 0 6 0 0 NaN Big-5-European-Leagues \n", "926 0 0 0 0 7 0 0 NaN Big-5-European-Leagues \n", "\n", " League ID Season \n", "922 Big5 2021-2022 \n", "923 Big5 2021-2022 \n", "924 Big5 2021-2022 \n", "925 Big5 2021-2022 \n", "926 Big5 2021-2022 " ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Display the last five rows of the raw DataFrame, df_fbref_goalkeeper_raw\n", "df_fbref_goalkeeper_raw.tail()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The [shape](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.shape.html) attribute returns a tuple giving the dimensions of the DataFrame as (rows, columns)." 
] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(927, 104)\n" ] } ], "source": [ "# Print the shape of the raw DataFrame, df_fbref_goalkeeper_raw\n", "print(df_fbref_goalkeeper_raw.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The raw DataFrame has:\n", "* 927 observations (rows), each observation representing one goalkeeper's statistics for a single season, and\n", "* 104 attributes (columns)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The [columns](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.columns.html) attribute returns the column labels of the DataFrame." ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['Player', 'Nation', 'Pos', 'Squad', 'Comp', 'Age', 'Born', 'MP',\n", " 'Starts', 'Min',\n", " ...\n", " 'TklW', 'PKwon', 'PKcon', 'Recov', 'Won', 'Lost', 'Won%', 'League Name',\n", " 'League ID', 'Season'],\n", " dtype='object', length=104)" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Features (column names) of the raw DataFrame, df_fbref_goalkeeper_raw\n", "df_fbref_goalkeeper_raw.columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The [dtypes](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dtypes.html) attribute returns the data type of each column in the DataFrame." ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "Player object\n", "Nation object\n", "Pos object\n", "Squad object\n", "Comp object\n", " ... 
\n", "Lost int64\n", "Won% float64\n", "League Name object\n", "League ID object\n", "Season object\n", "Length: 104, dtype: object" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Data types of the features of the raw DataFrame, df_fbref_goalkeeper_raw\n", "df_fbref_goalkeeper_raw.dtypes" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Player object\n", "Nation object\n", "Pos object\n", "Squad object\n", "Comp object\n", "Age int64\n", "Born int64\n", "MP int64\n", "Starts int64\n", "Min float64\n", "90s float64\n", "GA int64\n", "GA90 float64\n", "SoTA int64\n", "Saves int64\n", "Save% float64\n", "W int64\n", "D int64\n", "L int64\n", "CS int64\n", "CS% float64\n", "PKatt int64\n", "PKA int64\n", "PKsv int64\n", "PKm int64\n", "Save%.1 float64\n", "Matches object\n", "FK int64\n", "CK int64\n", "OG int64\n", "PSxG float64\n", "PSxG/SoT float64\n", "PSxG+/- float64\n", "/90 float64\n", "Cmp int64\n", "Att int64\n", "Cmp% float64\n", "Att.1 int64\n", "Thr int64\n", "Launch% float64\n", "AvgLen float64\n", "Att.2 int64\n", "Launch%.1 float64\n", "AvgLen.1 float64\n", "Opp int64\n", "Stp int64\n", "Stp% float64\n", "#OPA int64\n", "#OPA/90 float64\n", "AvgDist float64\n", "Gls int64\n", "Ast int64\n", "G-PK int64\n", "PK int64\n", "CrdY int64\n", "CrdR int64\n", "Gls.1 float64\n", "Ast.1 float64\n", "G+A float64\n", "G-PK.1 float64\n", "G+A-PK float64\n", "xG float64\n", "npxG float64\n", "xA float64\n", "npxG+xA float64\n", "xG.1 float64\n", "xA.1 float64\n", "xG+xA float64\n", "npxG.1 float64\n", "npxG+xA.1 float64\n", "Mn/MP int64\n", "Min% float64\n", "Mn/Start float64\n", "Compl float64\n", "Subs int64\n", "Mn/Sub float64\n", "unSub int64\n", "PPM float64\n", "onG int64\n", "onGA int64\n", "+/- int64\n", "+/-90 float64\n", "On-Off float64\n", "onxG float64\n", "onxGA float64\n", "xG+/- float64\n", "xG+/-90 float64\n", 
"On-Off.1 float64\n", "2CrdY int64\n", "Fls int64\n", "Fld int64\n", "Off int64\n", "Crs int64\n", "Int int64\n", "TklW int64\n", "PKwon int64\n", "PKcon int64\n", "Recov int64\n", "Won int64\n", "Lost int64\n", "Won% float64\n", "League Name object\n", "League ID object\n", "Season object\n", "dtype: object\n" ] } ], "source": [ "# Display the data types of all one hundred and four columns\n", "with pd.option_context('display.max_rows', None, 'display.max_columns', None):\n", "    print(df_fbref_goalkeeper_raw.dtypes)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The [info](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.info.html) method gives a quick description of the data, in particular the total number of rows, each attribute’s type, and the number of non-null values." ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<class 'pandas.core.frame.DataFrame'>\n", "RangeIndex: 927 entries, 0 to 926\n", "Columns: 104 entries, Player to Season\n", "dtypes: float64(45), int64(50), object(9)\n", "memory usage: 753.3+ KB\n" ] } ], "source": [ "# Info for the raw DataFrame, df_fbref_goalkeeper_raw\n", "df_fbref_goalkeeper_raw.info()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The [describe](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.describe.html) method shows summary statistics (count, mean, standard deviation, minimum, quartiles, and maximum) for each numerical column in the DataFrame." ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Age Born MP Starts Min \\\n", "count 927.000000 927.000000 927.000000 927.000000 925.000000 \n", "mean 27.891046 1990.585761 16.037756 15.864078 1429.944865 \n", "std 4.750127 4.869455 14.106800 14.204039 1271.607598 \n", "min 17.000000 1977.000000 1.000000 0.000000 1.000000 \n", "25% 24.000000 1987.000000 3.000000 3.000000 270.000000 \n", "50% 28.000000 1991.000000 10.000000 10.000000 900.000000 \n", "75% 31.000000 1994.000000 31.000000 31.000000 2790.000000 \n", "max 42.000000 2002.000000 38.000000 38.000000 3420.000000 \n", "\n", " 90s GA GA90 SoTA Saves Save% \\\n", "count 927.000000 927.000000 925.000000 927.000000 927.000000 911.000000 \n", "mean 15.856311 21.943905 1.494357 65.274002 45.613808 68.326894 \n", "std 14.130571 20.109717 0.847218 59.971343 42.650898 13.616360 \n", "min 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 \n", "25% 3.000000 4.000000 1.000000 11.000000 7.000000 63.200000 \n", "50% 10.000000 14.000000 1.410000 42.000000 29.000000 69.200000 \n", "75% 31.000000 40.000000 1.770000 121.000000 84.000000 75.000000 \n", "max 38.000000 91.000000 11.250000 231.000000 167.000000 100.000000 \n", "\n", " W D L CS CS% PKatt \\\n", "count 927.000000 927.000000 927.000000 927.000000 903.000000 927.000000 \n", "mean 5.930960 3.996764 5.928803 4.296656 25.054596 2.658037 \n", "std 6.563495 4.054034 6.030815 4.646165 21.653640 2.897940 \n", "min 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 \n", "25% 1.000000 0.500000 1.000000 0.000000 9.100000 0.000000 \n", "50% 3.000000 2.000000 3.000000 2.000000 24.200000 2.000000 \n", "75% 10.000000 7.000000 10.000000 7.000000 33.300000 4.000000 \n", "max 32.000000 18.000000 29.000000 22.000000 100.000000 15.000000 \n", "\n", " PKA PKsv PKm Save%.1 FK CK \\\n", "count 927.000000 927.000000 927.000000 622.000000 927.000000 927.000000 \n", "mean 2.064725 0.442287 0.151025 16.671543 0.478964 2.772384 \n", "std 2.361525 0.765525 0.395516 23.677082 0.799122 3.051922 \n", "min 0.000000 
0.000000 0.000000 0.000000 0.000000 0.000000 \n", "25% 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 \n", "50% 1.000000 0.000000 0.000000 0.000000 0.000000 2.000000 \n", "75% 3.000000 1.000000 0.000000 26.725000 1.000000 5.000000 \n", "max 13.000000 4.000000 3.000000 100.000000 5.000000 15.000000 \n", "\n", " OG PSxG PSxG/SoT PSxG+/- /90 Cmp \\\n", "count 927.000000 927.000000 911.000000 927.000000 927.000000 927.000000 \n", "mean 0.642934 20.997627 0.299451 -0.269903 -0.057584 104.208198 \n", "std 0.983015 19.313034 0.085490 3.389201 0.399200 104.670484 \n", "min 0.000000 0.000000 0.010000 -15.300000 -2.230000 0.000000 \n", "25% 0.000000 3.400000 0.260000 -1.900000 -0.220000 13.000000 \n", "50% 0.000000 13.600000 0.290000 -0.200000 -0.030000 61.000000 \n", "75% 1.000000 37.900000 0.330000 1.300000 0.140000 181.500000 \n", "max 6.000000 80.900000 0.900000 12.300000 1.490000 488.000000 \n", "\n", " Att Cmp% Att.1 Thr Launch% \\\n", "count 927.000000 920.000000 927.000000 927.000000 925.000000 \n", "mean 258.993528 40.969348 387.741100 65.857605 46.045838 \n", "std 260.013457 11.591413 365.260341 63.360527 18.024168 \n", "min 0.000000 0.000000 0.000000 0.000000 0.000000 \n", "25% 32.500000 35.100000 58.000000 10.000000 33.700000 \n", "50% 153.000000 40.350000 236.000000 41.000000 45.700000 \n", "75% 450.500000 46.400000 739.000000 118.000000 57.100000 \n", "max 1119.000000 100.000000 1720.000000 269.000000 100.000000 \n", "\n", " AvgLen Att.2 Launch%.1 AvgLen.1 Opp Stp \\\n", "count 925.000000 927.000000 919.000000 919.000000 927.000000 927.000000 \n", "mean 39.524432 116.710895 62.545702 49.192057 145.062567 11.097087 \n", "std 8.203695 108.687936 24.770864 13.184107 134.363922 11.423483 \n", "min 15.000000 0.000000 0.000000 12.500000 0.000000 0.000000 \n", "25% 34.000000 19.000000 45.000000 39.900000 21.000000 1.000000 \n", "50% 38.900000 73.000000 65.600000 49.800000 93.000000 7.000000 \n", "75% 44.400000 213.000000 83.200000 59.700000 272.500000 20.000000 
\n", "max 62.600000 451.000000 100.000000 79.300000 569.000000 53.000000 \n", "\n", " Stp% #OPA #OPA/90 AvgDist Gls Ast \\\n", "count 915.000000 927.000000 924.000000 917.000000 927.000000 927.000000 \n", "mean 7.260546 10.186624 0.660108 14.386041 0.004315 0.026969 \n", "std 5.051059 11.088721 0.564902 2.648210 0.065582 0.168611 \n", "min 0.000000 0.000000 0.000000 3.000000 0.000000 0.000000 \n", "25% 4.700000 1.000000 0.330000 13.000000 0.000000 0.000000 \n", "50% 7.100000 6.000000 0.570000 14.300000 0.000000 0.000000 \n", "75% 9.500000 16.000000 0.930000 15.600000 0.000000 0.000000 \n", "max 50.000000 59.000000 5.630000 31.000000 1.000000 2.000000 \n", "\n", " G-PK PK CrdY CrdR Gls.1 Ast.1 \\\n", "count 927.000000 927.000000 927.000000 927.000000 927.000000 927.000000 \n", "mean 0.003236 0.001079 0.679612 0.055016 0.000183 0.002255 \n", "std 0.056827 0.032844 1.011726 0.228135 0.003129 0.024464 \n", "min 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 \n", "25% 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 \n", "50% 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 \n", "75% 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 \n", "max 1.000000 1.000000 5.000000 1.000000 0.080000 0.500000 \n", "\n", " G+A G-PK.1 G+A-PK xG npxG xA \\\n", "count 927.000000 927.000000 927.000000 927.000000 927.000000 927.000000 \n", "mean 0.002438 0.000151 0.002406 0.005070 0.003452 0.039159 \n", "std 0.024646 0.002972 0.024630 0.058103 0.025219 0.115010 \n", "min 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 \n", "25% 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 \n", "50% 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 \n", "75% 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 \n", "max 0.500000 0.080000 0.500000 1.600000 0.400000 1.000000 \n", "\n", " npxG+xA xG.1 xA.1 xG+xA npxG.1 npxG+xA.1 \\\n", "count 927.000000 927.000000 927.000000 927.000000 927.000000 927.000000 \n", "mean 0.042826 0.000129 0.002244 0.002395 0.000076 
0.002352 \n", "std 0.119709 0.002022 0.010492 0.010713 0.001182 0.010561 \n", "min 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 \n", "25% 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 \n", "50% 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 \n", "75% 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 \n", "max 1.000000 0.050000 0.180000 0.180000 0.030000 0.180000 \n", "\n", " Mn/MP Min% Mn/Start Compl Subs Mn/Sub \\\n", "count 927.000000 927.000000 705.000000 846.000000 927.000000 104.000000 \n", "mean 86.201726 52.877454 89.117730 15.400709 0.168285 42.701923 \n", "std 12.575787 39.271308 6.025894 14.010358 0.420515 22.928712 \n", "min 1.000000 0.000000 7.000000 0.000000 0.000000 0.000000 \n", "25% 89.000000 9.500000 90.000000 3.000000 0.000000 25.750000 \n", "50% 90.000000 52.800000 90.000000 10.000000 0.000000 45.000000 \n", "75% 90.000000 94.700000 90.000000 31.000000 0.000000 59.500000 \n", "max 90.000000 100.000000 91.000000 38.000000 3.000000 86.000000 \n", "\n", " unSub PPM onG onGA +/- +/-90 \\\n", "count 927.000000 927.000000 927.000000 927.000000 927.000000 927.000000 \n", "mean 11.349515 1.301424 21.949299 21.957929 -0.008630 -0.114175 \n", "std 12.663137 0.769184 22.037241 20.050890 15.247559 1.323327 \n", "min 0.000000 0.000000 0.000000 0.000000 -47.000000 -11.250000 \n", "25% 0.000000 0.830000 3.000000 4.000000 -6.000000 -0.670000 \n", "50% 5.000000 1.250000 13.000000 14.000000 -1.000000 -0.040000 \n", "75% 22.000000 1.750000 37.000000 40.000000 3.000000 0.500000 \n", "max 37.000000 3.000000 101.000000 91.000000 76.000000 15.000000 \n", "\n", " On-Off onxG onxGA xG+/- xG+/-90 On-Off.1 \\\n", "count 765.000000 927.000000 927.000000 927.000000 927.000000 765.000000 \n", "mean -0.073634 21.076591 21.104746 -0.027616 -0.081834 -0.070405 \n", "std 1.427682 20.177644 19.201389 11.398664 1.063579 1.104272 \n", "min -11.750000 0.000000 0.000000 -37.700000 -19.230000 -19.220000 \n", "25% -0.650000 3.250000 3.350000 -4.300000 
-0.500000 -0.440000 \n", "50% -0.010000 13.200000 14.000000 -0.500000 -0.110000 -0.020000 \n", "75% 0.580000 37.200000 38.200000 2.100000 0.375000 0.390000 \n", "max 14.950000 89.100000 72.800000 60.500000 6.980000 6.200000 \n", "\n", " 2CrdY Fls Fld Off Crs Int \\\n", "count 927.000000 927.000000 927.000000 927.000000 927.000000 927.000000 \n", "mean 0.004315 0.387271 2.046386 0.006472 0.008630 0.036677 \n", "std 0.065582 0.707847 2.491672 0.080234 0.092546 0.199224 \n", "min 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 \n", "25% 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 \n", "50% 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 \n", "75% 0.000000 1.000000 3.000000 0.000000 0.000000 0.000000 \n", "max 1.000000 5.000000 15.000000 1.000000 1.000000 2.000000 \n", "\n", " TklW PKwon PKcon Recov Won Lost \\\n", "count 927.000000 927.0 927.000000 927.000000 927.00000 927.000000 \n", "mean 0.028047 0.0 0.203883 78.856526 0.02589 0.043150 \n", "std 0.205935 0.0 0.465281 72.322486 0.16555 0.213664 \n", "min 0.000000 0.0 0.000000 0.000000 0.00000 0.000000 \n", "25% 0.000000 0.0 0.000000 12.000000 0.00000 0.000000 \n", "50% 0.000000 0.0 0.000000 49.000000 0.00000 0.000000 \n", "75% 0.000000 0.0 0.000000 145.000000 0.00000 0.000000 \n", "max 4.000000 0.0 4.000000 252.000000 2.00000 2.000000 \n", "\n", " Won% \n", "count 57.000000 \n", "mean 36.842105 \n", "std 46.902132 \n", "min 0.000000 \n", "25% 0.000000 \n", "50% 0.000000 \n", "75% 100.000000 \n", "max 100.000000 " ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Description of the raw DataFrame, df_fbref_goalkeeper_raw, showing some summary statistics for each numerical column in the DataFrame\n", "df_fbref_goalkeeper_raw.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we will check to see how many missing values we have i.e. 
the number of NULL values in the dataset, and in what features these missing values are located. This can be plotted nicely using the [missingno](https://pypi.org/project/missingno/) library (pip install missingno)." ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Plot visualisation of the missing values for each feature of the raw DataFrame, df_fbref_goalkeeper_raw\n", "msno.matrix(df_fbref_goalkeeper_raw, figsize = (30, 7))" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Min 2\n", "GA90 2\n", "Save% 16\n", "CS% 24\n", "Save%.1 305\n", "PSxG/SoT 16\n", "Cmp% 7\n", "Launch% 2\n", "AvgLen 2\n", "Launch%.1 8\n", "AvgLen.1 8\n", "Stp% 12\n", "#OPA/90 3\n", "AvgDist 10\n", "Mn/Start 222\n", "Compl 81\n", "Mn/Sub 823\n", "On-Off 162\n", "On-Off.1 162\n", "Won% 870\n", "dtype: int64" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Counts of missing values\n", "null_value_stats = df_fbref_goalkeeper_raw.isnull().sum(axis=0)\n", "null_value_stats[null_value_stats != 0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The visualisation quickly shows that there are missing values in the dataset, but as this data is scraped, this is fine at this stage." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "\n", "\n", "## 4. Data Engineering\n", "Before we answer the questions in the brief through [Exploratory Data Analysis (EDA)](#section5), we'll first need to clean and wrangle the datasets into a form that meets our needs." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "### 4.1. Outfield Players" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "#### 4.1.1. Assign Raw DataFrame to New Engineered DataFrame" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [], "source": [ "# Assign a copy of the Raw DataFrame to the new Engineered DataFrame, leaving the raw data untouched\n", "df_fbref_outfield = df_fbref_outfield_raw.copy()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "#### 4.1.2. 
Include League Name and League Country for each team" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Data already saved previously\n" ] } ], "source": [ "# Create DataFrame of teams (squads)\n", "\n", "## All unique teams in the 'Squad' column\n", "lst_teams = list(df_fbref_outfield['Squad'].unique())\n", "\n", "\n", "## DataFrame of teams\n", "df_teams = pd.DataFrame(lst_teams)\n", "\n", "\n", "## Export DataFrame if the reference file does not exist yet, otherwise read the saved file\n", "if not os.path.exists(data_dir + '/reference/teams/fbref_teams_big5_latest.csv'):\n", " \n", " ### Save latest version\n", " df_teams.to_csv(data_dir + '/reference/teams/fbref_teams_big5_latest.csv', index=None, header=True)\n", "\n", " ### Save a copy to archive folder (dated)\n", " df_teams.to_csv(data_dir + f'/reference/teams/archive/fbref_teams_big5_last_updated_{today}.csv', index=None, header=True) \n", "\n", "else:\n", " df_teams = pd.read_csv(data_dir + '/reference/teams/fbref_teams_big5_latest.csv')\n", " print('Data already saved previously')" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " Team Name League Name Team Country\n", "0 Augsburg Bundesliga Germany\n", "1 Bayern Munich Bundesliga Germany\n", "2 Dortmund Bundesliga Germany\n", "3 Eint Frankfurt Bundesliga Germany\n", "4 Freiburg Bundesliga Germany" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_teams.head()" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [], "source": [ "# Join Teams DataFrame that adds the 'league_name' and 'league_country' columns\n", "df_fbref_outfield = pd.merge(df_fbref_outfield, df_teams, left_on='Squad', right_on='Team Name', how='left')\n", "\n", "# Remove duplicate columns after join (contain '_y') and remove '_x' suffix from kept columns\n", "df_fbref_outfield = df_fbref_outfield[df_fbref_outfield.columns.drop(list(df_fbref_outfield.filter(regex='_y')))]\n", "df_fbref_outfield.columns = df_fbref_outfield.columns.str.replace('_x','')" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(12753, 166)" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_fbref_outfield.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "#### 4.1.3. 
String Cleaning" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Player" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [], "source": [ "# Remove accents and create lowercase name\n", "df_fbref_outfield['Player Lower'] = (df_fbref_outfield['Player']\n", " .str.normalize('NFKD')\n", " .str.encode('ascii', errors='ignore')\n", " .str.decode('utf-8')\n", " .str.lower()\n", " )" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [], "source": [ "# First Name Lower (first space-separated token)\n", "df_fbref_outfield['First Name Lower'] = df_fbref_outfield['Player Lower'].str.split(' ').str[0]\n", "\n", "# Last Name Lower (last space-separated token)\n", "df_fbref_outfield['Last Name Lower'] = df_fbref_outfield['Player Lower'].str.rsplit(' ', n=1).str[-1]\n", "\n", "# First Initial Lower\n", "df_fbref_outfield['First Initial Lower'] = df_fbref_outfield['Player Lower'].astype(str).str[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### League Country lower" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [], "source": [ "# Remove accents and create lowercase name\n", "df_fbref_outfield['Team Country Lower'] = (df_fbref_outfield['Team Country']\n", " .str.normalize('NFKD')\n", " .str.encode('ascii', errors='ignore')\n", " .str.decode('utf-8')\n", " .str.lower()\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Countries" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "


\n", "
" ], "text/plain": [ " ID Full Country Name FIFA Code IOC Code ISO Code\n", "0 1 Afghanistan AFG AFG AFG\n", "1 2 Åland Islands ALA NaN ALA\n", "2 3 Albania ALB ALB ALB\n", "3 4 Algeria ALG ALG DZA\n", "4 5 American Samoa ASA ASA ASM\n", ".. ... ... ... ... ...\n", "249 250 Wallis and Futuna WLF NaN WLF\n", "250 251 Western Sahara ESH NaN ESH\n", "251 252 Yemen YEM YEM YEM\n", "252 253 Zambia ZAM ZAM ZMB\n", "253 254 Zimbabwe ZIM ZIM ZWE\n", "\n", "[254 rows x 5 columns]" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Import reference CSV of country names and codes used to map to countries as part of the string cleaning\n", "df_countries = pd.read_csv(data_dir + '/reference/countries/countries_all.csv')\n", "\n", "df_countries " ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [], "source": [ "# Extract the nationality code\n", "df_fbref_outfield['Nationality Code'] = df_fbref_outfield['Nation'].str.strip().str[-3:]" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'AFG': 'Afghanistan',\n", " 'ALA': 'Åland Islands',\n", " 'ALB': 'Albania',\n", " 'ALG': 'Algeria',\n", " 'ASA': 'American Samoa',\n", " 'AND': 'Andorra',\n", " 'ANG': 'Angola',\n", " 'AIA': 'Anguilla',\n", " 'ATA': 'Antarctica',\n", " 'ATG': 'Antigua and Barbuda',\n", " 'ARG': 'Argentina',\n", " 'ARM': 'Armenia',\n", " 'ARU': 'Aruba',\n", " 'AUS': 'Australia',\n", " 'AUT': 'Austria',\n", " 'AZE': 'Azerbaijan',\n", " 'BAH': 'The Bahamas',\n", " 'BHR': 'Bahrain',\n", " 'BAN': 'Bangladesh',\n", " 'BRB': 'Barbados',\n", " 'BLR': 'Belarus',\n", " 'BEL': 'Belgium',\n", " 'BLZ': 'Belize',\n", " 'BEN': 'Benin',\n", " 'BER': 'Bermuda',\n", " 'BHU': 'Bhutan',\n", " 'BOL': 'Bolivia',\n", " 'BES': 'Caribbean Netherlands: Bonaire, Sint Eustatius and Saba',\n", " 'BIH': 'Bosnia and Herzegovina',\n", " 'BOT': 'Botswana',\n", " 'BVT': 'Bouvet Island',\n", " 'BRA': 'Brazil',\n", " 
'IOT': 'British Indian Ocean Territory',\n", " 'VGB': 'British Virgin Islands',\n", " 'BRU': 'Brunei',\n", " 'BUL': 'Bulgaria',\n", " 'BFA': 'Burkina Faso',\n", " 'BDI': 'Burundi',\n", " 'CAM': 'Cambodia',\n", " 'CMR': 'Cameroon',\n", " 'CAN': 'Canada',\n", " 'CPV': 'Cape Verde',\n", " 'CAY': 'Cayman Islands',\n", " 'CTA': 'Central African Republic',\n", " 'CHA': 'Chad',\n", " 'CHI': 'Chile',\n", " 'CHN': 'China',\n", " 'CXR': 'Christmas Island',\n", " 'CCK': 'Cocos (Keeling) Islands',\n", " 'COL': 'Colombia',\n", " 'COM': 'Comoros',\n", " 'COD': 'Democratic Republic of the Congo',\n", " 'CGO': 'Republic of the Congo',\n", " 'COK': 'Cook Islands',\n", " 'CRC': 'Costa Rica',\n", " 'CIV': \"Côte d'Ivoire\",\n", " 'CRO': 'Croatia',\n", " 'CUB': 'Cuba',\n", " 'CUW': 'Curaçao',\n", " 'CYP': 'Cyprus',\n", " 'CZE': 'Czech Republic',\n", " 'DEN': 'Denmark',\n", " 'DJI': 'Djibouti',\n", " 'DMA': 'Dominica',\n", " 'DOM': 'Dominican Republic',\n", " 'ECU': 'Ecuador',\n", " 'EGY': 'Egypt',\n", " 'SLV': 'El Salvador',\n", " 'ENG': 'England',\n", " 'EQG': 'Equatorial Guinea',\n", " 'ERI': 'Eritrea',\n", " 'EST': 'Estonia',\n", " 'SWZ': 'Eswatini',\n", " 'ETH': 'Ethiopia',\n", " 'FLK': 'Falkland Islands',\n", " 'FRO': 'Faroe Islands',\n", " 'FIJ': 'Fiji',\n", " 'FIN': 'Finland',\n", " 'FRA': 'France',\n", " 'GUF': 'French Guiana',\n", " 'TAH': 'French Polynesia',\n", " 'ATF': 'French Southern and Antarctic Lands',\n", " 'GAB': 'Gabon',\n", " 'GAM': 'Gambia',\n", " 'GEO': 'Georgia',\n", " 'GER': 'Germany',\n", " 'GHA': 'Ghana',\n", " 'GIB': 'Gibraltar',\n", " 'GRE': 'Greece',\n", " 'GRL': 'Greenland',\n", " 'GRN': 'Grenada',\n", " 'GPE': 'Guadeloupe',\n", " 'GUM': 'Guam',\n", " 'GUA': 'Guatemala',\n", " 'GGY': 'Guernsey',\n", " 'GUI': 'Guinea',\n", " 'GNB': 'Guinea-Bissau',\n", " 'GUY': 'Guyana',\n", " 'HAI': 'Haiti',\n", " 'HMD': 'Heard Island and McDonald Islands',\n", " 'HON': 'Honduras',\n", " 'HKG': 'Hong Kong',\n", " 'HUN': 'Hungary',\n", " 'ISL': 'Iceland',\n", " 'IND': 
'India',\n", " 'IDN': 'Indonesia',\n", " 'IRN': 'Iran',\n", " 'IRQ': 'Iraq',\n", " 'IRL': 'Ireland',\n", " 'IMN': 'Isle of Man',\n", " 'ISR': 'Israel',\n", " 'ITA': 'Italy',\n", " 'JAM': 'Jamaica',\n", " 'JPN': 'Japan',\n", " 'JEY': 'Jersey',\n", " 'JOR': 'Jordan',\n", " 'KAZ': 'Kazakhstan',\n", " 'KEN': 'Kenya',\n", " 'KIR': 'Kiribati',\n", " 'PRK': 'North Korea',\n", " 'KOR': 'South Korea',\n", " 'KVX': 'Kosovo',\n", " 'KUW': 'Kuwait',\n", " 'KGZ': 'Kyrgyzstan',\n", " 'LAO': 'Laos',\n", " 'LVA': 'Latvia',\n", " 'LBN': 'Lebanon',\n", " 'LES': 'Lesotho',\n", " 'LBR': 'Liberia',\n", " 'LBY': 'Libya',\n", " 'LIE': 'Liechtenstein',\n", " 'LTU': 'Lithuania',\n", " 'LUX': 'Luxembourg',\n", " 'MAC': 'Macau',\n", " 'MAD': 'Madagascar',\n", " 'MWI': 'Malawi',\n", " 'MAS': 'Malaysia',\n", " 'MDV': 'Maldives',\n", " 'MLI': 'Mali',\n", " 'MLT': 'Malta',\n", " 'MHL': 'Marshall Islands',\n", " 'MTQ': 'Martinique',\n", " 'MTN': 'Mauritania',\n", " 'MRI': 'Mauritius',\n", " 'MYT': 'Mayotte',\n", " 'MEX': 'Mexico',\n", " 'FSM': 'Micronesia, Federated States of',\n", " 'MDA': 'Moldova',\n", " 'MON': 'Monaco',\n", " 'MNG': 'Mongolia',\n", " 'MNE': 'Montenegro',\n", " 'MSR': 'Montserrat',\n", " 'MAR': 'Morocco',\n", " 'MOZ': 'Mozambique',\n", " 'MYA': 'Myanmar',\n", " 'NAM': 'Namibia',\n", " 'NRU': 'Nauru',\n", " 'NEP': 'Nepal',\n", " 'NED': 'Netherlands',\n", " 'NCL': 'New Caledonia',\n", " 'NZL': 'New Zealand',\n", " 'NCA': 'Nicaragua',\n", " 'NIG': 'Niger',\n", " 'NGA': 'Nigeria',\n", " 'NIU': 'Niue',\n", " 'NFK': 'Norfolk Island',\n", " 'NIR': 'Northern Ireland',\n", " 'MNP': 'Northern Mariana Islands',\n", " 'MKD': 'North Macedonia',\n", " 'NOR': 'Norway',\n", " 'OMA': 'Oman',\n", " 'PAK': 'Pakistan',\n", " 'PLW': 'Palau',\n", " 'PLE': 'State of Palestine',\n", " 'PAN': 'Panama',\n", " 'PNG': 'Papua New Guinea',\n", " 'PAR': 'Paraguay',\n", " 'PER': 'Peru',\n", " 'PHI': 'Philippines',\n", " 'PCN': 'Pitcairn Islands',\n", " 'POL': 'Poland',\n", " 'POR': 'Portugal',\n", " 'PUR': 
'Puerto Rico',\n", " 'QAT': 'Qatar',\n", " 'REU': 'Réunion',\n", " 'ROU': 'Romania',\n", " 'RUS': 'Russian Federation',\n", " 'RWA': 'Rwanda',\n", " 'BLM': 'Saint Barthélemy',\n", " 'SHN': 'Saint Helena, Ascension and Tristan da Cunha',\n", " 'SKN': 'Saint Kitts and Nevis',\n", " 'LCA': 'Saint Lucia',\n", " 'MAF': 'Saint Martin (French part)',\n", " 'SPM': 'Saint Pierre and Miquelon',\n", " 'VIN': 'Saint Vincent and the Grenadines',\n", " 'SAM': 'Samoa',\n", " 'SMR': 'San Marino',\n", " 'STP': 'São Tomé and Príncipe',\n", " 'KSA': 'Saudi Arabia',\n", " 'SCO': 'Scotland',\n", " 'SEN': 'Senegal',\n", " 'SRB': 'Serbia',\n", " 'SEY': 'Seychelles',\n", " 'SLE': 'Sierra Leone',\n", " 'SIN': 'Singapore',\n", " 'SXM': 'Sint Maarten (Dutch part)',\n", " 'SVK': 'Slovakia',\n", " 'SVN': 'Slovenia',\n", " 'SOL': 'Solomon Islands',\n", " 'SOM': 'Somalia',\n", " 'RSA': 'South Africa',\n", " 'SGS': 'South Georgia and the South Sandwich Islands',\n", " 'SSD': 'South Sudan',\n", " 'ESP': 'Spain',\n", " 'SRI': 'Sri Lanka',\n", " 'SDN': 'Sudan',\n", " 'SUR': 'Suriname',\n", " 'SJM': 'Svalbard and Jan Mayen',\n", " 'SWE': 'Sweden',\n", " 'SUI': 'Switzerland',\n", " 'SYR': 'Syria',\n", " 'TPE': 'Taiwan',\n", " 'TJK': 'Tajikistan',\n", " 'TAN': 'Tanzania',\n", " 'THA': 'Thailand',\n", " 'TLS': 'Timor-Leste',\n", " 'TOG': 'Togo',\n", " 'TKL': 'Tokelau',\n", " 'TGA': 'Tonga',\n", " 'TRI': 'Trinidad and Tobago',\n", " 'TUN': 'Tunisia',\n", " 'TUR': 'Turkey',\n", " 'TKM': 'Turkmenistan',\n", " 'TCA': 'Turks and Caicos Islands',\n", " 'TUV': 'Tuvalu',\n", " 'UGA': 'Uganda',\n", " 'UKR': 'Ukraine',\n", " 'UAE': 'United Arab Emirates',\n", " 'GBR': 'United Kingdom',\n", " 'USA': 'United States',\n", " 'UMI': 'United States Minor Outlying Islands',\n", " 'VIR': 'United States Virgin Islands',\n", " 'URU': 'Uruguay',\n", " 'UZB': 'Uzbekistan',\n", " 'VAN': 'Vanuatu',\n", " 'VAT': 'Vatican City State',\n", " 'VEN': 'Venezuela',\n", " 'VIE': 'Vietnam',\n", " 'WAL': 'Wales',\n", " 'WLF': 'Wallis 
and Futuna',\n", " 'ESH': 'Western Sahara',\n", " 'YEM': 'Yemen',\n", " 'ZAM': 'Zambia',\n", " 'ZIM': 'Zimbabwe'}" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create a dictionary that maps each FIFA code to its full country name\n", "dict_countries = df_countries.set_index('FIFA Code').to_dict()['Full Country Name']\n", "dict_countries " ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [], "source": [ "df_fbref_outfield['Nationality Cleaned'] = df_fbref_outfield['Nationality Code'].map(dict_countries)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Comp" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [], "source": [ "df_fbref_outfield['Comp'] = df_fbref_outfield['Comp'].map(dict_league_names)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Primary Position" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [], "source": [ "df_fbref_outfield['Primary Pos'] = df_fbref_outfield['Pos'].str[:2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Position Grouped" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [], "source": [ "# Map grouped positions to DataFrame\n", "df_fbref_outfield['Position Grouped'] = df_fbref_outfield['Pos'].map(dict_positions_grouped)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Goalkeeper / Outfielder" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [], "source": [ "# Separate Goalkeeper and Outfielders\n", "df_fbref_outfield['Outfielder Goalkeeper'] = np.where(df_fbref_outfield['Position Grouped'] == 'Goalkeeper', 'Goalkeeper', 'Outfielder')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "#### 4.1.4. 
Converting Data Types\n", "We need to convert each column to its proper data type." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Age\n", "The `Age` column needs to be cleaned and converted to an integer using the [astype()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.astype.html) method. Null values are replaced with 0 during the conversion." ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [], "source": [ "# Clean the Age column and convert it to an integer\n", "\n", "## Convert age to string\n", "df_fbref_outfield['Age'] = df_fbref_outfield['Age'].astype(str)\n", "\n", "## Keep only the first two characters (the years), as the 20/21 ages are parsed with hyphens and extra digits\n", "df_fbref_outfield['Age'] = df_fbref_outfield['Age'].str[:2]\n", "\n", "## Coerce non-numeric values to NaN using to_numeric with errors='coerce'\n", "df_fbref_outfield['Age'] = pd.to_numeric(df_fbref_outfield['Age'], errors='coerce')\n", "\n", "## Convert floats to integers (np.nan_to_num replaces NaN with 0)\n", "df_fbref_outfield['Age'] = np.nan_to_num(df_fbref_outfield['Age']).astype(int)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Born (Birth Year)" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [], "source": [ "# Convert string to numeric\n", "df_fbref_outfield['Born'] = pd.to_numeric(df_fbref_outfield['Born'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "#### 4.1.5. 
Export DataFrame" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [], "source": [ "# Export DataFrame as a CSV file\n", "\n", "## Export a copy to the FBref Engineered Outfield folder called 'latest' (can be overwritten)\n", "df_fbref_outfield.to_csv(data_dir_fbref + '/engineered/outfield/fbref_outfield_player_stats_combined_latest.csv', index=None, header=True)\n", "\n", "## Export a copy to the 'archive' subfolder, including the date\n", "df_fbref_outfield.to_csv(data_dir_fbref + f'/engineered/outfield/archive/fbref_outfield_player_stats_combined_last_updated_{today}.csv', index=None, header=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "### 4.2. Goalkeepers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "#### 4.2.1. Assign Raw DataFrame to New Engineered DataFrame" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [], "source": [ "# Assign a copy of the Raw DataFrame to the new Engineered DataFrame, leaving the raw data untouched\n", "df_fbref_goalkeeper = df_fbref_goalkeeper_raw.copy()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "#### 4.2.2. 
Include League Name and League Country for each team" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [], "source": [ "# Join Teams DataFrame that adds the 'league_name' and 'league_country' columns\n", "df_fbref_goalkeeper = pd.merge(df_fbref_goalkeeper, df_teams, left_on='Squad', right_on='Team Name', how='left')\n", "\n", "# Remove duplicate columns after join (contain '_y') and remove '_x' suffix from kept columns\n", "df_fbref_goalkeeper = df_fbref_goalkeeper[df_fbref_goalkeeper.columns.drop(list(df_fbref_goalkeeper.filter(regex='_y')))]\n", "df_fbref_goalkeeper.columns = df_fbref_goalkeeper.columns.str.replace('_x','')" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(927, 106)" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_fbref_goalkeeper.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "#### 4.2.3. String Cleaning" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [], "source": [ "# Remove accents and create lowercase name\n", "df_fbref_goalkeeper['Player Lower'] = (df_fbref_goalkeeper['Player']\n", " .str.normalize('NFKD')\n", " .str.encode('ascii', errors='ignore')\n", " .str.decode('utf-8')\n", " .str.lower()\n", " )" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [], "source": [ "# First Name Lower (first space-separated token)\n", "df_fbref_goalkeeper['First Name Lower'] = df_fbref_goalkeeper['Player Lower'].str.split(' ').str[0]\n", "\n", "# Last Name Lower (last space-separated token)\n", "df_fbref_goalkeeper['Last Name Lower'] = df_fbref_goalkeeper['Player Lower'].str.rsplit(' ', n=1).str[-1]\n", "\n", "# First Initial Lower\n", "df_fbref_goalkeeper['First Initial Lower'] = df_fbref_goalkeeper['Player Lower'].astype(str).str[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### League Country lower" ] }, { "cell_type": "code", "execution_count": 57, "metadata": 
{}, "outputs": [], "source": [ "# Remove accents and create lowercase name\n", "df_fbref_goalkeeper['Team Country Lower'] = (df_fbref_goalkeeper['Team Country']\n", " .str.normalize('NFKD')\n", " .str.encode('ascii', errors='ignore')\n", " .str.decode('utf-8')\n", " .str.lower()\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Countries" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [], "source": [ "# Extract the nationality code\n", "df_fbref_goalkeeper['Nationality Code'] = df_fbref_goalkeeper['Nation'].str.strip().str[-3:]" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [], "source": [ "df_fbref_goalkeeper['Nationality Cleaned'] = df_fbref_goalkeeper['Nationality Code'].map(dict_countries)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Comp" ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [], "source": [ "df_fbref_goalkeeper['Comp'] = df_fbref_goalkeeper['Comp'].map(dict_league_names)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Primary Position" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [], "source": [ "df_fbref_goalkeeper['Primary Pos'] = df_fbref_goalkeeper['Pos'].str[:2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Position Grouped" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [], "source": [ "# Map grouped positions to DataFrame\n", "df_fbref_goalkeeper['Position Grouped'] = df_fbref_goalkeeper['Pos'].map(dict_positions_grouped)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "#### 4.2.4. Converting Data Types\n", "We need to convert each column to its proper data type." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Age\n", "The `Age` column needs to be cleaned and converted to an integer using the [astype()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.astype.html) method. Null values are replaced with 0 during the conversion." ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [], "source": [ "# Clean the Age column and convert it to an integer\n", "\n", "## Convert age to string\n", "df_fbref_goalkeeper['Age'] = df_fbref_goalkeeper['Age'].astype(str)\n", "\n", "## Keep only the first two characters (the years), as the 20/21 ages are parsed with hyphens and extra digits\n", "df_fbref_goalkeeper['Age'] = df_fbref_goalkeeper['Age'].str[:2]\n", "\n", "## Coerce non-numeric values to NaN using to_numeric with errors='coerce'\n", "df_fbref_goalkeeper['Age'] = pd.to_numeric(df_fbref_goalkeeper['Age'], errors='coerce')\n", "\n", "## Convert floats to integers (np.nan_to_num replaces NaN with 0)\n", "df_fbref_goalkeeper['Age'] = np.nan_to_num(df_fbref_goalkeeper['Age']).astype(int)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Born (Birth Year)" ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [], "source": [ "# Convert string to numeric\n", "df_fbref_goalkeeper['Born'] = pd.to_numeric(df_fbref_goalkeeper['Born'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "#### 4.2.5. 
Export DataFrame" ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [], "source": [ "# Export DataFrame as a CSV file\n", "\n", "## Export a copy to the FBref Engineered Goalkeeper folder called 'latest' (can be overwritten)\n", "df_fbref_goalkeeper.to_csv(data_dir_fbref + '/engineered/goalkeeper/fbref_goalkeeper_stats_combined_latest.csv', index=None, header=True)\n", "\n", "## Export a copy to the 'archive' subfolder, including the date\n", "df_fbref_goalkeeper.to_csv(data_dir_fbref + f'/engineered/goalkeeper/archive/fbref_goalkeeper_stats_combined_last_updated_{today}.csv', index=None, header=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "### 4.3. Players and Goalkeepers Combined" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "#### 4.3.1. Concatenate DataFrames\n", "Union the outfield and goalkeeper datasets into a single DataFrame." ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [], "source": [ "df_fbref_outfield_goakeeper = pd.concat([df_fbref_outfield, df_fbref_goalkeeper])" ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(12753, 176)" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_fbref_outfield.shape" ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(927, 115)" ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_fbref_goalkeeper.shape" ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(13680, 205)" ] }, "execution_count": 69, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_fbref_outfield_goakeeper.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "#### 4.3.2. 
Dedupe Columns" ] }, { "cell_type": "code", "execution_count": 70, "metadata": {}, "outputs": [], "source": [ "# Drop columns whose name duplicates an earlier column (keeps the first occurrence)\n", "df_fbref_outfield_goakeeper = df_fbref_outfield_goakeeper.loc[:,~df_fbref_outfield_goakeeper.columns.duplicated()]" ] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [], "source": [ "# Remove duplicate columns left over from earlier merges (suffix '_y') and strip the '_x' suffix from the kept columns\n", "df_fbref_outfield_goakeeper = df_fbref_outfield_goakeeper[df_fbref_outfield_goakeeper.columns.drop(list(df_fbref_outfield_goakeeper.filter(regex='_y$')))]\n", "df_fbref_outfield_goakeeper.columns = df_fbref_outfield_goakeeper.columns.str.replace('_x$', '', regex=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "#### 4.3.3. Dedupe Rows" ] }, { "cell_type": "code", "execution_count": 72, "metadata": {}, "outputs": [], "source": [ "df_fbref_outfield_goakeeper = df_fbref_outfield_goakeeper.drop_duplicates()" ] }, { "cell_type": "code", "execution_count": 73, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(13680, 205)" ] }, "execution_count": 73, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_fbref_outfield_goakeeper.shape" ] }, { "cell_type": "code", "execution_count": 74, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Player Nation Pos Squad Comp Age Born \\\n", "0 Aaron Cresswell eng ENG DF West Ham Premier League 27 1989.0 \n", "1 Aaron Hunt de GER MF,FW Hamburger SV Bundeliga 30 1986.0 \n", "2 Aaron Lennon eng ENG MF Burnley Premier League 30 1987.0 \n", "3 Aaron Lennon eng ENG FW,MF Everton Premier League 30 1987.0 \n", "4 Aaron Mooy au AUS MF Huddersfield Premier League 26 1990.0 \n", "\n", " MP Starts Min 90s Gls Ast G-PK PK PKatt CrdY CrdR Gls.1 \\\n", "0 36 35 3069.0 34.1 1 3 1 0 0 7 0 0.03 \n", "1 28 26 2081.0 23.1 3 2 2 1 1 1 0 0.13 \n", "2 14 13 1118.0 12.4 0 2 0 0 0 2 0 0.00 \n", "3 15 9 793.0 8.8 0 0 0 0 0 0 0 0.00 \n", "4 36 34 3067.0 34.1 4 3 3 1 1 4 0 0.12 \n", "\n", " Ast.1 G+A G-PK.1 G+A-PK xG npxG xA npxG+xA xG.1 xA.1 xG+xA \\\n", "0 0.09 0.12 0.03 0.12 0.8 0.8 2.8 3.6 0.02 0.08 0.10 \n", "1 0.09 0.22 0.09 0.17 2.8 2.1 5.6 7.6 0.12 0.23 0.35 \n", "2 0.16 0.16 0.00 0.16 0.6 0.6 1.4 2.0 0.05 0.11 0.16 \n", "3 0.00 0.00 0.00 0.00 0.3 0.3 0.5 0.8 0.04 0.05 0.09 \n", "4 0.09 0.21 0.09 0.18 2.6 1.8 3.1 4.9 0.08 0.09 0.17 \n", "\n", " npxG.1 npxG+xA.1 Matches Sh SoT SoT% Sh/90 SoT/90 G/Sh G/SoT \\\n", "0 0.02 0.10 Matches 21.0 6.0 28.6 0.62 0.18 0.05 0.17 \n", "1 0.09 0.32 Matches 27.0 6.0 22.2 1.17 0.26 0.07 0.33 \n", "2 0.05 0.16 Matches 10.0 4.0 40.0 0.81 0.32 0.00 0.00 \n", "3 0.04 0.09 Matches 4.0 1.0 25.0 0.45 0.11 0.00 0.00 \n", "4 0.05 0.14 Matches 28.0 6.0 21.4 0.82 0.18 0.11 0.50 \n", "\n", " Dist FK npxG/Sh G-xG np:G-xG Cmp Att Cmp% TotDist PrgDist \\\n", "0 28.1 8.0 0.04 0.2 0.2 1224.0 1708.0 71.7 23519.0 10212.0 \n", "1 23.4 10.0 0.08 0.2 -0.1 883.0 1229.0 71.8 16889.0 5315.0 \n", "2 16.6 0.0 0.06 -0.6 -0.6 204.0 294.0 69.4 3223.0 887.0 \n", "3 14.8 0.0 0.08 -0.3 -0.3 152.0 214.0 71.0 2286.0 672.0 \n", "4 22.0 3.0 0.06 1.4 1.2 1561.0 2067.0 75.5 27911.0 7921.0 \n", "\n", " Cmp.1 Att.1 Cmp%.1 Cmp.2 Att.2 Cmp%.2 Cmp.3 Att.3 Cmp%.3 A-xA \\\n", "0 560.0 623.0 89.9 472.0 587.0 80.4 183.0 449.0 40.8 0.2 \n", "1 406.0 480.0 84.6 292.0 
376.0 77.7 165.0 303.0 54.5 -3.6 \n", "2 116.0 142.0 81.7 68.0 92.0 73.9 17.0 34.0 50.0 0.6 \n", "3 92.0 115.0 80.0 53.0 69.0 76.8 5.0 13.0 38.5 -0.5 \n", "4 783.0 876.0 89.4 540.0 678.0 79.6 196.0 397.0 49.4 -0.1 \n", "\n", " KP 1/3 PPA CrsPA Prog Live Dead TB Press Sw Crs \\\n", "0 35.0 117.0 21.0 14.0 96.0 1343.0 365.0 1.0 222.0 83.0 93.0 \n", "1 65.0 83.0 31.0 5.0 97.0 977.0 252.0 11.0 245.0 67.0 66.0 \n", "2 8.0 11.0 13.0 5.0 22.0 289.0 5.0 0.0 61.0 5.0 19.0 \n", "3 5.0 9.0 3.0 2.0 17.0 199.0 15.0 0.0 49.0 2.0 8.0 \n", "4 48.0 167.0 27.0 9.0 163.0 1897.0 170.0 1.0 422.0 100.0 85.0 \n", "\n", " CK In Out Str Ground Low High Left Right Head TI \\\n", "0 67.0 35.0 15.0 9.0 893.0 293.0 522.0 1329.0 78.0 59.0 210.0 \n", "1 123.0 35.0 41.0 14.0 672.0 236.0 321.0 999.0 137.0 42.0 23.0 \n", "2 0.0 0.0 0.0 0.0 193.0 51.0 50.0 27.0 250.0 7.0 4.0 \n", "3 0.0 0.0 0.0 0.0 129.0 47.0 38.0 29.0 159.0 10.0 14.0 \n", "4 77.0 35.0 21.0 5.0 1293.0 283.0 491.0 507.0 1444.0 77.0 5.0 \n", "\n", " Other Off Out.1 Int Blocks SCA SCA90 PassLive PassDead Drib \\\n", "0 5.0 15.0 44.0 39.0 52.0 62.0 1.82 35.0 21.0 1.0 \n", "1 9.0 5.0 29.0 29.0 49.0 102.0 4.25 54.0 43.0 1.0 \n", "2 3.0 0.0 9.0 8.0 30.0 18.0 1.45 12.0 0.0 1.0 \n", "3 0.0 1.0 3.0 7.0 15.0 16.0 1.82 11.0 0.0 1.0 \n", "4 4.0 6.0 38.0 60.0 60.0 73.0 2.14 54.0 16.0 0.0 \n", "\n", " Fld Def GCA GCA90 PassLive.1 PassDead.1 Drib.1 Sh.1 Fld.1 Def.1 \\\n", "0 3.0 0.0 9.0 0.26 6.0 3.0 0.0 0.0 0.0 0.0 \n", "1 2.0 1.0 6.0 0.25 5.0 1.0 0.0 0.0 0.0 0.0 \n", "2 1.0 0.0 3.0 0.24 2.0 0.0 0.0 1.0 0.0 0.0 \n", "3 2.0 0.0 4.0 0.45 2.0 0.0 0.0 1.0 1.0 0.0 \n", "4 1.0 2.0 5.0 0.15 4.0 1.0 0.0 0.0 0.0 0.0 \n", "\n", " Tkl TklW Def 3rd Mid 3rd Att 3rd Tkl.1 Tkl% Past Succ % \\\n", "0 38.0 18.0 15.0 18.0 5.0 17.0 53.1 15.0 115.0 32.1 \n", "1 30.0 22.0 12.0 16.0 2.0 5.0 13.5 32.0 135.0 27.9 \n", "2 18.0 10.0 6.0 11.0 1.0 4.0 19.0 17.0 61.0 26.3 \n", "3 18.0 10.0 9.0 7.0 2.0 5.0 25.0 15.0 38.0 19.3 \n", "4 105.0 55.0 38.0 54.0 13.0 32.0 44.4 40.0 
193.0 29.5 \n", "\n", " Def 3rd.1 Mid 3rd.1 Att 3rd.1 ShSv Pass Tkl+Int Clr Err Touches \\\n", "0 181.0 123.0 54.0 0.0 38.0 90.0 133.0 0.0 2050.0 \n", "1 102.0 261.0 121.0 0.0 28.0 44.0 21.0 0.0 1475.0 \n", "2 74.0 102.0 56.0 0.0 24.0 31.0 9.0 0.0 424.0 \n", "3 49.0 102.0 46.0 0.0 18.0 25.0 9.0 0.0 322.0 \n", "4 192.0 355.0 107.0 2.0 52.0 151.0 70.0 0.0 2496.0 \n", "\n", " Def Pen Att Pen Succ% #Pl Megs Carries CPA Mis Dis Targ \\\n", "0 125.0 17.0 33.3 7.0 0.0 1071.0 2.0 18.0 19.0 1171.0 \n", "1 28.0 68.0 58.3 23.0 4.0 892.0 7.0 45.0 42.0 1176.0 \n", "2 19.0 36.0 48.0 12.0 2.0 290.0 12.0 9.0 25.0 353.0 \n", "3 7.0 22.0 35.0 8.0 1.0 186.0 8.0 9.0 17.0 288.0 \n", "4 65.0 32.0 53.2 26.0 0.0 1543.0 6.0 33.0 60.0 1710.0 \n", "\n", " Rec Rec% Prog.1 Mn/MP Min% Mn/Start Compl Subs Mn/Sub unSub \\\n", "0 1094.0 93.4 31.0 85 89.7 NaN 30.0 1 NaN 1 \n", "1 893.0 75.9 178.0 74 68.0 NaN 14.0 2 NaN 0 \n", "2 259.0 73.4 41.0 80 32.7 NaN 6.0 1 NaN 0 \n", "3 195.0 67.7 33.0 53 23.2 NaN 2.0 6 NaN 0 \n", "4 1540.0 90.1 85.0 85 89.7 NaN 29.0 2 NaN 0 \n", "\n", " PPM onG onGA +/- +/-90 On-Off onxG onxGA xG+/- xG+/-90 \\\n", "0 1.14 45.0 60.0 -15.0 -0.44 0.84 38.0 51.5 -13.5 -0.40 \n", "1 1.07 22.0 34.0 -12.0 -0.52 0.58 27.0 31.3 -4.3 -0.18 \n", "2 1.43 17.0 15.0 2.0 0.16 0.36 13.8 15.4 -1.5 -0.12 \n", "3 1.27 15.0 14.0 1.0 0.11 0.63 12.0 13.7 -1.6 -0.19 \n", "4 0.94 25.0 52.0 -27.0 -0.79 -0.03 28.7 49.8 -21.1 -0.62 \n", "\n", " On-Off.1 2CrdY Fls PKwon PKcon OG Recov Won Lost Won% \\\n", "0 1.09 0.0 20 0.0 0.0 0.0 277.0 70.0 57.0 55.1 \n", "1 0.94 0.0 27 0.0 0.0 0.0 213.0 22.0 37.0 37.3 \n", "2 0.49 0.0 12 0.0 0.0 0.0 80.0 7.0 15.0 31.8 \n", "3 0.13 0.0 9 2.0 0.0 0.0 50.0 6.0 12.0 33.3 \n", "4 -0.01 0.0 26 0.0 0.0 0.0 455.0 35.0 42.0 45.5 \n", "\n", " League Name League ID Season Team Name Team Country \\\n", "0 Big-5-European-Leagues Big5 2017-2018 West Ham England \n", "1 Big-5-European-Leagues Big5 2017-2018 Hamburger SV Germany \n", "2 Big-5-European-Leagues Big5 2017-2018 Burnley 
England \n", "3 Big-5-European-Leagues Big5 2017-2018 Everton England \n", "4 Big-5-European-Leagues Big5 2017-2018 Huddersfield England \n", "\n", " Player Lower First Name Lower Last Name Lower First Initial Lower \\\n", "0 aaron cresswell aaron cresswell a \n", "1 aaron hunt aaron hunt a \n", "2 aaron lennon aaron lennon a \n", "3 aaron lennon aaron lennon a \n", "4 aaron mooy aaron mooy a \n", "\n", " Team Country Lower Nationality Code Nationality Cleaned Primary Pos \\\n", "0 england ENG England DF \n", "1 germany GER Germany MF \n", "2 england ENG England MF \n", "3 england ENG England FW \n", "4 england AUS Australia MF \n", "\n", " Position Grouped Outfielder Goalkeeper GA GA90 SoTA Saves Save% W \\\n", "0 Defender Outfielder NaN NaN NaN NaN NaN NaN \n", "1 Midfielder Outfielder NaN NaN NaN NaN NaN NaN \n", "2 Midfielder Outfielder NaN NaN NaN NaN NaN NaN \n", "3 Forward Outfielder NaN NaN NaN NaN NaN NaN \n", "4 Midfielder Outfielder NaN NaN NaN NaN NaN NaN \n", "\n", " D L CS CS% PKA PKsv PKm Save%.1 PSxG PSxG/SoT PSxG+/- /90 \\\n", "0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN \n", "1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN \n", "2 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN \n", "3 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN \n", "4 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN \n", "\n", " Thr Launch% AvgLen Launch%.1 AvgLen.1 Opp Stp Stp% #OPA #OPA/90 \\\n", "0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN \n", "1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN \n", "2 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN \n", "3 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN \n", "4 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN \n", "\n", " AvgDist \n", "0 NaN \n", "1 NaN \n", "2 NaN \n", "3 NaN \n", "4 NaN " ] }, "execution_count": 74, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_fbref_outfield_goakeeper.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "#### 4.3.4. 
Reorder Columns" ] }, { "cell_type": "code", "execution_count": 75, "metadata": {}, "outputs": [], "source": [ "# TODO: reorder columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "#### 4.3.5. Export DataFrame" ] }, { "cell_type": "code", "execution_count": 76, "metadata": {}, "outputs": [], "source": [ "# Export DataFrame as a CSV file\n", "\n", "## Export a copy to the FBref Engineered Outfield-Goalkeeper folder called 'latest' (can be overwritten)\n", "df_fbref_outfield_goakeeper.to_csv(data_dir_fbref + '/engineered/outfield-goalkeeper-combined/fbref_outfield_player_goalkeeper_stats_combined_latest.csv', index=False, header=True)\n", "\n", "## Export a copy to the 'archive' subfolder, including the date\n", "df_fbref_outfield_goakeeper.to_csv(data_dir_fbref + f'/engineered/outfield-goalkeeper-combined/archive/fbref_outfield_player_goalkeeper_stats_combined_last_updated_{today}.csv', index=False, header=True)\n", "\n", "## Export a copy to the Export folder (can be overwritten)\n", "df_fbref_outfield_goakeeper.to_csv(data_dir + '/export/fbref_players_big5_latest.csv', index=False, header=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "\n", "\n", "## 5. Summary\n", "This notebook scrapes player performance data from [StatsBomb](https://statsbomb.com/) via [FBref](https://fbref.com/en/), using [pandas](http://pandas.pydata.org/) for data manipulation through DataFrames and [Beautiful Soup](https://pypi.org/project/beautifulsoup4/) for web scraping.\n", "\n", "With this notebook we now have aggregated performance data for outfield players and goalkeepers in the 'Big 5' European leagues, from the 17/18 season to the present." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "\n", "\n", "## 6. 
Next Steps\n", "This data is now ready to be exported and analysed in further Jupyter notebooks or Tableau.\n", "\n", "The Data Engineering subfolder in GitHub can be found [here](https://github.com/eddwebster/football_analytics/tree/master/notebooks/B\\)%20Data%20Engineering) and a static version of the record linkage notebook, in which the FBref data is joined to TransferMarkt data, can be found [here](https://nbviewer.jupyter.org/github/eddwebster/football_analytics/blob/master/notebooks/B%29%20Data%20Engineering/Record%20Linkage%20of%20FBref%20and%20TransferMarkt%20Datasets.ipynb)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "\n", "\n", "## 7. References\n", "\n", "#### Data and Web Scraping\n", "* [FBref](https://fbref.com/) for the data to scrape\n", "* FBref statement on using StatsBomb's data: https://fbref.com/en/statsbomb/\n", "* [StatsBomb](https://statsbomb.com/) for providing the data to FBref\n", "* [FBref_EPL GitHub repository](https://github.com/chmartin/FBref_EPL) by [chmartin](https://github.com/chmartin) for the original web scraping code\n", "* [Scrape-FBref-data GitHub repository](https://github.com/parth1902/Scrape-FBref-data) by [parth1902](https://github.com/parth1902) for the revised web scraping code for the new FBref metrics\n", "\n", "\n", "#### Countries\n", "* [Comparison of alphabetic country codes Wiki](https://en.wikipedia.org/wiki/Comparison_of_alphabetic_country_codes)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "***Visit my website [eddwebster.com](https://www.eddwebster.com) or my [GitHub Repository](https://github.com/eddwebster) for more projects. 
If you'd like to get in contact, my Twitter handle is [@eddwebster](http://www.twitter.com/eddwebster) and my email is: edd.j.webster@gmail.com.***" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[Back to the top](#top)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "oldHeight": 642, "position": { "height": "664px", "left": "1119px", "right": "20px", "top": "-7px", "width": "489px" }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "varInspector_section_display": "block", "window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }