{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Data Engineering of StrataBet Data\n", "##### Notebook to engineer the event data provided by [StrataBet]( http://www.stratagem.co/)\n", "\n", "### By [Edd Webster](https://www.twitter.com/eddwebster)\n", "Notebook first written: 13/12/2020
\n", "Notebook last updated: 26/12/2020" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![title](../../img/stratabet_logo.jpg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Click [here](#section5) to jump straight to the Exploratory Data Analysis section and skip the [Project Brief](#section2), [Data Sources](#section3), and [Data Engineering](#section4) sections. Or click [here](#section6) to jump straight to the Summary.\n", "\n", "This article was written with the aid of StrataData, which is property of [Stratagem Technologies](http://www.stratagem.co/). StrataData powers the [StrataBet Sports Trading Platform](http://www.stratabet.com/), in addition to [StrataBet Premium Recommendations](http://app.stratabet.com/recommendations)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "___\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction\n", "This notebook engineers [StrataBet](http://www.stratagem.co/) data for football matches in a variety of European leagues during the 16/17 and 17/18 seasons, using [pandas](http://pandas.pydata.org/) for data manipulation through DataFrames.\n", "\n", "For more information about this notebook and the author, I'm available through the following channels:\n", "* [eddwebster.com](https://www.eddwebster.com/);\n", "* edd.j.webster@gmail.com;\n", "* [@eddwebster](https://www.twitter.com/eddwebster);\n", "* [linkedin.com/in/eddwebster](https://www.linkedin.com/in/eddwebster/);\n", "* [github/eddwebster](https://github.com/eddwebster/);\n", "* [public.tableau.com/profile/edd.webster](https://public.tableau.com/profile/edd.webster);\n", "* [kaggle.com/eddwebster](https://www.kaggle.com/eddwebster); and\n", "* [hackerrank.com/eddwebster](https://www.hackerrank.com/eddwebster).\n", "\n", "![title](../../img/fifa21eddwebsterbanner.png)\n", "\n", "The accompanying GitHub repository for this notebook can be found 
[here](https://github.com/eddwebster/fifa-league) and a static version of this notebook can be found [here](https://nbviewer.jupyter.org/github/eddwebster/football_analytics/blob/master/notebooks/B%29%20Data%20Engineering/Opta%20%23mcfcanalytics%20PL%202011-2012.ipynb)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "___\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Notebook Contents\n", "1. [Notebook Dependencies](#section1)
\n", "2. [Project Brief](#section2)
\n", "3. [Data Sources](#section3)
\n", " 1. [Introduction](#section3.1)
\n", " 2. [Data Dictionary](#section3.2)
\n", " 3. [Creating the DataFrame](#section3.3)
\n", " 4. [Initial Data Handling](#section3.4)
\n", " 5. [Export the Raw DataFrame](#section3.5)
\n", "4. [Data Engineering](#section4)
\n", " 1. [Introduction](#section4.1)
\n", " 2. [Columns of Interest](#section4.2)
\n", " 3. [String Cleaning](#section4.3)
\n", " 4. [Converting Data Types](#section4.4)
\n", " 5. [Export the Engineered DataFrame](#section4.5)
\n", "5. [Exploratory Data Analysis (EDA)](#section5)
\n", " 1. [...](#section5.1)
\n", " 2. [...](#section5.2)
\n", " 3. [...](#section5.3)
\n", "6. [Summary](#section6)
\n", "7. [Next Steps](#section7)
\n", "8. [Bibliography](#section8)
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "___\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Notebook Dependencies" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook was written using [Python 3](https://docs.python.org/3.7/) and requires the following libraries:\n", "* [`Jupyter notebooks`](https://jupyter.org/) for this notebook environment with which this project is presented;\n", "* [`NumPy`](http://www.numpy.org/) for multidimensional array computing;\n", "* [`pandas`](http://pandas.pydata.org/) for data analysis and manipulation; and\n", "* [`matplotlib`](https://matplotlib.org/contents.html?v=20200411155018) for data visualisations;\n", "\n", "All packages used for this notebook except for BeautifulSoup can be obtained by downloading and installing the [Conda](https://anaconda.org/anaconda/conda) distribution, available on all platforms (Windows, Linux and Mac OSX). Step-by-step guides on how to install Anaconda can be found for Windows [here](https://medium.com/@GalarnykMichael/install-python-on-windows-anaconda-c63c7c3d1444) and Mac [here](https://medium.com/@GalarnykMichael/install-python-on-mac-anaconda-ccd9f2014072), as well as in the Anaconda documentation itself [here](https://docs.anaconda.com/anaconda/install/)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Import Libraries and Modules" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Setup Complete\n" ] } ], "source": [ "# Python ≥3.5 (ideally)\n", "import platform\n", "import sys, getopt\n", "assert sys.version_info >= (3, 5)\n", "import csv\n", "\n", "# Import Dependencies\n", "%matplotlib inline\n", "\n", "# Math Operations\n", "import numpy as np\n", "from math import pi\n", "\n", "# Datetime\n", "import datetime\n", "from datetime import date\n", "import time\n", "\n", "# Data Preprocessing\n", "import pandas as pd # version 1.0.1 used here\n", "import re\n", "import random\n", "from io import BytesIO\n", "from pathlib import Path\n", "\n", "# Reading directories\n", "import glob\n", "import os # used to build file paths and read the csv filenames\n", "\n", "# Working with JSON\n", "import json\n", "from pandas import json_normalize # pandas.io.json.json_normalize is deprecated since pandas 1.0\n", "from ast import literal_eval\n", "\n", "# Data Visualisation\n", "import matplotlib as mpl\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "plt.style.use('seaborn-whitegrid')\n", "import missingno as msno # visually display missing data\n", "\n", "# Display in Jupyter\n", "from IPython.display import HTML, Image, YouTubeVideo\n", "\n", "# Ignore Warnings\n", "import warnings\n", "warnings.filterwarnings(action=\"ignore\", message=\"^internal gelsd\")\n", "\n", "print('Setup Complete')" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Python: 3.7.6\n", "NumPy: 1.18.1\n", "pandas: 1.0.1\n", "matplotlib: 3.1.3\n", "Seaborn: 0.10.0\n" ] } ], "source": [ "# Python / module versions used here for reference\n", "print('Python: {}'.format(platform.python_version()))\n", "print('NumPy: {}'.format(np.__version__))\n", "print('pandas: 
{}'.format(pd.__version__))\n", "print('matplotlib: {}'.format(mpl.__version__))\n", "print('Seaborn: {}'.format(sns.__version__))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Defined Filepaths" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# Set up initial paths to subfolders\n", "base_dir = os.path.join('..', '..')\n", "data_dir = os.path.join(base_dir, 'data')\n", "data_dir_fbref = os.path.join(base_dir, 'data', 'fbref')\n", "data_dir_stratabet = os.path.join(base_dir, 'data', 'stratabet')\n", "img_dir = os.path.join(base_dir, 'img')\n", "fig_dir = os.path.join(base_dir, 'img', 'fig')\n", "video_dir = os.path.join(base_dir, 'video')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Notebook Settings" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "pd.set_option('display.max_columns', None)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Project Brief\n", "This Jupyter notebook explores how to engineer [StrataBet](http://www.stratagem.co/) data, such as chances and key entries, using [pandas](http://pandas.pydata.org/) for data manipulation through DataFrames.\n", "\n", "The resulting engineered DataFrames are exported as CSV files. This data can be further analysed in Python, joined to other datasets, or explored using Tableau, Power BI or Microsoft Excel." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Data Sources" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.1. Introduction\n", "StrataData has been made available by [Stratagem Technologies](http://www.stratagem.co/). 
StrataData powers the [StrataBet Sports Trading Platform](http://www.stratabet.com/), in addition to [StrataBet Premium Recommendations](http://app.stratabet.com/recommendations).\n", "\n", "Before conducting our EDA, the data needs to be imported as DataFrames in the Data Sources section ([Section 3](#section3)) and cleaned in the Data Engineering section ([Section 4](#section4)).\n", "\n", "We'll be using the [pandas](http://pandas.pydata.org/) library to import our data into this notebook as DataFrames." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.2. Chances" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3.2.1. Data Dictionary\n", "The [StrataBet](http://www.stratagem.co/) Chances dataset has 27 features (columns) with the following data types, as read by pandas:\n", "\n", "| Feature | Data type |\n", "|------|-----|\n", "| `index` | int64 |\n", "| `competition` | object |\n", "| `gsm_id` | int64 |\n", "| `kickoffDate` | object |\n", "| `kickoffTime` | object |\n", "| `hometeam_team1` | object |\n", "| `awayteam_team2` | object |\n", "| `icon` | object |\n", "| `chanceRating` | object |\n", "| `team` | object |\n", "| `type` | object |\n", "| `time` | object |\n", "| `player` | object |\n", "| `location_x` | object |\n", "| `location_y` | object |\n", "| `bodyPart` | object |\n", "| `shotQuality` | object |\n", "| `defPressure` | object |\n", "| `numDefPlayers` | object |\n", "| `numAttPlayers` | object |\n", "| `outcome` | object |\n", "| `primaryPlayer` | object |\n", "| `primaryType` | object |\n", "| `primaryLocation_x` | object |\n", "| `primaryLocation_y` | object |\n", "| `secondaryPlayer` | object |\n", "| `secondaryType` | object |" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3.2.2. 
Import Data" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# Combine the individual competition csv files into one DataFrame, df_stratabet_chances_raw, using glob\n", "lst_files_chances = glob.glob(os.path.join(data_dir_stratabet, 'raw', 'chances', 'individual_competitions', '*.csv')) # Creates a list of all csv files\n", "\n", "li = [] # pd.concat takes a list of DataFrames as an argument\n", "\n", "for filename in lst_files_chances:\n", " df_raw_temp = pd.read_csv(filename, index_col=None, header=0)\n", " li.append(df_raw_temp)\n", "\n", "df_stratabet_chances_raw = pd.concat(li, axis=0, ignore_index=True) # ignore_index=True as we don't want pandas to try to align row indexes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3.2.3. Initial Data Handling\n", "Let's check the quality of the dataset by looking at the first and last rows using the pandas [head()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.head.html) and [tail()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.tail.html) methods." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " index competition gsm_id kickoffDate kickoffTime hometeam_team1 \\\n", "0 4684 GreSL 2355591 10/09/2016 13:00:00 Kerkyra \n", "1 4685 GreSL 2355591 10/09/2016 13:00:00 Kerkyra \n", "2 4686 GreSL 2355591 10/09/2016 13:00:00 Kerkyra \n", "3 4687 GreSL 2355591 10/09/2016 13:00:00 Kerkyra \n", "4 4688 GreSL 2355591 10/09/2016 13:00:00 Kerkyra \n", "\n", " awayteam_team2 icon chanceRating team type time \\\n", "0 Platanias poorchance poorchance Kerkyra Open Play 24:43:00 \n", "1 Platanias goodchance goodchance Kerkyra Open Play 45:29:00 \n", "2 Platanias poorchance poorchance Kerkyra Open Play 44:34:00 \n", "3 Platanias poorchance poorchance Platanias Open Play 42:39:00 \n", "4 Platanias goodchance goodchance Kerkyra Open Play 40:46:00 \n", "\n", " player location_x location_y bodyPart shotQuality defPressure \\\n", "0 D. Epstein 81 48 Left 3 5 \n", "1 D. Epstein 27 60 Left 2 2 \n", "2 S. Siontis 23 117 Right 2 1 \n", "3 O. Gnjatic -9 118 Left 1 1 \n", "4 D. Epstein 42 15 Left 2 5 \n", "\n", " numDefPlayers numAttPlayers outcome primaryPlayer primaryType \\\n", "0 2 0 Saved - - \n", "1 2 0 Defended Thuram Open Play Pass \n", "2 4 1 Missed - - \n", "3 3 1 Missed G. Manousos Open Play Pass \n", "4 2 0 Saved - - \n", "\n", " primaryLocation_x primaryLocation_y secondaryPlayer secondaryType \n", "0 - - - - \n", "1 -29 82 - - \n", "2 - - - - \n", "3 77 92 - - \n", "4 - - - - " ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Display the first 5 rows of the raw DataFrame, df_stratabet_chances_raw \n", "df_stratabet_chances_raw.head()" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " index competition gsm_id kickoffDate kickoffTime hometeam_team1 \\\n", "119143 33422 EngPr 2463091 12/03/2018 20:00:00 Stoke City \n", "119144 33424 EngPr 2463091 12/03/2018 20:00:00 Stoke City \n", "119145 33418 EngPr 2463091 12/03/2018 20:00:00 Stoke City \n", "119146 33426 EngPr 2463091 12/03/2018 20:00:00 Stoke City \n", "119147 33427 EngPr 2463091 12/03/2018 20:00:00 Stoke City \n", "\n", " awayteam_team2 icon chanceRating team type \\\n", "119143 Manchester City goodchance goodchance Manchester City Open Play \n", "119144 Manchester City greatchance greatchance Manchester City Open Play \n", "119145 Manchester City poorchance poorchance Manchester City Open Play \n", "119146 Manchester City poorchance poorchance Manchester City Open Play \n", "119147 Manchester City poorchance poorchance Manchester City Open Play \n", "\n", " time player location_x location_y bodyPart shotQuality \\\n", "119143 41:13:00 L. Sane 45 42 Left 2 \n", "119144 56:58:00 R. Sterling -18 13 Left 3 \n", "119145 25:50:00 Fernandinho 3 97 Left 2 \n", "119146 57:48:00 R. Sterling 31 92 Right 2 \n", "119147 58:29:00 L. Sane -27 107 Left 2 \n", "\n", " defPressure numDefPlayers numAttPlayers outcome primaryPlayer \\\n", "119143 2 2 0 Missed K. De Bruyne \n", "119144 4 2 0 Saved David Silva \n", "119145 1 3 2 Missed - \n", "119146 1 4 1 Defended L. Sane \n", "119147 1 5 3 Missed R. Sterling \n", "\n", " primaryType primaryLocation_x primaryLocation_y secondaryPlayer \\\n", "119143 Cross High -69 48 Gabriel Jesus \n", "119144 Open Play Pass -36 86 L. Sane \n", "119145 - - - - \n", "119146 Open Play Pass -7 86 K. 
De Bruyne \n", "119147 Open Play Pass 17 88 David Silva \n", "\n", " secondaryType \n", "119143 Open Play Pass \n", "119144 Open Play Pass \n", "119145 - \n", "119146 Open Play Pass \n", "119147 Open Play Pass " ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Display the last 5 rows of the raw DataFrame, df_stratabet_chances_raw \n", "df_stratabet_chances_raw.tail()" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(119148, 27)\n" ] } ], "source": [ "# Print the shape of the raw DataFrame, df_stratabet_chances_raw \n", "print(df_stratabet_chances_raw.shape)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Index(['index', 'competition', 'gsm_id', 'kickoffDate', 'kickoffTime',\n", " 'hometeam_team1', 'awayteam_team2', 'icon', 'chanceRating', 'team',\n", " 'type', 'time', 'player', 'location_x', 'location_y', 'bodyPart',\n", " 'shotQuality', 'defPressure', 'numDefPlayers', 'numAttPlayers',\n", " 'outcome', 'primaryPlayer', 'primaryType', 'primaryLocation_x',\n", " 'primaryLocation_y', 'secondaryPlayer', 'secondaryType'],\n", " dtype='object')\n" ] } ], "source": [ "# Print the column names of the raw DataFrame, df_stratabet_chances_raw \n", "print(df_stratabet_chances_raw.columns)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The dataset has 119,148 rows and 27 features (columns). Full details of these attributes can be found in the [Data Dictionary](#section3.2.1)."
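] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
The 27 columns above result from concatenating the individual competition CSV files. Because `pd.concat` aligns DataFrames on column names, a file with a missing or differently named column would not raise an error but would instead add `NaN` values to the combined DataFrame. Below is a minimal sketch of a header check that could be run on `lst_files_chances` before concatenating. It is demonstrated on two temporary CSV files, and both the helper name `check_consistent_columns` and the file names are hypothetical.

```python
import os
import tempfile

import pandas as pd


def check_consistent_columns(filenames):
    # Read only the header row of each CSV file (nrows=0 returns an empty DataFrame)
    headers = {f: list(pd.read_csv(f, nrows=0).columns) for f in filenames}
    if not headers:
        return []
    # Use the columns of the first file as the reference
    reference = next(iter(headers.values()))
    # Return the files whose header differs from the reference
    return [f for f, cols in headers.items() if cols != reference]


# Demonstration on two hypothetical competition files, one lacking shotQuality
with tempfile.TemporaryDirectory() as tmp_dir:
    file_a = os.path.join(tmp_dir, 'competition_a.csv')
    file_b = os.path.join(tmp_dir, 'competition_b.csv')
    pd.DataFrame({'player': ['D. Epstein'], 'shotQuality': [3]}).to_csv(file_a, index=False)
    pd.DataFrame({'player': ['S. Siontis']}).to_csv(file_b, index=False)
    mismatched = [os.path.basename(f) for f in check_consistent_columns([file_a, file_b])]

print('Files with different columns:', mismatched)
```

In the notebook itself, `check_consistent_columns(lst_files_chances)` would be expected to return an empty list if every competition file shares the same header.
"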
] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "index int64\n", "competition object\n", "gsm_id int64\n", "kickoffDate object\n", "kickoffTime object\n", "hometeam_team1 object\n", "awayteam_team2 object\n", "icon object\n", "chanceRating object\n", "team object\n", "type object\n", "time object\n", "player object\n", "location_x object\n", "location_y object\n", "bodyPart object\n", "shotQuality object\n", "defPressure object\n", "numDefPlayers object\n", "numAttPlayers object\n", "outcome object\n", "primaryPlayer object\n", "primaryType object\n", "primaryLocation_x object\n", "primaryLocation_y object\n", "secondaryPlayer object\n", "secondaryType object\n", "dtype: object" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Data types of the features of the raw DataFrame, df_stratabet_chances_raw \n", "df_stratabet_chances_raw.dtypes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "25 of the 27 columns have the object data type; only `index` and `gsm_id` are int64. Several columns that hold numeric values, such as `location_x`, `shotQuality` and `numDefPlayers`, are stored as objects. This suggests that they contain non-numeric entries, such as the `-` placeholder visible in `primaryLocation_x` and `primaryLocation_y` above, so these columns will need converting during Data Engineering. Full details of these attributes and their data types can be found in the [Data Dictionary](#section3.2.1)."
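] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Columns such as `primaryLocation_x` and `primaryLocation_y` hold numbers but use `-` to mark an absent value (see the head of the DataFrame above), so pandas reads them with the `object` data type. Below is a minimal sketch of how such columns could be converted during Data Engineering. It assumes that `-` is the only placeholder used, which should be verified against the full dataset, and it runs on a small hand-made DataFrame, `df_example`, rather than on `df_stratabet_chances_raw`.

```python
import numpy as np
import pandas as pd

# Toy rows mimicking the raw chances data, where '-' marks an absent value
df_example = pd.DataFrame({
    'primaryLocation_x': ['-', '-29', '77'],
    'primaryLocation_y': ['-', '82', '92'],
    'primaryPlayer': ['-', 'Thuram', 'G. Manousos'],
})

# Columns that hold numbers but are stored as strings (object dtype)
numeric_cols = ['primaryLocation_x', 'primaryLocation_y']

# Replace the '-' placeholder with NaN in every column (exact matches only, so '-29' is kept)
df_example = df_example.replace('-', np.nan)

# errors='coerce' turns any remaining non-numeric string into NaN instead of raising
df_example[numeric_cols] = df_example[numeric_cols].apply(pd.to_numeric, errors='coerce')

print(df_example.dtypes)
print(df_example.isnull().sum())
```

Coercing rather than raising is a design choice: it guarantees a numeric column, but any unexpected placeholder silently becomes `NaN`, so comparing null counts before and after the conversion is worthwhile.
"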
] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 119148 entries, 0 to 119147\n", "Data columns (total 27 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 index 119148 non-null int64 \n", " 1 competition 119148 non-null object\n", " 2 gsm_id 119148 non-null int64 \n", " 3 kickoffDate 119148 non-null object\n", " 4 kickoffTime 119148 non-null object\n", " 5 hometeam_team1 119148 non-null object\n", " 6 awayteam_team2 119148 non-null object\n", " 7 icon 119148 non-null object\n", " 8 chanceRating 119148 non-null object\n", " 9 team 119148 non-null object\n", " 10 type 119148 non-null object\n", " 11 time 119148 non-null object\n", " 12 player 119148 non-null object\n", " 13 location_x 119148 non-null object\n", " 14 location_y 119148 non-null object\n", " 15 bodyPart 119148 non-null object\n", " 16 shotQuality 117618 non-null object\n", " 17 defPressure 119148 non-null object\n", " 18 numDefPlayers 119148 non-null object\n", " 19 numAttPlayers 119148 non-null object\n", " 20 outcome 119148 non-null object\n", " 21 primaryPlayer 119148 non-null object\n", " 22 primaryType 119148 non-null object\n", " 23 primaryLocation_x 119148 non-null object\n", " 24 primaryLocation_y 119148 non-null object\n", " 25 secondaryPlayer 119148 non-null object\n", " 26 secondaryType 119148 non-null object\n", "dtypes: int64(2), object(25)\n", "memory usage: 24.5+ MB\n" ] } ], "source": [ "# Info for the raw DataFrame, df_stratabet_chances_raw \n", "df_stratabet_chances_raw.info()" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " index gsm_id\n", "count 119148.000000 1.191480e+05\n", "mean 9941.664501 2.380177e+06\n", "std 17678.010794 1.127321e+05\n", "min 0.000000 2.237445e+06\n", "25% 1580.000000 2.247140e+06\n", "50% 3387.000000 2.404032e+06\n", "75% 6534.000000 2.467501e+06\n", "max 80263.000000 2.701477e+06" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Description of the raw DataFrame, df_stratabet_raw, showing some summary statistics for each numberical column in the DataFrame\n", "df_stratabet_chances_raw.describe()" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Plot visualisation of the missing values for each feature of the raw DataFrame, df_stratabet_chances_raw \n", "msno.matrix(df_stratabet_chances_raw, figsize = (30, 7))" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "shotQuality 1530\n", "dtype: int64" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Counts of missing values\n", "tm_null_value_stats = df_stratabet_chances_raw.isnull().sum(axis=0)\n", "tm_null_value_stats[tm_null_value_stats != 0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The visualisation shows us very quickly that there a few missing values in the `shotQuality` column, but otherwise the dataset is complete. This data is now ready for Data Engineering." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3.2.4. Export Complete DataFrame" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "df_stratabet_chances_raw.to_csv(data_dir_stratabet + '/raw/chances/' + 'stratabet_chances_all.csv', index=None, header=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.3. Key Entries" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3.3.1. Data Dictionary\n", "The [StrataBet]( http://www.stratagem.co/) Events dataset has twelve features (columns) with the following definitions and data types:\n", "\n", "| Feature | Data type |\n", "|------|-----|\n", "| `eventId` | int64 |\n", "| `subEventName` | object |\n", "| `tags` | object |\n", "| `playerId` | int64 |\n", "| `positions` | object |\n", "| `matchId` | int64 |\n", "| `eventName` | object |\n", "| `teamId` | int64 |\n", "| `matchPeriod` | object |\n", "| `eventSec` | float64 |\n", "| `subEventId` | object |\n", "| `id` | int64 |" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3.3.2. 
Import Data" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "# Combine the individual competition csv files into one DataFrame, df_stratabet_key_entries_raw, using glob\n", "lst_files_key_entries = glob.glob(os.path.join(data_dir_stratabet, 'raw', 'key_entries', 'individual_competitions', '*.csv')) # Creates a list of all csv files\n", "\n", "li = [] # pd.concat takes a list of DataFrames as an argument\n", "\n", "for filename in lst_files_key_entries:\n", " df_raw_temp = pd.read_csv(filename, index_col=None, header=0)\n", " li.append(df_raw_temp)\n", "\n", "df_stratabet_key_entries_raw = pd.concat(li, axis=0, ignore_index=True) # ignore_index=True as we don't want pandas to try to align row indexes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3.3.3. Initial Data Handling\n", "Let's check the quality of the dataset by looking at the first and last rows using the pandas [head()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.head.html) and [tail()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.tail.html) methods." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " True index competition gsm_id kickoffDate kickoffTime hometeam_team1 \\\n", "0 153564 9570 ScoPr 2242424 2016-08-06 11:30:00 Rangers \n", "1 153565 9571 ScoPr 2242424 2016-08-06 11:30:00 Rangers \n", "2 153566 9572 ScoPr 2242424 2016-08-06 11:30:00 Rangers \n", "3 153567 9573 ScoPr 2242424 2016-08-06 11:30:00 Rangers \n", "4 153568 9574 ScoPr 2242424 2016-08-06 11:30:00 Rangers \n", "\n", " awayteam_team2 team keyentryArea keyentryType \n", "0 Hamilton Academical Rangers Right Pass \n", "1 Hamilton Academical Rangers Box Pass \n", "2 Hamilton Academical Rangers Right Pass \n", "3 Hamilton Academical Hamilton Academical Right Pass \n", "4 Hamilton Academical Rangers Box Run " ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Display the first 5 rows of the raw DataFrame, df_stratabet_key_entries_raw \n", "df_stratabet_key_entries_raw.head()" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " True index competition gsm_id kickoffDate kickoffTime \\\n", "195682 111281 7814 GreSL 2701477 2018-03-11 15:15:00 \n", "195683 111282 7815 GreSL 2701477 2018-03-11 15:15:00 \n", "195684 111283 7816 GreSL 2701477 2018-03-11 15:15:00 \n", "195685 111284 7817 GreSL 2701477 2018-03-11 15:15:00 \n", "195686 111285 7818 GreSL 2701477 2018-03-11 15:15:00 \n", "\n", " hometeam_team1 awayteam_team2 team keyentryArea keyentryType \n", "195682 Lamia Levadiakos Levadiakos Right Turnover \n", "195683 Lamia Levadiakos Lamia Left Turnover \n", "195684 Lamia Levadiakos Lamia Right Pass \n", "195685 Lamia Levadiakos Lamia Box Pass \n", "195686 Lamia Levadiakos Levadiakos Box Run " ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Display the last 5 rows of the raw DataFrame, df_stratabet_key_entries_raw \n", "df_stratabet_key_entries_raw.tail()" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(195687, 11)\n" ] } ], "source": [ "# Print the shape of the raw DataFrame, df_stratabet_key_entries_raw \n", "print(df_stratabet_key_entries_raw.shape)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Index(['True', 'index', 'competition', 'gsm_id', 'kickoffDate', 'kickoffTime',\n", " 'hometeam_team1', 'awayteam_team2', 'team', 'keyentryArea',\n", " 'keyentryType'],\n", " dtype='object')\n" ] } ], "source": [ "# Print the column names of the raw DataFrame, df_stratabet_key_entries_raw \n", "print(df_stratabet_key_entries_raw.columns)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The dataset has six features (columns). Full details of these attributes can be found in the [Data Dictionary](section3.3.1)." 
] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 195687 entries, 0 to 195686\n", "Data columns (total 11 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 True 195687 non-null int64 \n", " 1 index 195687 non-null int64 \n", " 2 competition 195687 non-null object\n", " 3 gsm_id 195687 non-null int64 \n", " 4 kickoffDate 195687 non-null object\n", " 5 kickoffTime 195687 non-null object\n", " 6 hometeam_team1 195687 non-null object\n", " 7 awayteam_team2 195687 non-null object\n", " 8 team 195687 non-null object\n", " 9 keyentryArea 195687 non-null object\n", " 10 keyentryType 195687 non-null object\n", "dtypes: int64(3), object(8)\n", "memory usage: 16.4+ MB\n" ] } ], "source": [ "# Info for the raw DataFrame, df_stratabet_key_entries_raw \n", "df_stratabet_key_entries_raw.info()" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " True index gsm_id\n", "count 195687.000000 195687.000000 1.956870e+05\n", "mean 97843.000000 6453.847414 2.354363e+06\n", "std 56490.115401 5040.288373 1.125099e+05\n", "min 0.000000 0.000000 2.237445e+06\n", "25% 48921.500000 2675.000000 2.246823e+06\n", "50% 97843.000000 5401.000000 2.360808e+06\n", "75% 146764.500000 8928.000000 2.467223e+06\n", "max 195686.000000 25384.000000 2.701477e+06" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Description of the raw DataFrame, df_stratabet_key_entries_raw, showing some summary statistics for each numerical column in the DataFrame\n", "df_stratabet_key_entries_raw.describe()" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Plot visualisation of the missing values for each feature of the raw DataFrame, df_stratabet_key_entries_raw \n", "msno.matrix(df_stratabet_key_entries_raw, figsize = (30, 7))" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Series([], dtype: int64)" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Counts of missing values\n", "tm_null_value_stats = df_stratabet_key_entries_raw.isnull().sum(axis=0)\n", "tm_null_value_stats[tm_null_value_stats != 0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The visualisation and the missing-value counts (an empty Series) show that none of the eleven columns contain missing values, so the dataset is complete. This data is now ready for Data Engineering." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3.3.4. Export Complete DataFrame" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "df_stratabet_key_entries_raw.to_csv(data_dir_stratabet + '/raw/key_entries/' + 'stratabet_key_entries_all.csv', index=None, header=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.4. Match Info" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3.4.1. Data Dictionary\n", "The [StrataBet](http://www.stratagem.co/) Events dataset has twelve features (columns) with the following definitions and data types:\n", "\n", "| Feature | Data type |\n", "|------|-----|\n", "| `eventId` | int64 |\n", "| `subEventName` | object |\n", "| `tags` | object |\n", "| `playerId` | int64 |\n", "| `positions` | object |\n", "| `matchId` | int64 |\n", "| `eventName` | object |\n", "| `teamId` | int64 |\n", "| `matchPeriod` | object |\n", "| `eventSec` | float64 |\n", "| `subEventId` | object |\n", "| `id` | int64 |" ] }, { "cell_type": "markdown", "metadata": {}, "source": [
"#### 3.4.2. Import Data" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "# Combine individual csv files to form one DataFrame, df_stratabet_key_entries_raw, using glob\n", "lst_files_key_entries = glob.glob(data_dir_stratabet + '/raw/key_entries/individual_competitions' + \"/*.csv\") # Creates a list of all csv files\n", "\n", "li = [] # pd.concat takes a list of DataFrames as an argument\n", "\n", "for filename in lst_files_key_entries:\n", " df_raw_temp = pd.read_csv(filename, index_col=None, header=0)\n", " li.append(df_raw_temp)\n", "\n", "df_stratabet_key_entries_raw = pd.concat(li, axis=0, ignore_index=True) # ignore_index=True as we don't want pandas to try and align row indexes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3.4.3. Initial Data Handling\n", "Let's check the quality of the dataset by looking at the first and last rows in pandas using the [head()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.head.html) and [tail()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.tail.html) methods." ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " True index competition gsm_id kickoffDate kickoffTime hometeam_team1 \\\n", "0 153564 9570 ScoPr 2242424 2016-08-06 11:30:00 Rangers \n", "1 153565 9571 ScoPr 2242424 2016-08-06 11:30:00 Rangers \n", "2 153566 9572 ScoPr 2242424 2016-08-06 11:30:00 Rangers \n", "3 153567 9573 ScoPr 2242424 2016-08-06 11:30:00 Rangers \n", "4 153568 9574 ScoPr 2242424 2016-08-06 11:30:00 Rangers \n", "\n", " awayteam_team2 team keyentryArea keyentryType \n", "0 Hamilton Academical Rangers Right Pass \n", "1 Hamilton Academical Rangers Box Pass \n", "2 Hamilton Academical Rangers Right Pass \n", "3 Hamilton Academical Hamilton Academical Right Pass \n", "4 Hamilton Academical Rangers Box Run " ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Display the first 5 rows of the raw DataFrame, df_stratabet_key_entries_raw \n", "df_stratabet_key_entries_raw.head()" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " True index competition gsm_id kickoffDate kickoffTime \\\n", "195682 111281 7814 GreSL 2701477 2018-03-11 15:15:00 \n", "195683 111282 7815 GreSL 2701477 2018-03-11 15:15:00 \n", "195684 111283 7816 GreSL 2701477 2018-03-11 15:15:00 \n", "195685 111284 7817 GreSL 2701477 2018-03-11 15:15:00 \n", "195686 111285 7818 GreSL 2701477 2018-03-11 15:15:00 \n", "\n", " hometeam_team1 awayteam_team2 team keyentryArea keyentryType \n", "195682 Lamia Levadiakos Levadiakos Right Turnover \n", "195683 Lamia Levadiakos Lamia Left Turnover \n", "195684 Lamia Levadiakos Lamia Right Pass \n", "195685 Lamia Levadiakos Lamia Box Pass \n", "195686 Lamia Levadiakos Levadiakos Box Run " ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Display the last 5 rows of the raw DataFrame, df_stratabet_key_entries_raw \n", "df_stratabet_key_entries_raw.tail()" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(195687, 11)\n" ] } ], "source": [ "# Print the shape of the raw DataFrame, df_stratabet_key_entries_raw \n", "print(df_stratabet_key_entries_raw.shape)" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Index(['True', 'index', 'competition', 'gsm_id', 'kickoffDate', 'kickoffTime',\n", " 'hometeam_team1', 'awayteam_team2', 'team', 'keyentryArea',\n", " 'keyentryType'],\n", " dtype='object')\n" ] } ], "source": [ "# Print the column names of the raw DataFrame, df_stratabet_key_entries_raw \n", "print(df_stratabet_key_entries_raw.columns)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The dataset has eleven features (columns). Full details of these attributes can be found in the [Data Dictionary](#section3.3.1)." 
] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 195687 entries, 0 to 195686\n", "Data columns (total 11 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 True 195687 non-null int64 \n", " 1 index 195687 non-null int64 \n", " 2 competition 195687 non-null object\n", " 3 gsm_id 195687 non-null int64 \n", " 4 kickoffDate 195687 non-null object\n", " 5 kickoffTime 195687 non-null object\n", " 6 hometeam_team1 195687 non-null object\n", " 7 awayteam_team2 195687 non-null object\n", " 8 team 195687 non-null object\n", " 9 keyentryArea 195687 non-null object\n", " 10 keyentryType 195687 non-null object\n", "dtypes: int64(3), object(8)\n", "memory usage: 16.4+ MB\n" ] } ], "source": [ "# Info for the raw DataFrame, df_stratabet_key_entries_raw \n", "df_stratabet_key_entries_raw.info()" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " True index gsm_id\n", "count 195687.000000 195687.000000 1.956870e+05\n", "mean 97843.000000 6453.847414 2.354363e+06\n", "std 56490.115401 5040.288373 1.125099e+05\n", "min 0.000000 0.000000 2.237445e+06\n", "25% 48921.500000 2675.000000 2.246823e+06\n", "50% 97843.000000 5401.000000 2.360808e+06\n", "75% 146764.500000 8928.000000 2.467223e+06\n", "max 195686.000000 25384.000000 2.701477e+06" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Description of the raw DataFrame, df_stratabet_key_entries_raw, showing some summary statistics for each numberical column in the DataFrame\n", "df_stratabet_key_entries_raw.describe()" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAABu4AAAH/CAYAAAC4pivrAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAgAElEQVR4nOzdZ3iU1b7+8e+U9N4jBAgppFASOqEjLQoIhL4RBQVEUQERhXM2bkEUGygogo2moCBNOkiRDgqhSBNEQARBCCSkJ1P+L/zPHKK7eM7ZMJzk/rwBMpm51rpY1zzPs+71W8tgt9vtiIiIiIiIiIiIiIiIiIhLGV3dABERERERERERERERERFRcCciIiIiIiIiIiIiIiJyV1BwJyIiIiIiIiIiIiIiInIXUHAnIiIiIiIiIiIiIiIichdQcCciIiIiIiIiIiIiIiJyF1BwJyIiIiIiIiIiIiIiInIXUHAnIiIiIiIiIiIiIiIichdQcCciIiIiIiIiIiIiIiJyF1BwJyIiIiIiIiIiIiIiInIXUHAnIiIiIiIiInfEsWPH+Oyzz1zdDBERERGRu5bZ1Q0QERERERERkYrh+PHjTJgwAV9fX7p06eLq5shdJjs7m9zcXAwGAwEBAfj5+bm6SSIiIiJ3nII7EREREREREbkjgoKCABgzZgx5eXn069fPxS2Su8WhQ4eYMGEC169f5+bNmzRo0IChQ4fSsGFDVzdNRERE5I7SVpkiIiIiIiIickfk5eXh7+9P165dmTBhgrbNFAC+//57nnjiCWrXrs3f/vY3nn32WbKysnjjjTe4cOGCq5snIiIickep4k5ERERERERE7ohz586RkJDA8OHD8fDwYMKECQCqvKug7HY7BoOB5cuXk5iYyOjRowkICADA3d2dv/3tb/z8889UqVLFxS0VERERuXNUcSciIiIiIiIid0xMTAxVq1bl8ccfp1+/fqq8q8AMBgMAly5dwmw2ExAQgN1uB6BLly74+Pjw7bffurKJIiIiInecKu5ERERERERE5I545JFHcHNzA+Cee+5hyJAhAKq8q+
ACAwPZtWsXhYWFeHl5Ab9V3Hl7e5Obm+vi1omIiIjcWQruREREREREROTfzmKxYDaXnXbw9/fHbrc7t0isVKlSmfDOYDDQt29fVzRXXMAxDgYPHkxpaSmnT5+mTp06WCwWDAYDnp6ezqo8x++WlJTg7u7u4paLiIiI3D4K7kRERERERETk38pqtWI2m8nPz+fdd9+loKCAunXr0q1btz8EMY7wzmg08uKLLxIaGkq7du1c3AO53Ww2G0bjbye4VK5cmbFjxzrPtzMajRQXF5Ofn4+Hhwfw27aaly5d4rXXXmPw4MHUrl3bZW0XERERuZ0U3ImIiIiIiIjIv5XJZKKwsJA+ffpw8+ZNTCYTixYt4uzZs4waNQr4LYi5NbwbOHAglStXpnXr1q5tvNx2jmrM0tJScnNz8fT0dIZ2VqsVk8mEyWSioKDAGdxdvHiR0aNHk5OTQ1JSkiubLyIiInJbKbgTERERERERkX8LR+gCcPLkSe655x5mzpwJwFdffcWbb76JzWZj9OjRQNnwrkqVKjzyyCPA399mU8oHu92O2WwmLy+PESNGcO7cOeLi4ujWrRv33XcfJpMJi8VCfn4+AN7e3ly/fp2RI0dSUFDAypUrMZvNZcaaiIiISHmiu2ARERERERER+V9zhG3FxcUcPXqUrVu34uXlRZUqVQDo0aMHNpuNKVOmAJQJ735PoV35ZTAYsFgsDB8+nMLCQpo1a8aRI0eYMWMGeXl59OrVC7PZjMViwcvLi1OnTvHEE0+Ql5fHypUrcXNzU7ArIiIi5ZruckRERERERETkf81RRdWvXz+ysrK4fv06NWvW5OeffyYqKoqAgAB69+4NwFtvvYXRaHRumynl361hW1FREe7u7jz77LPUrl2bkydPMmXKFObMmQNAr169CAsLIz4+nhUrVpCUlKTQTkRERCoM04svvviiqxshIiIiIiIiIv83WSwWjEYjNpuNsWPHYjabeeSRR0hOTmbDhg0UFhaSmpqKl5cXHh4exMXFERAQwPTp0wkJCaF27dqu7oLcZlarFbPZTEFBATNnziQzM5NvvvmGwYMH4+npSWhoKDExMRw7dowdO3ZgNpupVasWV65cwcPDg7lz5yq0ExERkQpDwZ2IiIiIiIiI/I8ZjUYKCgpYu3YtV65cISMjg/T0dBo2bEhYWBjvvfce+fn51KlTxxnexcTEkJCQQNeuXTEaja7ugtxGdrvdOUa6devGoUOH+Pbbb8nKysLf35/69esDEB4eTkxMDCdOnGDXrl24ubnRv39/Onfu7Dz3TqGdiIiIVAQK7kRERERERETkf2Xr1q2MHj2akydP0rp1a2rUqAFAcnIy4eHhzJgxg/z8fFJSUvD09MTT05OEhASMRqOzYk/KH5vNhtFoxGq1smPHDi5evMiMGTNo27YtRUVFrFixAg8PD1JSUoDfwrvq1auzc+dOcnNzue+++zAYDNjtdkwmk4t7IyIiInJnKLgTERERERERkf8Wm82GwWBw/js2Npbq1auzefNmLBYLderUISAgAIPBQHJyMhEREcycOZOLFy/StGlT3N3dne9VaFd+GQwGSkpKGDt2LOvXr6d69ep07dqVSpUqERMTw40bN1i6dCnu7u5lwrsGDRrQu3dv5xi7dayJiIiIlHfaY0BERERERERE/jTHloUlJSX88MMP+Pv7ExYWRqdOnSgoKOCFF14gJCSEJ554gsjISAwGAz169KCwsJANGzbg7e3t6i7IHXTx4kUMBgNXrlwhISHBGcLFx8czePBgAGbNmoXBYGDAgAEAxMTEAP9VsSciIiJSkajiTkRERERERET+FKvVitlsJi8vj2HDhjF37lxWrVrF1atXqVOnDvXq1XNujVlQUEBSUhK+vr4YDAZSUlLo3r07RqPxDxV7Un5YrdYyYVtQUBDR0dHk5OSwYsUKfH19SU1NBSAkJITo6Ghu3LjBBx98QGxsLPHx8c73aoyIiIhIRaTgTkRERERERE
T+FKPRSEFBAT169MBoNDJw4EDsdjubN2/m119/pW7dutSvX5/w8HBmzpxJQUEB8fHx+Pv7AzjPK1MVVfnkqMYsKipi9erV7N27l6ioKKpUqUKNGjXIzc1l7ty5+Pv7U6dOHeC38C4qKorw8HD69OmjsSEiIiIVnoI7EREREREREflT7HY7U6ZMoaCggLfffpsmTZoQEhLC9u3b+fHHH7l06RL169d3Vt698847hIWFUb9+fednqIqqfLq1GrNv376sXbuWLVu2sGbNGqKjo0lJSaFGjRrcvHmT2bNnlwnvQkNDadiwIUaj8Q8VeyIiIiIVjc64ExEREREREZE/xWAwcOHCBapUqUJYWBj5+fls2LCBpKQkPDw8WLNmDWazmZEjR9KrVy+Cg4Np1aqVq5std4DJZKKoqIiHHnqI0NBQ51mHw4YN47XXXgOgTZs2PPHEEwC8/vrrFBQUMHTo0D98joiIiEhFpiVMIiIiIiIiIvIv2Ww2bDYbubm5wG/bZi5atIglS5YwfPhw3njjDSIjI1myZAm9evXi8uXLtG3bFrPZjMVicXHr5U7YsWMHAOPGjaNBgwbY7XZnld0LL7zA1q1bqVKlCk888QTNmzdn+/bt2O12F7daRERE5O6iijsRERERERER+QOr1Vqm+smxfeH48eO5efMmly5dYtq0abzwwgskJCRQUlKCp6cnDRo0oFq1aoSFhTnfazZr+qEi+PHHH7l48SLx8fEAfPHFF2RnZzNr1iwmTZrE5MmTsVqtpKWl8corr+Dv7+8891BbqIqIiIj8RnfOIiIiIiIiIlKGxWLBbDZTUFDA0qVLOX/+PBEREaSkpNCoUSPgt+oqf39/2rdvD8D58+dxc3OjZ8+e3H///cAfwz8pP24N22w2G0ajkfr167N7926Ki4tZtWoVc+bMYc6cOdSpU4c2bdowbdo0xowZw/Dhwxk8eHCZ94qIiPye7iOkojLYtSeBiIiIiIiIiPx/jkAmPz+fnj17OqvlTCYTZ86cYejQoQwZMoRTp07Ru3dvBgwYQJ06dfjkk08wGo0sWLAAk8mkKqpyzBHsOkK3goICvL29KS0t5eLFi1SqVIkePXqQnp7O8OHDsdvtTJ06lV9++YWUlBT+8pe/aCJWpIL7e4GMrhtyK8e1pri4mL1795Kfn4+fnx8tWrRwddNEbjtV3ImIiIiIiIiIk8FgwGazObcyfOWVV4iNjcVutzN48GBmzJhBeno6MTExDB06lE8++YSvvvqKqlWr8vHHH2MymbRCvhyzWq2YzWby8vIYP348v/zyC5cuXaJ9+/Z06NCBxo0bU1JSQklJCVarFYALFy6wb98+WrZsyYABA5yfozEiUjHdGsgcPHiQ4uJiWrVqpdBOnGw2m/Na8+CDD5Kfn8+NGzcoKiqiadOmPPjgg7Ro0UJjRsotBXciIiIiIiIiFdjNmzcxm814e3s7f2axWPjhhx+oVasWsbGxAKxdu5Z9+/Yxfvx4zp49S3Z2Ns888wy9e/fm5s2bJCYmYjQanROyUj6ZTCYKCwvp3bs3/v7+NG7cGKPRyPbt21m7di3PP/88HTt2JDw8nA0bNnDixAkuX76MyWRi2LBhZT5HRCqeW8P/gQMHcv78eXJzc2natCmTJk2iUqVKrm6i3AWMRiOlpaU8+eST+Pr6MmnSJAICArDZbHTs2JGioiJiY2OpXLmyq5sqcltoE3ERERERERGRCqqwsJDXXnuN5cuXA1BaWkphYSEGg4HS0lLnSvZVq1YxevRonnzySXr16sW2bdt45513yM/PJyoqiuTkZIxGo3OFvJQ/t560smzZMmw2G5MnT+bpp59mxIgRNG/enOzsbNzd3XF3d2fSpEnExcVRWFhIQkICn3/+OWaz2VmFJyIVkyP879evH15eXkycOJGpU6eSmZnJxIkTuXjxoqubKHeJ69evc+XKFXr27ElycjJVqlTh3LlzAGRkZHDp0iWuXbvm2kaK3Ca6mxYRER
ERERGpoLy8vAgNDeWll16itLSU9evX07dvX7p160ZCQgKbNm0iMjKSqVOnMnLkSGfF1I0bN/Dw8MDHx6fM5xmNWh9cnly4cIErV67QoEEDDAaD8/ypixcvYjKZqFy5MiaTiTVr1vD+++8zduxYfH19eemllxg/fjyvv/46Hh4ezs9TNaaIACxevBg/Pz9eeeUVqlSpwnfffUft2rXZuXMnY8eO5bXXXlPlnZCXl8fZs2dxd3fHaDSyZs0aRo8ezciRI2nXrh2DBg3ivvvu45FHHnF1U0X+7XRHLSIiIiIiIlKBPfXUU/Tq1YvXX3+dGzdukJaW5vx5QEAAb775JkOGDHGGdmfPnuXy5cskJCS4stlyBxw+fJj//M//5OjRowB8//33APj4+HD16lXc3d3ZunUro0eP5plnnmHgwIHcuHGDJUuWcObMmTKhnd1uV2gnIgCcPn2a0tJSqlSpgt1uZ/Xq1Xh5efHyyy9z6tQpJk6cyPnz5ykpKXF1U+UOsdlszr87Krx9fX2JjY3lwIEDLFy40HmtcdyPZGdnk5WV5ZL2itxuumMSERERERERqcBMJhM3btzAx8eHCxcusHXrVvr27UtERATPPPMMb731Fl9++SXe3t7k5uayZ88ebDYbI0eOBHBWYUn5ExkZSVBQEM888wxXrlyhX79+PPvsszRt2pRly5aRkZHB8ePHGTt2LA8++CDw2/arVatWJSAgoMxnaYyIVFyO64TVasVgMBAXF+fc4nDBggUsWLCAuXPn0qBBA7Zu3cr69es5ceIE48ePp127di5uvdxujmrskpISrly5Qm5uLsnJyURERJCRkcEbb7wBwPDhwxk6dCgAly9fxmAwUK1aNVc2XeS2UXAnIiIiIiIiUsE988wzmEwm5s+fz8SJE7FarfTv359mzZoRExPD22+/zbp16/D09KRmzZq88MILzvPKTCaTq5svt0mDBg0YNGgQo0ePxs3NjeTkZMxmM7Vq1SI9PZ0lS5ZQs2ZNevfujdls5uzZsyxfvpzo6GjCw8Nd3XwRcTHHNcIR3DuuFx07dqRRo0bcuHGDWbNmMW7cOBo0aOD8ne7du+Pj40ObNm1c1na5MxzV2Hl5eQwbNoxz586RnZ1NTEwMjz32GAMGDKCkpIRp06ZRUFDA119/TUFBAbNnz8bf358ePXq4ugsit4WCOxEREREREZEKzGAwEBMTA8CIESOw2+28/PLLAPTv358qVaowZcoUSktLsdlszu0PdV5Z+eaYcL98+TKNGjUiNzeXGTNmEBQURIsWLRgxYgQ2m40tW7Zw3333ERkZSXZ2Nj4+PkybNg2DwYDNZtO5hyIVlOM7JD8/n7fffptffvmF3NxcunXrRlpaGsnJyVy4cAGAsLAw4LetmM+ePcuDDz5IRkZGmc+R8sfxf2u1Wnn22WcxGAw8++yz+Pr6Mm/ePP7617/y448/MnToUIxGIytXrmT+/PnExMQQGRnJe++953y/xoiUNwa7Y9NYERERuW10IykiIiL/V+Tk5DB9+nQWLVrE2LFjSU1NJTMzk549e+Lt7Q1oe8yKpKSkBDc3N7Zv384HH3zAtWvX+I//+A9atWpFaWkphw4dYvfu3RQXF1O5cmX69u2LyWRSsCsiFBQU0L17d3x9fYmKiqK4uJijR49SvXp1xowZQ3BwMN27d6devXrExcWxf/9+DAYDn376qb4/Koji4mKWLl3K+vXreeSRR2jdurXzteeff56NGzfyxhtv0K5dOy5dukR+fj6+vr5ERkZiMBh0rZFyS8GdiIjIbea4kSwuLiYzMxMvLy+SkpKcq9VFRET+t/7RAhGFK/I/lZOTwzvvvMOnn36Kr68vcXFxLFy4UNVTFczvv0O+/vprPvzwQ65du8YLL7xAs2bNKC0txc3Nrcz7tGhNpGJzfHdMnTqVbdu28d577xEZGYnJZGL06NFs2rSJGTNm0Lx5czZv3syrr76Kh4cHUVFRvPvuu9qKuZy7NW
x79913+eKLL8jOzmbZsmXExsZSWFiIl5cXAP369aO0tJQlS5b84XNU1S3lmenFF1980dWNEBERKc+MRiN5eXn85S9/YcmSJSxYsIAffviByMhI7rnnHlc3T0TuclarVQ+k8k/dOrG1detWdu/ezYULF7Db7YSGhrq4dfJ/laenJ02aNCEhIYHq1avz8ssvYzKZsNlsCoMrEMf/tWMSPjo6mtDQUL7//nvWrFmDxWJhxYoV+Pj4EBUV5XyfrlsiFZPjvtXx3fHZZ5/h5eXFX/7yF4xGI+vWrePdd99l3LhxeHh4sGvXLrp37879999P7969ycjIUMVuBWA0GsnPz2ft2rXcf//9XL58mSNHjuDm5kaLFi1wc3OjqKgIs9mMp6cnK1eupGPHjvj7+5e5B9H9iJRn+gYUERG5TRwPGzabjREjRhAYGMhTTz1Fbm4uEydOpLCwkMcff9x5CLeIyO/dGshs376dvLw8DAYD9erVIyIiwsWtk7uB3W53jpGnn36aw4cPY7VacXd358aNG4wcOZKePXvi4+Pj4pbK/0Xu7u6kp6c7/62J1IrLYDA4w7tWrVphMBiYPXs206ZNIzIykr/97W+ubqKIuJjjvrWwsJBvvvnG+V1x9epVAL766itGjRrFqFGj6N+/P9OmTWPZsmV06dKlzEIjm82ma00FsHHjRsaNG8e6det4+umnsVqt7Nixg6ioKAYMGICnpycAubm5hISE/CG0Eynv9C0oIiJym5jNZgoKCjhy5AiVK1fm/vvvp0mTJgBEREQwYsQIZs6cqfBORP6uWwOZESNGcPDgQaxWK9nZ2SQlJdGxY0eGDBni4laKqzkmMGbNmsXhw4d59dVXSUpKwmAwMGnSJCZPnkytWrWoX7++i1sqrvK/2Ubq9+/VRGrFdmt417JlSxISEsjOziYuLg6TyaRt7UQqMJvNhslkoqioiN69e1OnTh1SU1Np3rw57777LqNGjWLdunWMGTOGAQMGAFBaWkpERITz7FQHVeyWH/9sy/bk5GTi4uLYtGkTQ4YMYfDgwXz00UcsWrSIK1eu0KdPH86dO8fSpUuJjo4mICDgDrdexLX0TSgiInIbLViwgIEDB7J06VLn2R9Wq5W0tDSmT5/Od999x8yZMzlw4ICLWyoidwvHEdSOh9y3336bw4cPM2XKFJYuXUpmZiYmk4kZM2awe/duVzZV7iLHjx+nQYMGNGzYkMDAQHJycti+fTt9+vTBYDCwf/9+VzdRXMBisWA0GikpKeHkyZPcvHkTm832p95rt9udk6fbt2/n1KlTt7Op8n+EI7yD3xaiJSQkKLQTEee15ssvv8TX15e//OUvBAQE0K5dO2JiYli3bh3t2rXj0Ucfxd3dnZ9//pk9e/YQGxurs9/LseLiYgBKSkr+8FpCQgJt27blww8/JCsri5iYGB577DHq1KnDRx99xH333ce8efO45557eOuttzAYDH/6HkakPFBwJyIi8m/kmMhw6NGjB8OHD8dqtbJnzx5KSkowmUzY7XaaNGnCO++8w/Hjx3n55Zc5efKki1otIncDi8UC8IeH0pMnT5KWlkatWrWIjIykoKCA06dP079/f3x9fRXICBaLhZs3b1JaWorZbObcuXP07NmTJk2aMHr0aHbu3Mmbb75JTk7OH65TUn7Z7XbMZjN5eXkMHDiQwYMH06NHD5YuXUpOTs6/fK9j8cDcuXMZOXIk169fvxPNljvIarX+qZ/93u+vUxaLRaGdSAVnt9t55plneOONNygqKiI5ORmAoKAgXn31VRo3bszJkycZOHAgo0eP5rHHHsNisfDSSy853y/ly549e7j33nu5du0a7u7uFBQUsG3bNn755Rfn7/Tt25fIyEgWL16MxWKhWrVqDB8+nIyMDMLDw6lSpQrvvPMOnp6eFBcXqxpTKhTTiy+++KKrGyEiIlIeOCYt7HY7FosFu92Oj48PDRs2JCcnh48//pjg4GBq166N0WjEbrcTFR
VFQkICp06dYuDAgdqzXaSCstvtzJkzhwMHDlC/fn3nd0FBQQHvvvsu1atXp3379pw9e5auXbuSlpbGuHHjmDFjBkeOHKF9+/b6/qggrFbrHyYtjEYjmZmZ7N27l9jYWIYOHUpaWhovv/wyfn5+rF69mp9//plBgwZpnFQwNpuN0aNHU1hYSM+ePbl58yZLly7F09OT2NhY5/kxt7o1tPvkk094/fXX+c///M8yZ93J/323nkW1bt06jh49StWqVf/umPi9W6sxP//8c86cOUONGjU0oSpSwdhsNuf1wmAwEBcXx7fffsuJEyfw8/MjNTUVAF9fX1q2bIm/vz+//PILZrOZunXrMmXKFMxms8L/cmj//v089thjpKen0759e8xmM2+88QYvv/wyhw4dAiA6Oprg4GBOnz7Nzp076dOnD0ajEX9/f2rUqMHly5fZtm0b+fn5NGrUSNt1S4Wj4E5EROTfwGq1Yjabyc/PZ9KkSXzyyScsWbKE7777jrp169KhQwdycnJ47733CAgIcIZ3NpuN6OhoHnjgAefqZU2qilQ8paWlfP3116xduxYvLy8+/vhjgoKCiI6O5sKFC+zbt4/g4GCefPJJmjZtyssvv4yvry+bN2/m7NmzdO/eXRMeFYDj/Bj4bbL84MGDXLp0ifj4eFJSUvjyyy/59NNPadWqFVOmTMHLy4vs7GzWrFlDREQEbdq0wWg06jpTzjm2xzQYDBiNRrZs2cKAAQPo1q0bXbt25dSpUyxfvhwPD48/hHe/D+1eeeUVJkyYQO/evV3VHblNjEYj+fn59O7dm5UrV7JhwwbWrFlDs2bNCA4O/ofvu3WMfPrpp0ycOJH27ds7q2tEpHw7ceIEy5cvJyEhAXd3d+f2uVarlbCwMBo3bsy+ffv47rvvCAoKokaNGgB4e3tTu3ZtOnXqRMeOHUlLS8NoNDqfo6X8OHToEA899BD9+vVjzJgxzvuMFi1aEBMTw40bN5g1axb79+8nPz+fBx98kI8//hiTyUTdunUBCAgIoEaNGmRlZbFw4UKMRiMNGjRwZbdE7jgFdyIiIv8Dvz9k2Wg0UlBQQK9evbh27ZrzAWXXrl0sXbqUli1b0rVrV65fv86sWbMICgqiZs2af5ho12SqSMVkMpmIiooiMzOTpUuXcuzYMQYPHkxwcDDFxcVs2bKFFStWUK9ePWbOnImHhwc5OTl8/vnnxMbG0rZtW31/VACO/+MRI0awcOFC9u7dy1dffUVRURH33nsvVatW5cSJE1y7do3Q0FAyMzP57LPP2L17N6+++iphYWEaJ+WcI9x1LCRasWIFBw4coG/fvoSGhgLQoUMHvv/+e5YvX46npydxcXF4enqWqeZ0hHYTJ06kV69eruyS/Jvdukhs4cKFXL16lVdeeYUOHTpw4sQJPv30U9LS0pzj5VZ/L9idNGkSGRkZd7QPIuIaOTk59O3bl82bN7N27VquXr2Kt7c399xzj/P6ERgYSOPGjdmwYQOHDx/Gz8/P+WzsWFhyK1Xqli8HDx5k0KBB3HPPPUyfPt0Z2jmuPfHx8XTo0IFWrVrx448/snLlSlauXIm/vz9ZWVm0atXKeeZhQEAAsbGxWCwWMjIyCAwMdGXXRO44BXciIiL/A4WFhbi5uQH/NYkxc+ZMfv31V9566y26detGp06dqFOnDvv372fx4sVkZGTQrl07bt68yfTp06lRowZxcXEu7omI3A1sNhtBQUFs3bqVM2fOEBkZSWhoKHXq1CE2Nhaj0ciRI0cIDg7G19eXU6dOMXfuXA4ePMikSZMICQlxdRfkNrp1ov3UqVOsWrWKV199lYyMDIKDg5k1axZWq5U+ffrQsGFDDhw4wIYNGzhw4AAeHh5MnTrVOWkm5Zdj+8KioiK6d+/OTz/9RE5ODj///DM5OTnUr18fHx8f4Lfw7tSpU6xYsQKLxULNmjWdk2vz58/n1VdfVWhXDjm2oystLeXGjRvs3r2b2NhYOnfuTH
R0NDVr1iQzM5MFCxbQtGnTMuHd3wvtNEZEKms8plgAACAASURBVBaLxcKPP/5IfHw8sbGx7N27lzlz5vDzzz9TXFxMfHw8AMHBwdSvX59169Zx+PBh/P39iY+PV0hXzu3fv5+HH36YpKQkzp8/T05ODi1btgRwVmY6dhmKjIykefPmdOnShcuXL3PhwgWOHDlC3bp1iYmJcX5mYGDgv6wEFymvFNyJiIj8Nx0/fpwnn3ySVq1a4evrW2a7IJPJRN++fTEYDBgMBipVqkT16tVZvXo1169fp1WrVtSuXZuIiAi6du2qhxeRCu73Z4PYbDbS09P56aef2Lt3LxaLhdTUVFJSUggICOD8+fN89NFHfPfdd1gsFqZNm6ZAppxznEMF8PPPP3Po0CHOnDnDQw89RNWqValevToeHh7MmjWL0tJSOnfuTPfu3WnZsiUPPfQQXbp0oVKlSi7uhdxuNpsNo9GIxWJh7dq1ZGVlMXXqVIYNG0ZOTg579+4lOzubmjVr4uXlBfwW3n3zzTdcuXKFnj17YjAY+Pzzz3nppZcUyJRDjmrMvLw8hgwZwtKlS9m4cSMJCQk0adIEk8lEaGgotWvXJjMzk4ULFzor7xzjCxTaiVRk7u7u5OfnM2/ePCZOnEj//v2JiIhg1apVbNq0iR07dmCxWAgJCSE6Opq0tDTWr1/Phg0bSExMpEqVKq7ugtwme/bs4dFHH2XAgAGMHz+e0NBQZs2axY0bN8qEd7f+6ebmhp+fH23atKFJkyYUFRWxfft22rVrh6enp/P3NGciFZWCOxERkf+mvXv3EhkZSatWrZyrxiwWCwsXLsTNzY1u3bphMBicW05FRUXxzTffcPHiRbp27Yq3tzcpKSnOCTbdiIpUTLcGMqdOncJqtZKQkEBiYiJJSUkcOnSIffv2UVpaSmpqKjVr1qRdu3ZkZGTQv39/unXrRuXKlV3cC7ndHNeIMWPG8O6777Jz504AunbtipeXFz4+PsTExODh4cH7779PUVERTZs2JSgoCC8vL9zd3V3ZfLlDDAYDJSUlvPTSS2zbtg0/Pz/nQqKWLVvy008/sWXLFq5fv14mvOvSpQudO3d2njO0evVq+vXrR48ePVzcI/l3clxvSktLefTRR7FaraSmpmIymdizZw9JSUlERUVhNBoJCQmhdu3aHDp0iLfffpsuXboQFBQE/Bbavfrqq0yYMEGhnUgFlZiYyN69e9m6dStdunShQYMG9OnTh5MnT7Jt2zYyMzNZuXIlAMnJyaSnp/PLL7/w8MMP67m3nLJarbz55pukpKTw3HPP4e/vT1xcHEFBQbz//vtkZ2c7w7vfcyxiDA4OxmKxsG7dOtLT0wkICLjDvRC5+yi4ExER+ZPy8vJwd3cnISGB+vXrU1RUxHPPPUd4eDiVK1fGZrMxb948wsPDqVmzZpkHk927dzsrIW79uR5eRCqmW0O78ePHM336dBYtWsTx48epV68e1apVIzExkUOHDrF//36sViuJiYmcPn2aatWq4evr6zz/Qcofu91e5ryxKVOmsH37dnr06EF4eDh79+4lNzeX1q1bAzjDO29vb95//30AGjVq5Krmi4sYjUY2bNhAZmYmBQUFZGRkOIPbVq1aOcO77OxsEhMT8fb2BnAuQDKbzTRv3pykpCRXdkNuA8cWqmfPnuXYsWOMGzeOzp0706ZNGw4dOsTixYtJTk6mUqVKzvDOUc39wAMPYDQa2bp1K88//7wq7USEgoICvv76a+rUqUNUVBSrV6/mgw8+YOTIkbRo0YLCwkI++eQT5s6dS2pqKk888YRzgYief8sfo9FI69atadOmjfP5xN3dnbi4OEJCQpg1a9Y/DO9uPXv53LlzLF26lKZNmxIdHX2nmi9y11JwJyIi8ifk5ubywQcfUFJS4ryJPH78OB9++CEHDhwgJSWFRo0aOc8e8vHxoVatWpSWlnLhwgXmzZtHamrqP1xpJiIVi2PSYsSIEezZs4eHH3
6YyMhIduzYweHDh0lLS3OGd4cPH2bjxo3MmzeP9evX07NnT+dZVFK+FBUVkZWVha+vr3OM7Nq1i+vXr3Pvvffy8MMP06hRI4KDg/nggw/KTIL4+PgQHR1NYGAgHTp00FkgFcCtE6COHQDatm3LzZs32b9/PxcuXKBBgwbO74tWrVpx4cIFPv/8c8LCwkhJSXF+luNzbp1Ak/Jl/PjxvPjii9y4cYNevXoRHByMj48PzZo1IzMzk88//7xMeBcREUGbNm0wGo3Y7XYCAwNJS0vjvvvuc3VXROQOclRE3bq9e2JiIl988QU3b97EYrHw7LPP8thjjzFs2DBSU1Pp1KkTiYmJBAcHl6m0U2hXPlmtVtzd3Z3HhTjuT9zd3YmNjf2X4R3AL7/8woIFC7h69SqPP/44fn5+d7gXIncfBXciIiJ/wrVr15g9ezY//PADfn5+TJ48mUGDBnHPPfdw8OBBvvrqK1q0aEHr1q05ffo0s2fP5sCBA6xZs4YlS5ZgMpl4++23nZMfmhgTkWXLlrFp0ybeeust7rvvPuLi4ti1axc//PADmZmZztWmSUlJeHh4EBgYyPjx44mKinJ10+U2KC0t5eGHHyYvL89ZLbdr1y4effRR9u3bR9OmTalTp84/nQTx9fWlbt26hIaGurIrcgc4KuQKCwuZPXs227ZtIzs7mxo1apCWlsa1a9fYtWsXFy5coF69es7wrmXLlnh5efHggw9qArWc+/39ZsOGDfnpp584evQocXFxxMTE4Obm5gzvDh48yKJFi4iNjaVq1apl3mswGPDy8qJq1aqu6IqIuJBjK+YBAwZQq1YtQkNDMZlM+Pr68u6777Jx40Yef/xxhgwZUmY3iJiYGFq2bKnjISoAo9HoHCM1a9YkPDzcGfT+2fDOz8+PoKAgBg0apLOZRf4/BXciIiL/xJkzZ/D19SUoKIioqCgWLFjA2rVryc3NZcCAASQmJuLl5cWhQ4dYt24dHTp0oGfPnlStWpVjx44REBBAamoqU6dOxWw2Y7FYnNvjiUjFc+tE6pYtW7h48SIPPfQQnp6erFq1inPnztGkSRO+/fZbDh8+TIsWLYiKiqJBgwa0bduW8PBwF/dAbofi4mLc3d3x8fGha9euuLu7k5OTQ3x8PBEREezbtw+z2Uzjxo2d59bFxsYSGhrKRx99xE8//UTbtm0BrWavCGw2GyaTiby8PHr37s2+ffs4efIky5cvx93dnQYNGtCiRQt+/vlnduzY8YfwLjU1VVuWlXOO+02LxUJBQQFWqxVfX19atmzJkSNHWL16NfHx8URFRWEymZzh3ebNmzl37hxdunRxdRdE5C5SWFjIZ599xq+//uoM47y9vfnqq69ITk5m4sSJ/3Q3CF1ryr/fj5Fb5zwc961hYWF88MEHnDt3jvbt2ztfdzwfVapUCX9/f1c0X+SupOBORETkH7h48SKDBw/G19eXmjVrUqVKFWbNmkVxcTE1atSgcuXKVKlShYSEBLy9vTl8+DDr1q2jfv36tGzZkgceeIBOnTrRtGlT5wSZ2Wx2dbdE5A6yWCxkZ2fz448/YjabsdvtuLu7Y7fb2bRpE1euXGHAgAHs37+fsWPHMnjwYB5//HEOHz7M1q1bWbFiBfXq1XNuXSblT15eHh06dKC0tJS+ffvi7u7Om2++yYIFC2jcuDGNGzfG19eXefPmUVhYSIMGDXB3d3dOgnh5ebF8+XK6devmPLNMyjdH9cPAgQPx8/Nj2rRp9OzZk3PnzrFo0aIy4d3FixfZtWsX3333HS1atHCeeQeaSC2vHNWY+fn5/Md//Acff/wxmZmZmM1mEhMT6dChA7t372bZsmXExMQ4wztvb2/S09PJyMjQzhAiUobZbCYrK4vdu3fToUMHfHx8CAgIwGg0smTJElq3bk1ERIR2lqnAbh0jHTt2xMfHp8z2qo77Vk9PT44ePUq3bt2cr2nMiPx9Cu5ERET+AZPJRExMDJ06daK4uBiz2U
zlypVp0qQJGzdu5Pz584SHh5cJ744cOcKWLVuIi4v7w3Z2miATqVjy8/P561//yqxZs/joo49YtmwZJ06cwN/fn6pVq1K7dm0CAgKoUaMGQ4YMoWXLljz11FMA7NmzB4BatWrRvHlznVdWTuXl5dG1a1eio6N56qmn8PX1BWD79u0cPHiQM2fOkJKSQtOmTQkICGDWrFnk5uaWCe8SEhLo37+/xkgFs2/fPnbu3MmECROIj4+noKCAb7/9Fi8vL7788kvMZjMNGzakefPmnDp1iuLiYrp06aLJsXLObrc7qzF79OjhrNw9evQoBw4cIDAwkOTkZNLT09m1axfLly8nNjaWypUrYzab8fT0/MNZViJSsTiqse12O4Dz3LJatWoxa9YsioqKaN68OQBeXl7s3LmTq1evkpaWVmZxiJRf/2qMFBYW0rx58z9cR9zd3alVqxa9evXCaDTqWiPyLyi4ExER+Z0rV66QlZVFWFgY0dHRWCwWhgwZwoYNGxg+fDh16tQhKiqK5cuXc/78ecLCwpzhnZ+fH1999RU3b96kXbt2ru6K3CbXrl3DYrGUOcdB5FZ5eXn07NmT0tJS0tPT6dmzJ56ennz99dcsXryY4OBgGjZsSEJCApcvX2bx4sU8+uijREdHk5WVxapVq6hXrx7jxo0jLCzM1d2R26CgoICMjAxiY2N5+eWXiYyMdL7WvHlzcnNz2bp1698N7/Lz86lXr54zvPtn21NJ+XT06FGWLFlCz549CQ0N5f333+fo0aOMGTMGgLlz5+Lr60t8fDzp6emkp6drkqwCMBgMWCwWnn/+eTw9PZkyZQo9e/bEarWybds2Tp06hZ+fHzVr1iQ9PZ09e/bwwQcf0Lx58zJnCmmMiFRMjq2YCwoK+Otf/8r58+dJTk7Gzc3NGcp9/fXXpKamEhoaSkhICOfPn2fx4sV07dpVi4gqgD87RurWrUtISMgf3m82mzEYDNjtdi1sFvkXFNyJiIjcorS0lEWLFvH111/TrFkzzGYzJpOJM2fO8M0333Do0CHatm3r3Cpz+fLlXLx4keDgYIKDg4mMjKRRo0b0799fN6Ll1IULF+jcuTNubm7UqFFDE+byB44qqkqVKjF58mTuvfde4uPjad26NUlJSVy9epUFCxYQGhpKrVq1KC4uZvbs2RQVFWEwGJg/fz6ZmZmMHz9eEyDlVF5eHn379uXHH39k+PDh1K9fH/itWsaxzVSjRo3Izc3l66+/LhPeBQYG8s4772C1WmnatKkm2CuAW8M2x99zcnLIzs6mW7dubN68mcmTJ/Paa6+RlpZGbm4umzdvZufOnRQUFNCyZUtNklUgRUVFzJ07l/vvv5/mzZtz/fp1Nm7ciIeHB/n5+ezdu5fw8HDntpmOsxI1NkQqNqvVislkoqioiC+//JJVq1Zx/PhxPvvsM7y9vfH39ycxMZE5c+ZQvXp1atasCUBERATXr1+nb9+++h4p5/47YyQ6OpqaNWv+w+1Tdf8q8q8puBMREbmFyWQiMzOThQsXcvnyZZ588kkqVapEv379yMvLY/v27c7wLiEhwVl5t2/fPqZMmcL58+cZPHiw80w7PbyUPwEBARw+fJgVK1YQEBBA9erVFd6JU35+Pg888ABxcXG88sorVK5cGfhtUYDJZKJq1apUr16dS5cu8cknn5CSkkJSUhKBgYEsWLCAnTt3kp2dzcyZM4mNjXVxb+R2yMvL44EHHsDHx4fk5GQ+/vhjateuTbVq1ZzBiiOc+X14l5qaSlpaGuHh4bRt21bBbgVgsVgwmUxYLBby8vK4evUqAQEBVKpUiQYNGhAUFMRLL71Es2bNeOihh7BarezZswd/f38mT55M9+7dnfcimiQr/2w2G1euXOH999+nQ4cOJCYm8vHHH7Ny5UpmzJhBkyZN+PDDDzl58iS//vorrVq1olWrVrpvFangHFVURUVFZGRkEBwczAsvvEBaWhpXr15l6dKlrFy5kpiYGEwmE8
uXLyc9PR1fX18CAgLo3LmzvkfKuf/NGBGR/xkFdyIiIr9Tv359rl69yrJlywgJCaF3795Uq1aNWrVqUVBQwI4dO8pU3kVHR1NcXExCQgKTJ0/GZDIBOtOuPHI8jHbq1IkzZ84wf/58goKCqFatGl5eXq5unriYxWLhww8/ZMuWLUyYMIHExETnayaTybniNDIyktDQUHbt2sWvv/5Ku3btqF27Nt27d6d9+/YMHDiQKlWquLAncrs4Qrtq1aoxY8YMUlJS+PXXX3nnnXeoXbs21atXx2az/d3wbufOnWRmZtKoUSMaN26s0K4CsFqtmM1m8vLyGDFiBB9++CHz589ny5YtBAYGUrVqVTw9PZk3bx4RERHUrVuXnJwcPvzwQ8LDw+nXrx9GoxGLxaJ7knLq95PkBoOBgIAAjEYjtWrV4tKlSzz//PO89tpr1K1bFx8fHxYvXkx4eDjZ2dllzj3UGBGpWG6thDIYDJSUlLBy5UrOnj3LwIEDiY+Pp1KlSnTs2JHk5GS8vb2ZOnUq+fn5XL58mfj4eOe9ruP7Q98j5cu/a4wo0BX5n1FwJyIicouSkhJMJhOLFy/mxo0bFBcXYzQaiYmJcW5r5wjvDh8+TNu2bYmLi6NJkyZ07NjRuSpeN6bl063bjNWoUYN169Zx9OhRPDw8iImJUeVdBZaXl8drr71GUlIS2dnZLFq0iBYtWhASEuJ86HVsVWcwGKhatSpXr15l06ZNZGRk4Ofnh6+vL2FhYQqByynH9piVK1dm8uTJhIWFERoaSmxsLFevXv2X4d3ly5c5efIkXbp00erlCsJoNFJUVETfvn0xm810796drl27cvLkSebPn4/dbqdJkyacPn2aL774gm+//ZaFCxdiMBiYMWMGRqMRu93uXFAk5YvFYsFsNlNcXMy3337L4cOHycrKokqVKtSvX5+IiAhWrVpFfn4+Y8eOpaSkhMOHD3Py5ElGjhzJ8OHDnWNE1ZgiFcuvv/7KoUOHCAkJwd3dHbvdzvPPP8+cOXPw9PTkySefxGQyOZ+No6KiSEtLo0OHDsBvZ8IfO3aMPn366Lm3nNIYEXE9BXciIiL815kxjsmttLQ0HnvsMW7cuMHKlSvJz88nJiaGsLAwatWqRWFhITt37mTbtm106tQJDw8P52fpxrT8cUxqGY1G8vPzuf/++zl8+LBzzKxfv57AwECio6MVulRAxcXF9OjRA3d3dwYOHEhSUhLfffcdc+fOpUWLFoSGhpYJ7xzbZnp6evLFF1/QrFkzqlat6upuyG1UXFxMnz59iIyM5K233iIkJMT5WmhoKHFxcf+y8q5Zs2Z06NCBsLAwF/ZE7oRbg5Tt27ezadMmJk6cSLt27YiNjcXDw4PVq1czcOBAvL29uf/++ykpKcFoNJKcnMzbb7+N2WzWCvdyzLFlmWNBwJYtW1i2bBk7duxg3bp1pKSkEBoaSmZmJitWrKBdu3ZcvHiR6dOn4+fnx7BhwxTaiVRgGzZs4J133iE1NRVPT08OHz7MAw88QGZmJsePH8fDw4P69euXuZbYbDaCg4OpV68eTZs2ZcmSJQQGBpbZYULKD40REdfTXbyIiFR4jgq5kpISTp48yTfffMO1a9cwm82MHTuWjIwMVq9ezdy5c7l06RK+vr4MHTqU9u3b4+fnp5Xs5diNGzeAsucCvfnmm/j5+TFx4kQWLlzI6tWreeCBB/j4449ZsWIFOTk5rmquuMjp06f59ddfad++PUFBQQQEBDBmzBiqVavGoEGD+P77753VdgBubm4AnD17lsjISOLi4lzZfLkDTp8+zcWLF+nSpQv+/v6cPXsWq9XqfD0hIYGnn36ae++9l2HDhrFt2zbnBIjjT0DbY5ZjV69e5fPPP6egoKDMNef8+fNkZ2dTq1YtjEYjK1asYNSoUYwaNYrAwEBeeOEF8vPzGTVqFG+88Qbjxo3DbDY7z8aT8sloNFJaWsrTTz+Nv78/r7/+Ohs3buS9997jhx
9+YNy4cVy7do0WLVqQmJhIt27dePrppykqKnJWYzoWBYhIxdO2bVt8fX156qmn6NSpEwsXLiQsLIwpU6ZQq1Ytli5dyooVK4Dftnt33I/Y7XbMZjORkZF4enpy4cIFF/dEbheNERHXU8WdiIhUaLeuWB4wYABffvklc+fOZdOmTXz//fc0a9aM1q1bk52dzerVq8nOzsZqtXLgwAH69OlD9+7dNflRTh0/fpxnn32W2rVrExoa6vz5559/7jz70Gw2YzAYaNu2LWfPnmXu3LkEBgbqzLsKIj8/n+3bt+Pj48OePXv46aefmD9/PidOnKBv377Oyrv58+fTvHnzMuPo+vXrfPbZZ/j5+dG5c2fc3d1d2BO5XW4dI3v37uWnn37ik08+4ciRI7Rt2xY3NzdnxcvvK+9SUlKIjo4us0WvlF/79+9n9uzZ+Pn5ER4ezjfffEN0dDR5eXmsX7+eJk2acOjQIZ599llGjRrFY489RkFBAZMnT6Z+/fpER0eX+TyNmfKluLgYk8lUZsvlixcvsmDBAh5++GFatWqFn58f+/btY+PGjTz55JPY7Xbi4uJIT0+ndu3atGzZkrFjxyrYFRE8PT1p0qQJs2bNwm6388gjjxAdHU1QUBBNmzZly5YtHDhwAE9PT5KSkjAYDM5gBuDEiRMsW7aMoKAg7r33XgA9C5czGiMirqfgTkREKjTHIcuDBg3C3d2dMWPG0K9fP1JSUpg+fTpnz56lbdu2tGzZkqysLFauXMnq1as5duwYjz32mHNVmSbIyp/Tp0/j5+dHx44dywSzK1as4MqVK/Tp08c5fkwmE/feey/r16/n4MGDlJSUkJSUVGYLVSlfioqK6NSpE9evX2fQoEEEBQWxZMkSrl27Rv/+/albt66zms4R3jnOvMvLy+PNN99k+/btvPHGG0RGRrq6O3Ib/KMxcvXqVR5++GFSU1MBykzEO8K7rKws3n77berVq6dtVCuIwMBA1q9fz9atW3nvvfc4d+4cPXv2pLS0lC1btrBt2za++OILRo0axdChQ7Hb7c5zzR588EFVY5ZjWVlZfPjhh4SFhZU5NzUrK4t58+Zx7733kpCQwOrVqxkzZgyjRo2iU6dOTJgwgaysLNq0aUN8fDzx8fEYjUasVitms9nV3RIRFzt27BhXr17Fy8uLzZs3U6NGDSIiIggJCaFZs2Zs3LiRAwcO4O3tTWJiovNZyGq18u2333Lx4kWeeeYZQkJCFMiUUxojIq6lWUYREanwTp06xfXr1xk6dChpaWmkpqZSWloKQOvWrTl27BgAzz33HK+//jqvvvoqmzdvVqVdOde0aVOGDBlCcXExzz33HF9++SUAPXv2dJ4TAzgrpQoLC/H09CQnJ4fvv/8eX19fl7Vdbr+VK1dSUFDAwIEDAdiyZQulpaWEhoaybt06Dh06hMFgIDU1lbFjx1KlShUGDhzIkSNHmDZtGitXruSjjz7SNpnl2D8bI2vWrOHIkSPO3711K9WEhASGDRtG586diYiIcEXT5Q6z2WwEBgYydepUrly5gpubGy1btqS0tJS4uDiee+45zpw5Q7Vq1ahTpw6Ac0FATEwM1atXd3EP5Hby9vZm586d/PWvf+XkyZM899xznDx5koCAALy8vPjuu+/48ssvndWYQ4cOxWQycenSJYqKiv7weaq0E6mYLBYL8Ns5qlarlbS0NGbPns2cOXOoVKkS48eP55tvvqGwsJCoqCimT5+Oh4cHkydPZtu2bc7PMZlMpKenM2vWLN3HljMaIyJ3F1XciYhIhXf+/HkWLFhAjx49iIqKYvXq1Tz33HOMHj2a1q1b87e//Q2TyURSUhLR0dFlVixr8qP8O3bsGHPmzOHMmTOEh4fTsmVLfvjhB7Zt28b169dp0qQJVquVy5cvs23bNiZNmsSgQYOc1ZgKdsunnJwcFi9eTNWqVVmxYgX/j737jq6q2tu+/83OTkJIJSEVQkghEFooCUmogQDSQQRBUA
8ozXMUD1JExaMiKhYQ6YSOoAeRptSACCShSRFCLwGBSCeU9LbfP3z3foL6HDnPfWPizvUZwyGDsPeYa+Q31lxrXrNkZWUxbdo0qlatSlJSEkeOHCEwMBB/f3/LyrsTJ04wdepUTp48yb///W9q165d2pchj9Af1cjhw4cJCgqyrLgsufLOy8uLuLg4vLy8Svkq5M9g/t0fPnyY27dvU7FiRX788UfLs0dYWBhhYWFs376dLVu2MHfuXLZv346dnR0LFy7EaDRqIpEVs7OzIyIigq+++orFixdTsWJFnnzySTw8PHBycmLGjBkkJiby6quvMmjQIAAuXbrE1q1badmyJXXr1i3lKxCR0mZeaZuVlcWnn37KqlWryMrKsmzPHB8fz86dO/nmm2+oU6cO3t7eeHh40KZNG27evMmzzz5r2WHGfIaZtnm3LqoRkbJHwZ2IWJWioiJtWSj/0e8NbGVkZLB69WoaN27M+fPnGTlyJK+88gpDhgzhzp07zJ49m0aNGllmuZup1soHX19fwsLCSElJYc+ePYSGhvLEE09w9uxZ1q1bx9q1a9m6dSvLly8nNzeXESNGPHBAt1if4uJiqlWrhqOjI1OmTOHMmTO8//77hIWFER4eToUKFUhOTiY1NfWB8K5q1aoUFBTw0UcfUatWrdK+DHmEHrZGjhw58pvwzkwTQ6xfyedWGxsbqlWrRteuXWnXrh07d+4kOTn5gfAuJibGcp5dly5dGDVqlM4rs3KFhYUYDAZcXFxYsmQJ9+7dw93dnfj4eNzc3PD398fOzo5Dhw4RFBSEwWDg6NGjTJ48GVtbW9566y09i4gIBoOB7OxsevXqxalTp8jIyGDVqlXcvn2b6tWrU7VqVdq2bcuOHTtYuXIlRUVFTJo0icDAQMuERHOfpUki1kk1IlL2KLgTEatifjH95JNPcHNzw9vbu5RbJGWJeWCrsLCQ27dvk5mZiZOTNX5bugAAIABJREFUE97e3ly/fp1p06axadMmxo4dy/PPPw/8MmM5KSmJdu3aERISUspXIH828+qXgIAAqlatyu7du0lKSiI8PJynn36a8PBwbt68ibu7O/Xr1+fTTz/FaDRqNaaVK3ne4alTpygoKMDX15cGDRpga2tL7dq1Hwjvqlevjp+fH1WqVKFVq1ba/rAc+G9q5MiRIwQHB+usw3KmsLAQo9FIdnY2X331Fd988w3nzp2jsLCQkJAQ4uPjSU5OJjk5GXt7e2rVqoWPjw9BQUE0atSI4OBgnVdWDhgMBrKyspg8eTL9+vWjc+fOJCYmkpSURGxsLL6+voSEhODv78/SpUvZtGkTqamp+Pn5sWDBAsszicI7kfKp5KTVbdu2ce7cOaZNm8Y//vEPqlatysKFC8nIyCAwMJCqVavSoUMHdu3axffff4+bmxsvv/yy5Z1G9xHrpBoRKbsU3ImI1UlLS2PkyJHEx8dTvXr10m6OlBHFxcXY2tqSmZnJ8OHDmT9/Pl999RXbtm2jfv361K1bl9zcXE6dOkXHjh0BOHPmDB9++CEVK1Zk1KhRehAth0puXVetWjVLeJecnIy/vz+tWrXiscceo23btsTGxmoLVStnfrEtLi4mLy+PnJwcnnvuOXx9fZk+fTq2trbUr18fo9FoCWb27NlDcnIy4eHh+Pr6aoDdyv2/1khSUhK1a9dWqFtOmEwmbG1tycrKolevXhw/fpyLFy9y+vRplixZQlFRES1btiQ+Pp6kpCSSkpJIS0tj2rRpuLi4EBYWZvkuPZtYvwMHDjB+/Hi6d+9O69atqV+/PmvXrmXnzp3Exsbi5+dHvXr16Nq1q+W//v37Wyarqd+xPtqKXR6WjY0N+fn5DBgwgIsXLxIYGEjXrl0BqFWrFt7e3ixatMgSzPj7+9OzZ09iY2MZNmyYZVW3+hrrpRoRKbsU3InIX96vZ5Hm5+ezdu1a6tWrR506dUqxZVJWmLcszM/PZ/DgweTk5NC9e3dCQ0M5cuQIy5
cvp0GDBsTFxVFYWEhCQgLffPMN+/btw93dnUWLFmnGcjn2e+Hd3r172b9/Pw4ODtSsWfOBf68asU4lA9msrCzy8vJo0KABvr6+NGnShKKiImbOnImdnd0DwYzJZOL06dP07NkTFxeXUr4KeZRUI/KwzOHu22+/TU5ODh999BEvvfQSvXv35ocffmD16tW0bduWKlWq0KZNGw4cOMDp06extbVl7NixmhxSzlSoUIGTJ09y584dy9mX9evX55tvvmHXrl00btyY7Oxsrl+/Tnh4OO7u7pYaU61Yn5LvI/n5+fodyx+6e/cuO3fuZNu2bYSGhtKqVSvgl74oPDwcb29vFi9ezL179/Dx8cHHxwdvb2/dR8oR1YhI2aTgTv4SNKNM/hPzi8uFCxdwd3fH2dmZQ4cOcfnyZdq3b//AoLuUT+ZZZDt37mT37t28+uqrdO7cmSZNmtCuXTt+/PFH1qxZQ+/evenRowexsbG0b9+eTp06MWTIEMssMs1YLr9+L7xbt24dNjY2xMfHl3bz5BEr+UI6YcIEZs6cyaeffsqWLVvIzs4mNDSUli1bUlRUxIwZM7CzsyMiIgKj0Ui9evXo0KEDlStXLuWrkEdJNSL/rYKCAubPn0/9+vXp0aMHABs3bmTJkiWMGzeOwsJC0tPTqVmzJu3bt6dNmzYMGjRIM9ut3O/9bp2cnMjKyiIhIYEOHTrg6emJj48PERERrF69mlWrVvHtt99y+fJlOnToYHnn0buP9SnZ10yePJl58+Zx8uRJHB0d8ff3L+XWSVnx6zPdHR0dadKkCTdv3mTjxo0EBwc/sHI7PDwcHx8fZs6ciY+PD02aNLH8TPcR66QaEflrUHAnZdKvV7WoI5A/8uqrr/L++++zdetWLly4wJkzZ7CxsaFTp04Yjcbf1JCCvPLFZDIxYsQIli9fTm5uLsOHD8fe3p6ioiJcXFyIiYlh3bp1pKam0qVLF3x9falatSo+Pj6aRSYWJcO7gIAAGjZsSJ8+fTAYDLqnWDnz73bs2LGkpKTQoUMHunXrxoULF9i+fTsXLlwgOjqali1bYjKZmDNnDvn5+TRu3Bij0YiDg0MpX4E8aqoR+W9lZWXx9ddfExAQQIsWLfj2228ZNWoUw4cPp3///ixcuJDk5GTatWuHo6Mjrq6ueiYpBwwGA9nZ2axevRpbW1tLoF+nTh2Sk5NJS0ujZcuW2NnZ4efnR0xMDLdu3cLPz4+PPvpItWHlzH3NuHHj+Oabb/Dw8GDHjh2cOXMGNzc3goODS7mFUtrMZ7rn5+dz9uxZTp8+DYC/vz/NmjXj7NmzLFy4kJCQkAfObw8PD6devXr07NlTE0OsnGpE5K9DwZ2UOSW3GVqxYgXr1q1j7969ZGVlPdBpiJjl5eVRXFxMjRo1uHPnDhcuXODIkSNcunSJAwcO8MUXX3Dr1i3u3btHVlaWJYwR61ZyFpmNjQ0hISHs2bOH8+fP4+PjQ/369TEYDBQXF+Pi4sLVq1dJTU2lW7du2NvbP/Bdqhfr9OuZhg8TvpnDOwBfX1+FduVIamoqCxYs4PXXX+fJJ5+kTp06dOvWjVu3brFjxw5yc3Np3LgxERERFBQUsGLFCp588kkcHR1Lu+nyJ1GNyP/N762icnBwYN++fezatYuioiLGjx/Pyy+/zLBhw7C1teXrr7/G1taWnj17PvA59TfWb968eUycOJEdO3aQmZmJp6cnnp6eZGVl8d133/HYY4/h7OyMyWSicuXKtG3blnbt2mEwGLQa00qVnNicl5fHihUrGDVqFCNGjCA2NpZNmzZx9OhR3N3dFd6VYyXPdH/mmWdYs2YNS5Ys4fvvv+fAgQN06NCB5s2bc/78eebNm/ebYKZ69eq6j1g51YjIX4uCOylzzDf/4cOHs27dOjIyMkhPT2fJkiX89NNPBAQEaCuhcu7XKzKNRiM1atQgMjKSrl270qVLF5o3b8769esJCAigevXqbNq0ieXLl7Nz5066dO
mCk5NTKV6BPGrmWWQFBQXcvXuXnJwcAgICiI6OZt++fVy6dAk/Pz8CAgIsA2CHDh0iPT2dxx9/HDs7u1K+AnnUSs40TE9Pp6Cg4L+6L5jrZu/eveTl5eHh4fGomiql5NfBbmpqKmvWrOHVV1/FycmJ/Px8jEYjMTExHDx4kKSkJHr16oWzszMNGjSgb9++qgsrpxqRh1FUVITRaCQrK4tPP/2UlJQUrly5Qp06dYiIiGD9+vWsW7eOoUOH8uKLLwJw/vx5Vq1aRaNGjYiNjS3lK5BH7dfvNlFRUURFRVGxYkXLysvz58/Tt29flixZQl5eHs2aNbOswDTfh0wmk1bcWaGSE5s3btxIWloaKSkpPP3007i4uODj40NoaChbtmzh2LFjCu/KMfPxEM8//zz29vaMHDmSQYMGERgYSEJCAqmpqXTv3p1mzZpx/vx5Fi1ahJ+fn87rLkdUIyJ/LQrupExavnw533zzDR9//DEvvPAC/fr1w8HBgYULF9KiRQsCAgL0UlJOlXxxWblyJYmJiSQlJeHg4ICnpydGoxGj0YirqytJSUm0bNmSMWPG0LFjR3r37k3v3r3x8/Mr5auQR8k8aJGZmcngwYNZtGgRK1asoEaNGtSrV48mTZqwZs0ajh49SlFREYGBgaSmprJo0SJCQkLo1KlTaV+CPGIla+TZZ59l8eLFLFq0CF9fX4KCgv5j/1Jydd2iRYsYNWoUXbp00X3FyhQXF1teSK9cuYKLiws3b95k9erVNGzYkODgYEvwa2dnR1hYGLNnzyYqKorAwEDs7Oy0isrKqUbkYRkMBnJycujVqxfHjx/n1KlTbNmyhaysLNq3b0+1atU4duwYx48fp6CggB07djBv3jyKi4uZNGmSVnZbOfMZyrm5uWzevJn9+/cTFBREcHAwsbGxdOrUicLCQjZv3szatWtxdnYmLS2NmJgYPDw8HqgL1Yj1MZlMlr7mxRdfZNGiRaxfv54bN24QGBhIvXr1AKhSpQphYWFs2bKFU6dOUaFCBWrUqFGaTZdScuLECVasWMErr7xC8+bNqVy5MidOnGD79u08//zzmEwmgoKCiI2N5cCBA5w4ccJyzqqUD6oRkb8OBXdSJq1du5acnBz+/ve/U6FCBc6fP8+4cePo3LkzjRs3Zu/evYSHh5d2M6UUlFyRuX79ei5dukR6ejoJCQlkZGTg6+uLl5cX9vb2bN68mXPnzvHEE0/g5OSEh4cHbm5upXwF8qjZ2NhQWFjIkCFDKCoqokmTJhQUFDBnzhyCg4Np0qQJ0dHRrF27lq+//ppVq1Zx9OhRXF1dmTRpEkajUQNkVs48Q/3ll1+muLiYrl274ubmxuzZs3FzcyM8PByj0fibz5Wsi88//5yPP/6Yf/3rX7Rr1+7PvgR5BMyrFkoGMiNHjuTChQs0bdqU/Px8duzYwdWrV6lZsyaenp7Y2tpiMpk4fPgwe/fu5W9/+xuVKlUq5SuRR0U1Iv+NkquofvzxR86dO8fUqVPp0qULrq6uzJ49m4KCAvr06UNcXBynTp1i3759XL9+nVq1ajFz5kyMRuNvVmOJ9TCvxszMzOSpp57i22+/ZcuWLWzevJmgoCC8vb3x9PSkcePGPPXUU2RmZnL37l2OHDlCSEgI9evX1zOrFSvZ12zfvp1NmzYxceJE2rZty7Vr19i9ezcuLi7UqlUL+OV8qrCwMFasWMG1a9do27btb7b/F+tj7iPMzyg//fQTy5cvp1evXlSpUoVvv/2WsWPHMmLECOLi4hg/fjx2dnZERETQqlUr+vTpo3uIlVONiPx1/XZUSuRPVnIFFfzygGoymcjOzsbe3p5Lly7Rp08fmjZtyhtvvMHmzZv55JNPiIyMfGCbOyk/li5dysGDB5kyZQqhoaG4u7szefJkEhISaNmyJYGBgVSsWJHIyEiWL1/OvXv3cHV1Le1myyNmnrEMv9xHqlSpQv/+/albty5Xrlzhk08+YfTo0djY2PDYY4
8xbdo0RowYwc2bN2nTpg1DhgwBID8/Xy+5VspcI+bZy76+vnTo0IGYmBgKCwupXLkyH374ITY2NvTp0wcHBwfLZ38d2r3//vuMHz+e3r17l9blyP+ivLw8BgwYwN///ndatGhhebE9ceKEZaJQUFAQw4cPZ9SoUTg4ONC3b19iYmK4fPky3333HZ6enri7u5fylcijohqR/4a5v8nLy+PMmTPs2LEDGxsbvL29sbOz48knn8RkMjFjxgwKCgoYM2YM06ZN4/bt2zg7O1ueQ0o+24j1sbW1JTc3l2eeeQYPDw/Gjh2Lh4cHw4YNY9KkSYwcOZLo6Gjs7OwwGo28+OKLZGRksHTpUubPn0/btm3x8fEp7cuQR8Qc2i1btoyUlBTCw8OJjIzEzs6OypUrM23aNObNmwdgWQ0TGRnJlClTqFy5so6GKCfM95EpU6bw3HPP4eHhgclk4ty5c9y/f5/Ro0czYsQIhgwZwpUrVzhx4gQZGRkAlu26S4bEYn1UIyJ/XXoLkFJnDu1WrFhB7969MRgMBAYG8tVXX7Fo0SJmzZpF06ZNmTBhAhUrVuTGjRs4ODj8ZmsQsT7Z2dmsX7/eMjBuHji/cOECoaGhNGjQAKPRyPnz51m5ciW9evWiUqVKbN++nU6dOlGlShUyMzMpKioq5SuRR808Yzk7O5sFCxZw5coVtm/fTr9+/QDw8/NjxIgRwC+rIwAee+wxJk2axEsvvcSGDRuoWrUqnTp1UmhnpUqeMTRlyhSuXr3Knj17aN++PfDLWZn//Oc/AZg4cSKAJbwrGdotWbKEDz74QKGdlUlLSyM3N5dRo0YxdepUoqOjycvLo6ioiIKCAsu/69KlC4WFhXz88cckJyfj6uqKg4MDGRkZLFq0SOeVWTHViPw3zKuo+vXrx7Vr18jPzyc8PJy8vDzs7Ozw9vamb9++AMyYMQOTycSrr776QH2YTCaFduXAd999h8FgYNy4cYSEhHDu3Dlq1arFgQMHeOedd3j77bdp0qQJdnZ2FBcXU6lSJVq1asXq1au5fPmygjsrl5aWxqZNmzh9+jTNmze3nMPdsGFDXnrpJUt4ZzAY6Natm+VnUr7s37+fRYsW0apVK2JjY+nbty/vvPMOxcXFvP3225b+5saNG3h5eeHr6/vA5xXIWD/ViMhfk7bKlDIhJSWFf/7zn/z000+0b9+eiIgIDh48yBdffEGDBg34+OOPcXFxISMjg9WrV1OhQgU6duyoAXYrt3z5ct555x2MRiORkZGWgfNNmzZx/vx5nn76adLT0+nVqxcxMTG8++677Nq1i0mTJtG9e3c8PT155pln8PLyKuUrkUfNYDCQnZ1N79692bt3Lzdv3uTatWtUq1aN8PBw7OzscHV1pU6dOty8eZOEhAR8fX2JjY2ladOmJCYm8t133+Hj46PzIKyQeYVdTk4OvXv35tixYxQWFpKeno6DgwO1a9fG2dkZo9FIkyZNyMvLY9asWRiNRiIiIiwDp0uXLtVKOyvl5eVFWFgYFy5cYP78+dStW5egoCDWrl1L1apViYmJAX6ppfDwcBo1akRYWBgVK1akVatWjB49mpCQkFK+CnmUVCPyMAoLCy2DW6NHj8ZoNDJw4ECqVavGli1buH37Nq1btwbAycmJkJAQnJycmDNnDu7u7tSvX9/yXZqgaJ3MNWKeFLR582b27dtnmVg2d+5crly5wvjx40lOTiY5OZmAgADc3d0t52KeOXOGL7/8ksjISB0fYWV+vTVupUqVqFatGunp6WzduhVvb2/q1KkD/DIxMTAwkBMnTrB27Vp8fHwICwsrraZLKapWrRppaWls2bKFzp07ExgYyP379zl79izNmzfHycmJY8eOMWnSJCpUqMDo0aMVxJQzqhGRvyYFd1ImuLq6UrlyZZYuXcrZs2d57LHHiImJIS0tjYMHD1JQUEBKSgorV65k165dTJ48GX9//9Jutjxi3t7eODg4MGPGDACaNGkCwMWLF0
lKSsJoNDJ69GiaNWvGu+++S8WKFUlKSuLUqVMMHDgQT09PnJ2dS/MS5BEzb1VmMpnYvHkzly9fZtq0aXTu3BmABQsW4O/vT40aNbC1tbWEd2fOnCE1NZVu3brh4eFBZGQk+/bto3///joH0cqYB8aKi4vZunUrly5dYurUqfTv3x8nJyfmz5+Po6MjoaGhVKxYEVtbW6Kiorhx4waHDx+md+/e2NjYkJCQwIcffsiECRMU2lkZ8yCZn58f/v7+XLx4kUWLFtGsWTMuXbrEhg0bOHv2LOfOncNkMuHp6UlAQAD16tWjdevWRERE6L5h5VQj8rDME4nWrFlDVlYWPXr0oGPHjkRERFC5cmUSEhK4ceMGcXFxAFSsWJHAwEBCQ0Pp1auXBsnKAfNEoo8//pjY2FiKi4s5c+YM3bp1Y+XKlXz66ae8++67NGnShOvXr5OYmEhSUhIuLi40aNCA69evs379ei5evMhLL72ke4sVKXmEyI8//sjp06epUqUKVatWJSgoiKtXr7Jp0yZcXV0tga2vry/+/v5cu3aNnj17qh7KgV+Hu+bJAEVFRWzfvp2IiAjq1KmDv78/FSpUYPbs2axdu5a9e/fi4eHBokWLdH6qlVONiFgPG5PJZCrtRkj58n/bG/nevXusWrWKjz/+mK5du1q2KpswYQInTpzg/v37hIWFMXToUK2IKQfMLy4ZGRksX76cKVOmMHbsWAYMGEBxcTG9evXi+PHjNG3alKlTp+Ls7Mzdu3d5++23uX//PlOnTqVixYqlfRnyJ8jPz+e5556jQoUKVK1aFfN8lOzsbCZOnMiqVat466236N69u2WV7o0bN/D09MRgMFBQUICdnZ3l/2J9CgoKePrppzEajfj6+jJp0iTLz6ZPn8706dN54YUX6N+/P5UrV7Z8xmg0WlY8zJo1CxcXF55++ulSuQZ5NH59zi7A6dOneffddzl69KjlfJiwsDAOHjxIYWEhTk5O+Pr60qdPH8t2vGK9VCPy39q0aZNl6+WpU6datmTOzMxk7dq1fPDBBzz55JP861//+s1ndaZd+XD69Gm6devGe++9xxNPPMG5c+cICAigW7duPP744wwdOhSTycTEiRO5efMm4eHhDBw40HIvOnfuHM7Oztom04qU7GvGjBnDjh07uHv3LsHBwUyaNInw8HCOHDnCjBkzOHXqFMOHD6dnz56Wz+fl5T1wNrNYt5ycHD777DOeeuopqlSpYuk3Hn/8cby8vEhISLD823PnznHjxg1cXFwIDw/HYDCorykHVCMi1kEr7uRPZx4EnTZtGhcvXrRs9eDg4EBwcDAeHh4sXLiQS5cu0bZtW1q2bEl8fDz9+/cnPj4eb2/v0my+/AmKi4stLy4JCQn89NNPnD59muTkZAwGA02aNCEuLo59+/Zx8eJF7t69S2pqKsuWLWP37t1Mnjz5N3tyi/UyGAzs2bOHrVu34u7uTvPmzXF0dMTOzo7o6Ghu3bpFQkICfn5+hISEYDQacXJysqzCMj+QGgwGbUtlpWxtbTl79iwbN27EZDLRokULy4xk80remTNn4ujoSPXq1XF2dsbW1hYbGxvLTMOoqKgHtjCTv76Sg2Sff/45GzZsYNu2bcTExNCgQQNu3LjBsWPHePrpp5k4cSJPPPEEMTExuLm5WbZd9fT0LOWrkEdJNSIPo+Q5qABVqlShevXq7Nq1i8LCQmJiYqhQoQL29vYEBwfj5eXFvHnzOHfunCXUM9PMduv061ULnp6eZGVlkZiYSGRkJCEhIeTm5rJs2TLq1q1L48aNuXjxIosXLyY6OprBgwdbJpvZ2tri4eGhXUWsjLk+Ro4cyYEDByzB3M6dO9mxYwcRERHUrVuXwMBA0tLSSExMxN7enrp16wJogL2cMPc3y5Yt4/PPP+eLL77g1q1bAAQGBuLn58fmzZupXLkyISEhFBcX4+npSdWqVfH29ra8//56QpJYD9WIiHVRcCd/CpPJ9MBKu9TUVK
ZPn86JEydwcXGx7MXu4OBAYGAgAF988QW3b9+mVatWVKhQAVtbW3Ue5YR58GPMmDFs27aNNm3a0KZNGypUqMAXX3xBcXEx8fHx9OjRg/Pnz3PixAlSU1Px8vLiww8/1N7+Vs68PaaZjY0N7du35/bt26xfvx4vLy9CQ0Oxt7e3hHcZGRl89tlnREREEBQU9MBnf+/P8tf2e9t6tGjRAqPRyIYNG7CxsaFGjRqWQa8mTZpgMBiYPn06ISEhlgkloEFUa2b+3b788sts3bqVe/fuceHCBSpUqECHDh2oXLky169fZ/PmzdSrV49atWoRGBhI06ZN6dChgyYSlQOqEfkjhYWF2NraUlxcTHZ2Nnl5edjZ2VG3bl1cXV1ZvHgxd+/eJSoqCnt7e0t4Zz5LpkePHnr+KAcMBgO5ubns3r3b8q5rNBpJTEzEw8OD+vXrk5GRwY4dOzh16hR79+5l+fLlGAwGJkyYYLkX6V3YuqWkpLB27VrGjx/PY489hrOzMydPnuTIkSMkJSXRuHFjS3h36NAh9u/fT9euXbXSrhwwv/+a+4u6devSp08fbGxsSE5OZvny5fz888+4u7tz8eJFHB0dadKkyW8mloDeea2VakTEOim4k0eqZOdhfuEYM2YMlSpVolevXuzZs4e9e/fi7OxsCVscHR3x8/Nj3bp17N+/n/T0dNq2bVualyGl4NKlSyxcuJAhQ4bQr18/IiIiiI6OxtvbmylTpgDQtGlT2rZtS6dOnejduzcdO3bUIJmVMw+Q5efnc/DgQY4cOcL169epVq0arVq14saNG8yfPx8vLy+Cg4Mt4V1UVBSurq50795dQYyVM2/rkZ2dzZQpU/j66685fPgwzZs3JzIykuLiYubNm4eNjQ2hoaGW8C4qKgo/Pz969OihGrFSvw79Ab766ivWrl3LlClTGDhwIE899RSRkZEA+Pv7ExAQwMWLF1m6dCk1a9a0DLhq8NQ6qUbkv1FUVITRaCQzM5NXX32VJUuWsHDhQks4065dOypVqsScOXO4c+cOkZGRlvAuPDyc3r17YzAYfrfuxLqYTCaeeeYZ5s6dy82bN2nSpAlBQUHcv3+fGTNm0L17d3x9fQkKCuL06dNkZGQQGBjIrFmzdM5QObJ79262b9/OSy+9hKOjIxs3bmT//v288MILHD16lMTERBo2bEjdunWJiIjgiSee0KrucsD8/puXl8fhw4c5evQoOTk5BAYGEh0dTdOmTalduzarV6/m8uXLHDhwgEOHDtG6dWu8vLxKu/nyJ1CNiFgvBXfyyGRmZvLee+/h7e1tCVPWrVvHl19+SdOmTWnZsiXVq1dn37597Nu3DycnJ2rWrAnA5cuXuXjxImPGjKF9+/a4u7uX5qVIKbh58yazZs2iXbt2lsO3HR0dCQ0NJT8/n4SEBMsh7XZ2dtjb22uQzMqZt2zIzMykf//+bNy4ka+++orNmzezc+dOGjZsSM+ePbl69Spz5859ILyzt7enUaNGlv3aNfhhnUwmE7a2tmRlZdGrVy+uXLlCYWEhnp6ehISE4OLiQnR0NMXFxcydOxeDwfBAeFe7dm3ViJXKzMzk7bffJiQkhEqVKln+ftOmTdy+fZuhQ4fi6Oho2Wrq/v37fPHFF1SpUoXWrVuzb98+NmzYQJ8+fR44+1Csh2pE/lsGg4GcnBz69u1LTk4OrVq1IiAggFOnTjFv3jxq1qxJt27dcHNzIyEhgXv37tG4cWPLpCIbGxtMJpP6m3LAxsYGW1tb9u3bx8GDB/n+++9xdHSkefPmXLt2jd27d9O8eXMCAwNp06YNvXv3pn379tja2uqcoXLAHMzeunWLY8eO0bVrV86dO8fw4cN59tln6devH5mZmWzYsIFNmzZhZ2dHmzZtcHFxKe2myyNW8v23X780ryX/AAAgAElEQVR+bN68meXLl/P999+zfft2IiMjqVatGjVr1qRz586WcbNTp05Z3ntAK6ismWpExLopuJNHIj
Mzk06dOmFjY0OfPn2oUKECb775Jvv27aNRo0YMHToU+GWmclBQEPv27WPPnj2YTCbs7e358ssvuXHjBkOHDtUKqnLg92Ya5+XlkZKSQlFREQ0bNsTR0REAe3t7HB0dWbVqFcnJydjb29O4cePSaLb8yWxsbMjPz2fYsGHY29vz2muvMWTIEOrUqUNycjIbN26kRYsW9OjRg/T0dBYtWoSjoyO1a9fGzs7O8j0aILNe5j35x44dS1FREXPnzqV79+7Ex8eTm5vL8ePHLYMdBoOB2bNnk52dTUREhOUeA6oRa3TkyBESExPp16/fA/eDzZs3k5qaypAhQ4D/0x85ODgwc+ZMkpKSGDRoEGFhYfztb3/Dw8NDL7ZWSjUi/42SZ8icO3eOyZMn0759e1q0aEFUVBS3bt1ixowZtGrVivj4eDw9PZk2bRpubm40atTI8j2qFetkDmJKvuNUrFiRW7duERsbi62tLSkpKezduxdHR0dycnKoXr06vr6+2NjYWMJ/84QksS6/XkFp3qHIw8ODGjVqEBYWxiuvvEJERASvvvoqAPv27aO4uJhGjRrRvXt3PDw8Sqv58ieysbGhoKCAoUOHYmdnx5tvvsmgQYNo3rw5M2fO5MSJE8TFxeHo6IijoyM1atSgQ4cOFBYWsmbNGvr06aOtVK2cakTEumlkSv7XZWVl0b17d2rWrMmHH36Im5sb8MtqqV27dvHDDz9w8eJFy79v3LgxY8aMwcvLiw8++IABAwawbds2xo0bp1lk5UDJF5fTp09z5MgRAPz8/Gjbti2rVq0iMTGRu3fvWj5jMBiIjo5m7NixxMfHl0q7pXScO3eO69evM3jwYMs2Q507d+azzz6jqKiI119/HYD33nuPuLg4du7cSYUKFUq51fKoFRUVWf5sMBi4fv06rVu3xt3dnatXrzJt2jTatWvHwIED6dmzJ3v27OHvf/87Tz/9NKdOnXpgdY1YH5PJRGRkJIsXL8bR0ZFFixZx6NAhAOLi4igsLGTatGkUFBRgMBgwmUwA+Pr6WgKchg0b4u/vX2rXII+WakT+W+YwJi0tDZPJhI+Pj6UuQkNDeeGFFwgKCmL69OkUFBTQo0cPpk+fzt/+9rfSbLb8SWxtbcnJyWHs2LHMmTMHgMDAQKKiokhOTuall15i1KhReHt7s379erZt28ayZcsAHlhdp2DX+hQVFVnC2ISEBF577TUGDBjAsmXLyMjIIDY2FvhlVbd56+W7d+9y4sQJatasyYQJEwgJCSm19suf79KlS1y9epX+/fvTqFEjgoKCuHbtGjY2NnTr1o1z585RUFAAQH5+PgDDhw8nPz+fxMTE0my6/ElUIyLWS3suyP+qrKwsy1YOb7zxBr6+vpYZqa+//joeHh5MmTKFL7/8kiFDhlgGSxs1asS7777LTz/9xI0bN4iKitLgRzlhfnEZNWoUKSkpZGRkEBUVxYQJExg+fDg///wz48eP58KFC3To0AEbGxv+/e9/c//+fXr27Imrq2spX4E8SiVfbgFu3LjBzz//TMWKFR/4eUhICMOGDeOdd94hJSWFZs2a8emnn1ruP7936LJYB/P2IHl5eXzyySe88cYbVKhQgfXr13PlyhVSUlJIT0+nV69eNG/enNmzZzNjxgxiYmIYN26caqQcMJ9DZWtry7lz55g4cSItW7ZkzJgxNGvWjOjoaNasWUOFChUYPHgwNjY23L9/n3v37uHt7U1BQYG2PrRyqhF5GCX7CfPzR15eHrm5uZbZ6gUFBdjZ2REWFkZUVBTbt28nNzcXFxcXy5nd2vrQepV8br1x4wYXL17k4MGDJCUl8frrr/PEE09w5MgRRo4cyapVq4iJiaF58+a8+eabXLhwQc8iVsz8uzXXx/Dhwzly5Ai1atXC1dWVadOm8e233/Lss89a7hVJSUkUFRVx+fJlDh48yOLFi0vzEuRP8uv33/z8fG7cuGGZKLRu3TrGjBnDK6+8QqtWrRgxYgTx8f
E899xz2NvbA/Ddd9+RlZVlmVAi1kU1IlJ+6I1B/tdkZmbSvXt3bty4gbe3N2vWrGHIkCFUrFiR4uJiDAYDw4YNIzc3l9mzZ+Pm5kbfvn0teyxXqVKFKlWqlPJVyJ+l5MPGZ599xsGDB/nnP/9JQUEBS5YsYfjw4UyePJmJEydSqVIlNm7cyMKFC/Hz88NkMjF79myFdlbOPLCVl5dHamoq9+/fp6ioiKKiIo4ePUrjxo2xsbGx1FLdunXJzc0lKysL+D/bzpjvP2J9zDVSWFjIli1b+PzzzykqKuKVV17hgw8+YMOGDTRq1Ij33nuPyMhIAPbs2cPly5ctn1VoZ92Ki4stA+SpqanUq1ePefPmMWzYMD744APeeecd3n//fV555RUWL17Mtm3bqF27NmlpaRw9epR///vfD2ybKNZHNSIPwxzImVd4m589evbsyYYNG5gwYQLjxo17oBZcXFzw8fH5zVaHCu2sk/m5Ijs7mxkzZuDi4kLz5s1p3LgxM2bM4B//+Afx8fF06dKFjIwMZs+ezbBhw+jSpQvh4eEEBQXpmcQKme8dJd9ZNmzYwMGDB5k6dSo1a9bEycmJZcuW8e6779K5c2fs7e35+OOPefnll9m4cSPOzs7MmzdPK+3KAfOExKysLI4fP05UVBTOzs7Y29tz8uRJcnJyGD16NCNGjGDIkCHcuHGDn376iZycHMt3ZGZmkp6ejtFo1JEiVkg1IlK+6Iw7+V+RmZlJ586dCQ0NZf78+Vy8eJEdO3Zw79496tevj4ODg2WP/5iYGAoKCpg5cyZOTk6EhYVpK7tyIj8/n7t37+Lo6GgJUrZt28b58+d57LHHePLJJ6lfvz5RUVGsW7eOrVu3Eh0dTdeuXWndurXlZff555+3bB0i1qnkIctPP/00mzZtYtmyZTg4OBAYGMiSJUuoW7cuQUFBllo6deoUhw8fpmvXrvj5+Vm+S4Mf1qlkjYwZM4adO3dy584dDh8+TH5+PlOnTmXIkCF07twZf39/TCYT165dY86cOdSoUYM2bdpYvks1Yl0KCgosfY35dzt+/Hi+/fZbHn/8cQIDA4mIiGDmzJmcOXOGpk2b0rNnTxwcHLh06RLp6en4+/vzwQcfEBoaWspXI4+CakQexrVr18jJycHJycnS37z55pt8/vnnHD16FDc3N6Kiorh9+zbr1q3j6tWrtGzZkuzsbC5dusS8efMsZ8mIdSv5TNK7d29OnjzJoUOHOHXqFABTpkzBzs6Offv2MX/+fAwGA3fu3KFBgwZUqlTJcjbmr88+k782k8nEkiVLSEpKIjo62vK7TUpK4uTJkwwbNgwXFxcuXLjAq6++SqdOnWjdujUrV66kY8eOdO7cmd69e9OzZ09NcLZiJfsa+GViyODBg7l27Rpt2rTB1dUVGxsbpk6dSmJiImPHjmXQoEEAXL58me+//564uDjCw8MBsLe3p3bt2vTp00e7WFkJ1YhI+aXgTv7HTCYTr7/+Oo6OjkyYMAF/f3/atGnDsWPH2LlzJ/fv3/9NeBcbG0tBQQEJCQkA1KlTRweiWrmioiLat29P5cqVqVOnDgCJiYm89dZb/PDDDzz++OMEBwdTVFSEt7c3TZo0sYR3UVFRBAcHExAQgK+vr+WBRayX+ZDlv//979jb2/PWW2/x3HPPERYWRnx8POfPn2fWrFk4OzuTm5vLmTNnmD59Oi4uLrzwwgsKYsoBGxsb8vPzefbZZyksLOSZZ57h+eefp6CggF27dnH48GE6duzI999/z2uvvcaPP/5o2WJo+vTplnOqVCvWpbi4mCeeeIKff/6Z8PBwS3/x5Zdf4u7uTocOHSgqKiIwMJAGDRowa9Yszp49S8OGDWndujU9evSge/fulv5KrI9qRB5GZmYmcXFxFBQUEBERgYODAz179uTWrVu4ubnxww8/cODAAYKDg+nevTsZGRksX76cNWvW8M033/DNN99ga2vLrFmz1N+UA+ZnkkGDBuHm5s
bkyZMZOHAgR44cYePGjdy6dYsXX3yR9u3bY29vz86dOzl79ix+fn40atTI8j0K7axLUVERycnJrF+/HoD58+fj4eHBtWvX2LFjBy+//DI3btzg8ccfJzY2lg8//JAzZ87w7rvvEhcXR7Vq1XB0dNQ4iRUr2deYn0kMBgOff/45ISEhNGvWDMAS3KamphIREYGNjQ0nTpzgk08+wd7enjfeeOOB+4etra3laAn5a1ONiJRvCu7kf8zGxoYGDRrQuXNnvLy8LDMO27Rpw/Hjx/9jeJeRkcHKlSvp378/jo6OpX0p8ggZDAYaNWpEVFQUDg4O5OfnU6lSJQoLCzl69Ci2tra0bdsWg8FAcXExXl5eREdHs3HjRlavXk2zZs3w8PAo7cuQP9GFCxdYtmwZgwcPpnnz5lSqVAkfHx9cXV3x8/Pjp59+YtWqVaxbt479+/fj4eHBggULMBqNlvuMWLfU1FS+/vprxo4dS3x8PN7e3kRHR+Po6MjmzZs5dOgQ9evX59ixY9y7d49atWoxffp0y/aav96+TP76bGxs8PHx4cMPP8Te3p5q1arh4uJCSkoKAO3bt6e4uBiTyWQJZmbOnMnly5epWrUqvr6+2NraavDUiqlG5GHY29sTHh7OBx98gMFgIC0tjevXr/PRRx/x7LPPUqNGDX788Ue+++47wsPD6du3L5GRkdy+fZuAgACio6P54IMP1N+UI4cPH2bDhg2MGzeOWrVqkZWVxZ49e3BwcODHH3+07DASGRlJo0aNCA4OZsCAAbqXWDGDwUBISAipqamsXLmSkydPMmDAAKpWrUpiYiLHjx/n/fffp2XLlrz99ts4Ojpy/Phx9u/fT79+/XBzcyvtS5BHzNzXTJw4EaPRSGBgIC4uLiQmJuLn50dsbCzFxcW4uLhQvXp1PD09WbBgARs2bODQoUN4eXmxcOFCjEajVuxaKdWISPmmDfblf4WPj4/lzwaDgaKiIoxGI++//z6vv/66ZZbZkCFDcHJyspw59cYbb/DCCy9QqVKl0mq6/Inq168PwGuvvUbFihUZMWIEzz33HMXFxcydOxcXFxdef/11S3gXFhbGJ598whtvvKGZhuVQbm4ut27deuAsGJPJhMlk4vLly8AvK6fs7OxwcXGhbt26GAwGyxkjYv3y8vK4f/++ZbvlgoICXFxc6NmzJydPnmT16tW4ubkxb968B84WMfdRYp3atGnD3Llzef755ykuLmbYsGF4eHiwb98+rl279sAzS7NmzZg9ezZDhgzB3t7eMitVrJtqRB5GXFwcCQkJDBkyhFq1alG1alWqVasGQIsWLTCZTMycOZP33nuPESNG0KpVK6Kioh74DvU35cft27e5deuW5f6wePFiLly4wOuvv87KlStZsWIFxcXFjB49msaNG1vOFdJzq3Xz8vLCycmJwsJC/Pz82L17NwMGDKBdu3Z88cUX1KpVizfeeAM3Nzfu3LnDjh078Pb2VmhXjsTFxTFnzhwGDRpEcXExgwcPxsPDg1OnTvHzzz9bjoCoWrUqQ4YMoV27duTk5GBvb09wcLDef8sB1YhI+aUVd/JIlAzvWrdu/R9X3mmlXfljMBiYOHEiBoPBMusUYOnSpdy5c4cWLVpgY2NDcXEx3t7ePP7441ptVw6ZTCZWrFgBQNu2bS1/bzAYMBqNfPLJJzz11FNER0fj4+NjqRnNai8/TCYTX375JZUqVSImJgZbW1sKCgpwdHTEx8eH1atXc+nSJS5cuEB8fLxluzLNNLR+1apVo2HDhrz11ls4Ojpy5coVfvjhB9auXcuBAwc4e/YsAG5uboSFhREXF0fz5s3x8vIq5ZbLn0U1Ig+jWrVqNGrUiHnz5mE0GmnTpg3Ozs4ABAYG4u3tzeHDh9m1axcuLi7UqFHjgc+rvyk/7Ozs+PHHH+nduzdJSUm8//77vPfee8TGxuLm5sbq1as5ceIEly9ffuDcQ9WI9QsJCaFFixakp6ezY8cO7OzseP
HFF7ly5QqnT58mJSWFQ4cO8eWXX7J//36mT5+uM+3KmV8/k5w5c4YDBw6wdOlSduzYwdGjR8nOzsbV1RUHBweqV69uORtT77/lg2pEpHxScCePzO+FdykpKVy7do1GjRppBVU5FhQURMOGDfnXv/6FjY0NkZGRNGzYkOLiYj7//HPu379P8+bNLVsd6iGjfHJ2dsbNzY3Zs2djZ2dH48aNLTWRlpbGkSNH6Nq16wNnDGl7zPLFzc2NoqIiZs6cSZUqVQgPD7fcL3744QeuXLlCixYt2LNnD3Xr1sXPz081Uo5Uq1aNiIgI3njjDTIzM3F3d6dHjx6cPn2aHTt2sHr1ambNmsWxY8d45plndF5ZOaQakYcREBBAZGQk8+fPx97entDQUMvZiObw7rvvviM/P/+BiUZSvri5uREfH4+7uzuffvopderUYfDgweTn5/P999+TmZnJZ599xjPPPKOwrpypVKmSZRJAamoqW7duxd7enpdffplKlSpx79490tPTqVmzJm+//TZhYWGl3WQpBeZnknHjxnH//n3q1KnDoEGDuH37Nj/88APffvst8+bNIzs7m9atW1vOTtW7TfmhGhEpf2xMJpOptBsh1s28NVlhYSHDhw/n559/ZsGCBVpBJSQnJzNo0CCGDh3K4MGDKSoqYsGCBcyZM4chQ4bwyiuvlHYTpZTl5OQwY8YMFixYQPfu3WnRogUAS5YswdbWls8//1yDH+VcRkYG7733HuvWrWPo0KE0bdqU3Nxcpk2bRt26dXnxxReJi4vjzTffpE+fPqXdXCkFu3fvZuDAgYSHh7N69WoA7ty5w6FDhzhx4gTt2rX7zSoZKV9UI/IwUlJSeP755xk6dCj9+/fH29vb8rPDhw9Tt25dTTYT8vPz6d+/P1WqVGHKlCmkpaXx1ltvERQUxDvvvPPA1t1S/qSnp/Puu++SlpbGM888w+OPP87Vq1fx8/OzTAiQ8m3Pnj0MGDCAZs2aMX/+fOCXIyTOnDnD2bNn6dq1q7Y8LOdUIyLlh4I7+VOUDO9u3br1wNkhUr79OrwrLCxk6dKldOzYkZCQkNJunpQBOTk5fPvtt3z22Wfk5ORQuXJlAgICLCvxzGdmSvl19+5dvvrqK+bOnUtRUREODg4EBQWxYMEC7t+/z8CBAxk5ciRxcXGl3VQpJeZgZujQoTz11FP4+vqWdpOkjFGNyMMwP7cOGzaM/v37/2brVAUyArBo0SImTpxIaGgo9+/fx8PDgxUrVmA0Gi0rIKT8Sk9P5/333+fw4cMUFBRgMBhYv369JjaLhfmZZNiwYfTt2/c3zyQ6r0xUIyLlg7bK/JVr167RqlUrHBwcaNCgQWk3x2qU3DbTfC6ECPyy3L9Bgwa88847ZGdn06xZM5o2bYqnp2dpN03KCDs7O+rUqUPXrl3p3Lkz3bt359lnn8VoNFJYWKgBMqFChQo0btyYzp0706FDBzp27Mg//vEPTCYTH374IWlpaQwdOlT9TzkWEBBAgwYNGD9+PEVFRYSHh1OxYsXSbpaUIaoReRglz5jJz8+nXr16D5zXrYlEAlCzZk2Cg4MpKioiOjqajz76SM+tYuHq6kqjRo0sZzKPGzeOgICA0m6WlCEBAQGWvqawsFB9jfyGakSkfNCKuxKysrIYOHAghw8f5rXXXmPAgAGl3SSRcmP79u2MHDmSLVu2aLahPBSttJPfYzKZ2LJlC6tXryYjI4PLly8zd+5cwsPDS7tpUgZs376dUaNGkZiYqL5GfpdqRB7Gtm3bSEhI4Msvv9TqKfm/Krm6Tqsf5PfofUb+E/U18kdUIyLWTcHd/y89PZ2XXnqJY8eOASi4EykF2dnZmt0uIv8jJpOJw4cPM2fOHCIiIujQoQPVq1cv7WZJGaK+Rv6IakQehjmU0daHIiLyqKivkT+iGhGxXpryxS970E+dOpXc3FxiYmLYs2dPaTdJpFzSIJmI/E/Z2NjQoEEDZs2aVdpNkTJKfY38Ed
WIPAwNkomIyKOmvkb+iGpExHppTT6wZMkSqlSpwtKlS+nevXtpN0dERERERETKOA2SiYjIo6a+Rv6IakTEOmnFHfDOO+/QtGlTbG1tuXDhQmk3R0RERERERERERERERMohBXdAixYtSrsJIiIiIiIiIiIiIiIiUs4puPt/EBcXV9pNkDJuypQpAPzzn/8s5ZZIWaUakT+iGpE/ohqRP6IakT+iGpGHoTqRP6IakT+iGpE/ohqRh7V9+/bSboL8D40ePZqcnBymT59e2k0p03TGnYiIiIiIiIiIiIiIiEgZoOBOREREREREREREREREpAxQcCciIiIiIiIiIiIiIiJSBii4ExERERERERERERERESkDFNyJiIiIiIiIiIiIiIiIlAEK7kRERERERERERERERETKAAV3IiIiIiIiIiIiIiIiImWAsbQbUNb07NmTnj17lnYzREREREREREREREREpJzRijsRERERERERERERERGRMkDBnYiIiIiIiIiIiIiIiEgZoOBOREREREREREREREREpAxQcCciIiIiIiIiIiIiIiJSBii4ExERERERERERERERESkDFNyJiIiIiIiIiIiIiIiIlAEK7kRERERERERERERERETKAAV3IiIiIiIiIiIiIiIiImWAgjsRERERERERERERERGRMkDBnYiIiIiIiIiIiIiIiEgZoOBOREREREREREREREREpAxQcCciIiIiIiIiIiIiIiJSBii4ExERERERERERERERESkDFNyJiIiIiIiIiIiIiIiIlAEK7kRERERERERERERERETKAAV3IiIiIiIiIiIiIiIiImWAgjsRERERERERERERERGRMkDBnYiIiIiIiIiIiIiIiEgZoOBOREREREREREREREREpAxQcCciIiIiIiIiIiIiIiJSBii4ExERERERERERERERESkDFNyJiIiIiIiIiIiIiIiIlAEK7kRERERERERERERERETKAAV3IiIiIiIiIiIiIiIiImWAgjsRERERERERERERERGRMkDBnYiIiIiIiIiIiIiIiEgZoOBOREREREREREREREREpAxQcCciIiIiIiIiIiIiIiJSBii4ExERERERERERERERESkDFNyJiIiIiIiIiIiIiIiIlAEK7kRERERERERERERERETKAAV3IiIiIiIiIiIiIiIiImWAgjsRERERERERERERERGRMkDBnYiIiIiIiIiIiIiIiEgZoOBOREREREREREREREREpAxQcPf/tXf/sV7WdR/HX8fxywShhVmKqEtGK0QgTVJpouGPCFM5DbCENKPaRElGZssYYq4lodSmc5Q5FUtryIYmGlo5ctIvakJ2DPJImElATCU4h1/3H+6c+z47Iuh9jHf3/XhsZ2ff6/rse32+13X999x1fQAAAAAAAKAA4Q4AAAAAAAAKEO4AAAAAAACgAOEOAAAAAAAAChDuAAAAAAAAoADhDgAAAAAAAAoQ7gAAAAAAAKAA4Q4AAAAAAAAKEO4AAAAAAACgAOEOAAAAAAAAChDuAAAAAAAAoADhDgAAAAAAAAoQ7gAAAAAAAKAA4Q4AAAAAAAAKEO4AAAAAAACgAOEOAAAAAAAAChDuAAAAAAAAoADhDgAAAAAAAAoQ7gAAAAAAAKAA4Q4AAAAAAAAKEO4AAAAAAACgAOEOAAAAAAAAChDuAAAAAAAAoADhDgAAAAAAAAoQ7gAAAAAAAKAA4Q4AAAAAAAAKEO4AAAAAAACgAOEOAAAAAAAAChDuAAAAAAAAoADhDgAAAAAAAAoQ7gAAAAAAAKAA4Q4AAAAAAAAKEO4AAAAAAACgAOEOAAAAAAAAChDuAAAAAAAAoADhDgAAAAAAAAoQ7gAAAAAAAKAA4Q4AAAAAAAAKEO4AAAAAAACgAOEOAAAAAAAAChDuAAAAAAAAoADhDgAAAAAAAAoQ7gAAAAAAAKAA4Q4AAAAAAAAKEO4AAAAAAACgAOEOAAAAAAAAChDuAA
AAAAAAoADhDgAAAAAAAAoQ7gAAAAAAAKAA4Q4AAAAAAAAKEO4AAAAAAACgAOEOAAAAAAAAChDuAAAAAAAAoADhDgAAAAAAAAoQ7gAAAAAAAKAA4Q4AAAAAAAAKEO4AAAAAAACgAOEOAAAAAAAAChDuAAAAAAAAoADhDgAAAAAAAAoQ7gAAAAAAAKAA4Q4AAAAAAAAKEO4AAAAAAACgAOEOAAAAAAAAChDuAAAAAAAAoADhDgAAAAAAAAoQ7gAAAAAAAKAA4Q4AAAAAAAAKEO4AAAAAAACgAOEOAAAAAAAAChDuAAAAAAAAoADhDgAAAAAAAAoQ7gAAAAAAAKAA4Q4AAAAAAAAKEO4AAAAAAACgAOEOAAAAAAAAChDuAAAAAAAAoADhDgAAAAAAAAoQ7gAAAAAAAKAA4Q4AAAAAAAAKEO4AAAAAAACgAOEOAAAAAAAAChDuAAAAAAAAoADhDgAAAAAAAAoQ7gAAAAAAAKAA4Q4AAAAAAAAKEO4AAAAAAACgAOEOAAAAAAAAChDuAAAAAAAAoADhDgAAAAAAAAoQ7gAAAAAAAKAA4Q4AAAAAAAAKEO4AAAAAAACgAOEOAAAAAAAAChDuAAAAAAAAoADhDgAAAAAAAAoQ7gAAAAAAAKAA4Q4AAAAAAAAKEO4AAAAAAACgAOEOAAAAAAAAChDuAAAAAAAAoADhDgAAAAAAAAoQ7gAAAAAAAKAA4Q4AAAAAAAAKEO4AAAAAAACgAOEOAAAAAAAAChDuAAAAAAAAoADhDgAAAAAAAAoQ7gAAAAAAAKAA4Q4AAAAAAAAKEO4AAAAAAACgAOEOAAAAAAAAChDuAAAAAAAAoADhDgAAAPMRH24AABLaSURBVAAAAAoQ7gAAAAAAAKAA4Q4AAAAAAAAKEO4AAAAAAACgAOEOAAAAAAAAChDuAAAAAAAAoADhDgAAAAAAAAoQ7gAAAAAAAKAA4Q4AAAAAAAAKEO4AAAAAAACgAOEOAAAAAAAAChDuAAAAAAAAoADhDgAAAAAAAAoQ7gAAAAAAAKAA4Q4AAAAAAAAKEO4AAAAAAACgAOEOAAAAAAAAChDuAAAAAAAAoADhDgAAAAAAAAoQ7gAAAAAAAKAA4Q4AAAAAAAAKEO4AAAAAAACgAOEOAAAAAAAAChDuAAAAAAAAoADhDgAAAAAAAAoQ7gAAAAAAAKAA4Q4AAAAAAAAKEO4AAAAAAACgAOEOAAAAAAAAChDuAAAAAAAAoADhDgAAAAAAAAoQ7gAAAAAAAKAA4Q4AAAAAAAAKEO4AAAAAAACgAOEOAAAAAAAAChDuAAAAAAAAoADhDgAAAAAAAAoQ7gAAAAAAAKAA4Q4AAAAAAAAKEO4AAAAAAACgAOEOAAAAAAAAChDuAAAAAAAAoADhDgAAAAAAAAoQ7gAAAAAAAKAA4Q4AAAAAAAAKEO4AAAAAAACgAOEOAAAAAAAAChDuAAAAAAAAoADhDgAAAAAAAAoQ7gAAAAAAAKAA4Q4AAAAAAAAKEO4AAAAAAACgAOEOAAAAAAAAChDuAAAAAAAAoADhDgAAAAAAAAoQ7gAAAAAAAKAA4Q4AAAAAAAAKEO4AAAAAAACgAOEOAAAAAAAAChDuAAAAAAAAoADhDgAAAAAAAAoQ7gAAAAAAAKAA4Q4AAAAAAAAKEO4AAAAAAACgAOEOAAAAAAAAChDuAAAAAAAAoADhDgAAAAAAAAoQ7gAAAAAAAKAA4Q4AAAAAAAAKEO4AAAAAAACgAOEOAAAAAAAAChDuAAAAAAAAoADhDgAAAAAAAAoQ7gAAAAAAAKAA4Q4AAAAAAAAKEO4AAAAAAACgAOEOAAAAAAAAChDuAAAAAAAAoADhDgAAAAAAAAoQ7gAAAAAAAKAA4Q4AAAAAAAAKEO4AAAAAAACgAOEOAAAAAAAAChDuAAAAAAAAoADhDgAAAAAAAAoQ7gAAAAAAAKAA4Q4AAAAAAAAKEO4AAAAAAACgAOEOAAAAAAAAChDuAAAAAAAAoA
DhDgAAAAAAAAoQ7gAAAAAAAKAA4Q4AAAAAAAAKEO4AAAAAAACgAOEOAAAAAAAAChDuAAAAAAAAoADhDgAAAAAAAAoQ7gAAAAAAAKAA4Q4AAAAAAAAKEO4AAAAAAACgAOEOAAAAAAAAChDuAAAAAAAAoADhDgAAAAAAAAoQ7gAAAAAAAKAA4Q4AAAAAAAAKEO4AAAAAAACgAOEOAAAAAAAAChDuAAAAAAAAoADhDgAAAAAAAAoQ7gAAAAAAAKAA4Q4AAAAAAAAKEO4AAAAAAACgAOEOAAAAAAAAChDuAAAAAAAAoADhDgAAAAAAAAoQ7gAAAAAAAKAA4Q4AAAAAAAAKEO4AAAAAAACgAOEOAAAAAAAAChDuAAAAAAAAoADhDgAAAAAAAAoQ7gAAAAAAAKAA4Q4AAAAAAAAKEO4AAAAAAACgAOEOAAAAAAAAChDuAAAAAAAAoADhDgAAAAAAAAoQ7gAAAAAAAKAA4Q4AAAAAAAAKEO4AAAAAAACgAOEOAAAAAAAAChDuAAAAAAAAoADhDgAAAAAAAAoQ7gAAAAAAAKAA4Q4AAAAAAAAKEO4AAAAAAACgAOEOAAAAAAAAChDuAAAAAAAAoADhDgAAAAAAAAoQ7gAAAAAAAKAA4Q4AAAAAAAAKEO4AAAAAAACgAOEOAAAAAAAAChDuAAAAAAAAoADhDgAAAAAAAAoQ7gAAAAAAAKAA4Q4AAAAAAAAKEO4AAAAAAACgAOEOAAAAAAAAChDuAAAAAAAAoADhDgAAAAAAAAoQ7gAAAAAAAKAA4Q4AAAAAAAAKEO4AAAAAAACgAOEOAAAAAAAAChDuAAAAAAAAoADhDgAAAAAAAAoQ7gAAAAAAAKAA4Q4AAAAAAAAKEO4AAAAAAACgAOEOAAAAAAAAChDuAAAAAAAAoADhDgAAAAAAAAoQ7gAAAAAAAKAA4Q4AAAAAAAAKEO4AAAAAAACgAOEOAAAAAAAAChDuAAAAAAAAoADhDgAAAAAAAAoQ7gAAAAAAAKAA4Q4AAAAAAAAKEO4AAAAAAACgAOEOAAAAAAAAChDuAAAAAAAAoADhDgAAAAAAAAoQ7gAAAAAAAKAA4Q4AAAAAAAAKEO4AAAAAAACgAOEOAAAAAAAAChDuAAAAAAAAoADhDgAAAAAAAAoQ7gAAAAAAAKAA4Q4AAAAAAAAKEO4AAAAAAACgAOEOAAAAAAAAChDuAAAAAAAAoADhDgAAAAAAAAoQ7gAAAAAAAKAA4Q4AAAAAAAAKEO4AAAAAAACgAOEOAAAAAAAAChDuAAAAAAAAoADhDgAAAAAAAAoQ7gAAAAAAAKAA4Q4AAAAAAAAKEO4AAAAAAACgAOEOAAAAAAAAChDuAAAAAAAAoADhDgAAAAAAAAoQ7gAAAAAAAKAA4Q4AAAAAAAAKEO4AAAAAAACgAOEOAAAAAAAAChDuAAAAAAAAoADhDgAAAAAAAAoQ7gAAAAAAAKAA4Q4AAAAAAAAKEO4AAAAAAACgAOEOAAAAAAAAChDuAAAAAAAAoADhDgAAAAAAAAoQ7gAAAAAAAKAA4Q4AAAAAAAAKEO4AAAAAAACgAOEOAAAAAAAAChDuAAAAAAAAoADhDgAAAAAAAAp4w+HuxRdfzAc+8IHccccdnfZt3bo1c+fOzRlnnJGhQ4fmnHPOyW233ZaWlpZOY5944okMHjz4Nf9OPfXUTuP/8pe/ZPr06Rk5cmSGDx+e8ePH58EHH3zNOS5btiwTJkzICSeckOHDh2fy5MlZsWJFhzHf+c539nr8tr+LL774jZ4eAAAAAACA19Xa2pqPfexjeeKJJzrt27x5c0aOHJn169cfgJnxZnTl9ez2Rg68devWTJs2La+88kqnfdu2bcvkyZOzatWqDBo0KBMnTsy6desyb968LF++PAsWLEivXr3axzc1NSVJJkyYkMMOO6zDd73tbW/r8Hn16tWZPHlydu7cmY9+9KM55JBD8sgjj+Sqq67Kxo0bM2XKlPaxP/rRj/LVr3
41ffv2zYUXXphdu3ZlyZIl+fSnP51bbrklo0ePTpJ88IMfzOWXX/6av/Ohhx7K2rVrc9JJJ72R0wMAAAAAAPC6WlpaMmPGjPz5z3/utG/Lli35whe+kH/+858HYGa8GV19Pfc73D3//POZNm1aVq9e/Zr7v/vd72bVqlUZM2ZM5s2blx49eiRJFi5cmOuuuy4LFizItGnT2se3hbuZM2emT58+ez3u7t27c80112TXrl256667MnTo0CTJtGnTct555+Wmm27KxIkT07Nnz+zYsSNz585Nnz59smjRogwYMCBJ8slPfjKNjY25/vrr28PdySefnJNPPrnT8X7/+9/n1ltvzYgRI/Ya9gAAAAAAAN6oNWvWZMaMGdmzZ0+nfb/+9a9z9dVXp3fv3gdgZrwZb8X13K9XZd5xxx0ZN25c/vSnP2XkyJGvOebBBx9MQ0NDrr322vZolyQXXXRRjjnmmNx9993ZuXNn+/ampqYceeSRrxvtkuRXv/pVmpqaMmXKlPZolyR9+/bN9OnTc95552XTpk1JknXr1mXLli0ZOXJke7RLksGDB2fYsGFZv359Nm7cuNdjtba25uqrr85BBx2UG264IQcdZAlAAAAAAACga/zmN7/JqaeemnvvvbfTvl/+8peZNGlSbr755gMws7dWa2trnn/++Tz77LP5/ve/n9bW1gM9pS7xVlzP/Xri7s4778yRRx6Z2bNnp7m5OU8++WSnMevXr88RRxyRww8/vMP2hoaGDB48OA8//HDWrl2bwYMHZ9euXVm7dm1OOeWUfR778ccfT5KcffbZnfZdcMEFueCCC9o/9+vXL0nyt7/9rcO4PXv2ZMOGDenevfvrhsJ77rknzc3Nueyyy3Lsscfuc24AAAAAAAD7a+LEiXvdN3369CTJc8899++azr9Fa2trGhsb8/LLLyd5tTndf//9+fGPf9zhQbD/RG/F9dyvR8pmz56dxYsXZ8SIEXsd06NHj70W0raL0RbUnn322bS0tKRXr16ZOXNmRo0alRNOOCGTJk1qD3Vt2t4JOnDgwMyfPz9nnHFGjj/++Hz84x/P0qVLO4x9xzvekbPOOiurV6/O3Llzs3nz5mzatClz5sxJc3NzJk2alJ49e77mHF955ZXceuutOeSQQ/K5z31uf04LAAAAAAAAr2PhwoXtnajNyy+/nIULFx6gGdW2X0/cjRo1ap9jhgwZkhUrVmTlypUZPnx4+/ZNmzblD3/4Q5L/Dnht69s99NBDGTFiRMaNG5cXX3wxy5Yty9SpU3P99densbExSbJhw4b06NEjV1xxRVavXp2PfOQjaWhoyCOPPJIrr7wys2bNykUXXdR+vBtvvDFvf/vbs2DBgixYsKB9+2WXXZYZM2bsdf6LFi3Kli1bcskll+TQQw993d/685//fJ/nAxL3CvvmHmFf3CPsi3uEfXGPsC/uEfaH+4R9cY+wL+4R9sU9Av93PfXUU6+5fdWqVf/mmfxn2K9wtz8uvfTSrFixIl/84hcze/bsnHjiiVm3bl1mz57dvihf2//t27dn4MCB+cQnPpGpU6e2f8eaNWsyYcKEzJkzJ6effnr69++fbdu2pbW1Nc8880wWL16cd7/73UmSz3/+8xk/fny+8Y1v5Kyzzkr//v2TJIsXL86SJUtyxBFHZPTo0Wlpacmjjz6ahQsX5rjjjuvwas02e/bsycKFC9OtW7dMmTKlq04JAAAAAADA/2vz5s070FP4j7Jfr8rcH6effnq+9KUv5R//+EemTp2aESNG5Pzzz8/BBx+cSy+9NEly8MEHJ0nGjx+fn/70px2iXZIcd9xxmTJlSrZv355ly5YleXWNvCT57Gc/2x7tkmTAgAG5+OKL09LSksceeyzJq4sAzpo1K4MGDcqSJUvyta99LV//+tfzwAMP5PDDD89XvvKVPPPMM53mvnLlyjQ3N+e0007rcAwAAAAAAAD4d+mycJckn/nMZ7J06dJce+21mTlzZu6888
7cfvvt+de//pXk1TXo9uV973tfkmT9+vVJkj59+iRJ3v/+93ca+973vjdJsm7duiTJ/fffnyS56qqr0rt37/Zx/fv3z5VXXpndu3dn8eLFnb6nLfydffbZ+/dDAQAAAAAAoIt12asy2xx11FH51Kc+1WHbqlWr0tDQkPe85z1JXn0l5oYNG/KhD32o/Ym6Ni0tLUmSnj17JkmOPvroPPXUU9mxY0enY+3cuTNJ0qtXryTJ3//+9yRpP87/NGjQoCTJCy+80GnfL37xi3Tr1i1nnnnm/v9QAAAAAAAA6EJd9sTdN7/5zZx00knZvHlzh+0bN27MypUrM2TIkPTr1y9JMmvWrFxyySX54x//2Ol7fvvb3yZJhgwZkiQ58cQTkyRPPvlkp7FtCxe2PXnX9kRfc3Nzp7HPPfdckrSvhddm69atWbNmTQYNGpS+ffvu348FAAAAAAD4X2hqasopp5zSafvRRx+dpqamDBgw4ADMijerq65nl4W7QYMG5aWXXsoPf/jD9m2tra255pprsmPHjg7r2Z1zzjlJkptvvrn9qbkk+d3vfpf77rsvAwcOzKhRo5Ik5557bg499NDcddddWbt2bfvY5ubm/OAHP8hhhx2WD3/4w+1jk+Smm27Ktm3b2se+9NJLmT9/fpJk7NixHeb99NNPZ/fu3Tn++OO75DwAAAAAAADAm9Flr8ocN25c7rnnnnz729/O008/naOOOirLly9PU1NTGhsbM2bMmPaxEydOzMMPP5zHH388559/fk477bS88MILefTRR9O9e/d861vfSrdur06tX79+ue666zJjxow0NjZm7NixOeigg7J06dJs3749N954Y3r06JEkGT16dC688MIsWrQoY8eOzZlnnpnW1tY89thj2bBhQ6ZOnZphw4Z1mPdf//rXJMnAgQO76lQAAAAAAADAG9Zl4a5bt2753ve+l/nz5+dnP/tZli9fnmOOOSZz5sxJY2Njh7Xsunfvnttvvz233XZbHnjggdx9993p3bt3xowZkyuuuCLHHntsh+8+99xz8853vjO33HJLfvKTnyRJhg4dmssvv7z9VZptbrjhhgwbNiz33ntv7rvvvjQ0NGTw4MH58pe/3OlpuyTZsmVLkuRd73pXV50KAAAAAAAAeMMa9uzZs+dATwIAAAAAAAD+v+uyNe4AAAAAAACAN0+4AwAAAAAAgAKEOwAAAAAAAChAuAMAAAAAAIAChDsAAAAAAAAoQLgDAAAAAACAAoQ7AAAAAAAAKEC4AwAAAAAAgAKEOwAAAAAAACjgvwDqgEy+N34JzwAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Plot visualisation of the missing values for each feature of the raw DataFrame, df_stratabet_key_entries_raw \n", "msno.matrix(df_stratabet_key_entries_raw, figsize = (30, 7))" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Series([], dtype: int64)" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Counts of missing values\n", "tm_null_value_stats = df_stratabet_key_entries_raw.isnull().sum(axis=0)\n", "tm_null_value_stats[tm_null_value_stats != 0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The visualisation, together with the empty Series returned by the null-value count above, shows that there are no missing values in the key entries dataset. This data is now ready for Data Engineering." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3.2.4. Export Complete DataFrame" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [], "source": [ "# Export the combined key entries DataFrame as a single CSV file\n", "df_stratabet_key_entries_raw.to_csv(data_dir_stratabet + '/raw/key_entries/' + 'stratabet_key_entries_all.csv', index=False, header=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.5. Minutes Played" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3.5.1. Data Dictionary\n", "The [StrataBet]( http://www.stratagem.co/) Events dataset has twelve features (columns) with the following definitions and data types:\n", "\n", "| Feature | Data type |\n", "|------|-----|\n", "| `eventId` | int64 |\n", "| `subEventName` | object |\n", "| `tags` | object |\n", "| `playerId` | int64 |\n", "| `positions` | object |\n", "| `matchId` | int64 |\n", "| `eventName` | object |\n", "| `teamId` | int64 |\n", "| `matchPeriod` | object |\n", "| `eventSec` | float64 |\n", "| `subEventId` | object |\n", "| `id` | int64 |" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3.5.2. 
Import Data" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [], "source": [ "# Combine the individual csv files into one DataFrame, df_stratabet_key_entries_raw, using glob\n", "lst_files_key_entries = glob.glob(data_dir_stratabet + '/raw/key_entries/individual_competitions' + \"/*.csv\") # Creates a list of all csv files\n", "\n", "li = [] # pd.concat takes a list of DataFrames as an argument\n", "\n", "for filename in lst_files_key_entries:\n", "    df_raw_temp = pd.read_csv(filename, index_col=None, header=0)\n", "    li.append(df_raw_temp)\n", "\n", "df_stratabet_key_entries_raw = pd.concat(li, axis=0, ignore_index=True) # ignore_index=True as we don't want pandas to try to align row indexes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3.5.3. Initial Data Handling\n", "Let's check the quality of the dataset by looking at the first and last rows in pandas using the [head()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.head.html) and [tail()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.tail.html) methods." ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " True index competition gsm_id kickoffDate kickoffTime hometeam_team1 \\\n", "0 153564 9570 ScoPr 2242424 2016-08-06 11:30:00 Rangers \n", "1 153565 9571 ScoPr 2242424 2016-08-06 11:30:00 Rangers \n", "2 153566 9572 ScoPr 2242424 2016-08-06 11:30:00 Rangers \n", "3 153567 9573 ScoPr 2242424 2016-08-06 11:30:00 Rangers \n", "4 153568 9574 ScoPr 2242424 2016-08-06 11:30:00 Rangers \n", "\n", " awayteam_team2 team keyentryArea keyentryType \n", "0 Hamilton Academical Rangers Right Pass \n", "1 Hamilton Academical Rangers Box Pass \n", "2 Hamilton Academical Rangers Right Pass \n", "3 Hamilton Academical Hamilton Academical Right Pass \n", "4 Hamilton Academical Rangers Box Run " ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Display the first 5 rows of the raw DataFrame, df_stratabet_key_entries_raw \n", "df_stratabet_key_entries_raw.head()" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " True index competition gsm_id kickoffDate kickoffTime \\\n", "195682 111281 7814 GreSL 2701477 2018-03-11 15:15:00 \n", "195683 111282 7815 GreSL 2701477 2018-03-11 15:15:00 \n", "195684 111283 7816 GreSL 2701477 2018-03-11 15:15:00 \n", "195685 111284 7817 GreSL 2701477 2018-03-11 15:15:00 \n", "195686 111285 7818 GreSL 2701477 2018-03-11 15:15:00 \n", "\n", " hometeam_team1 awayteam_team2 team keyentryArea keyentryType \n", "195682 Lamia Levadiakos Levadiakos Right Turnover \n", "195683 Lamia Levadiakos Lamia Left Turnover \n", "195684 Lamia Levadiakos Lamia Right Pass \n", "195685 Lamia Levadiakos Lamia Box Pass \n", "195686 Lamia Levadiakos Levadiakos Box Run " ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Display the last 5 rows of the raw DataFrame, df_stratabet_key_entries_raw \n", "df_stratabet_key_entries_raw.tail()" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(195687, 11)\n" ] } ], "source": [ "# Print the shape of the raw DataFrame, df_stratabet_key_entries_raw \n", "print(df_stratabet_key_entries_raw.shape)" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Index(['True', 'index', 'competition', 'gsm_id', 'kickoffDate', 'kickoffTime',\n", " 'hometeam_team1', 'awayteam_team2', 'team', 'keyentryArea',\n", " 'keyentryType'],\n", " dtype='object')\n" ] } ], "source": [ "# Print the column names of the raw DataFrame, df_stratabet_key_entries_raw \n", "print(df_stratabet_key_entries_raw.columns)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The dataset has eleven features (columns). Full details of these attributes can be found in the [Data Dictionary](section3.3.1)." 
] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 195687 entries, 0 to 195686\n", "Data columns (total 11 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 True 195687 non-null int64 \n", " 1 index 195687 non-null int64 \n", " 2 competition 195687 non-null object\n", " 3 gsm_id 195687 non-null int64 \n", " 4 kickoffDate 195687 non-null object\n", " 5 kickoffTime 195687 non-null object\n", " 6 hometeam_team1 195687 non-null object\n", " 7 awayteam_team2 195687 non-null object\n", " 8 team 195687 non-null object\n", " 9 keyentryArea 195687 non-null object\n", " 10 keyentryType 195687 non-null object\n", "dtypes: int64(3), object(8)\n", "memory usage: 16.4+ MB\n" ] } ], "source": [ "# Info for the raw DataFrame, df_stratabet_key_entries_raw \n", "df_stratabet_key_entries_raw.info()" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " True index gsm_id\n", "count 195687.000000 195687.000000 1.956870e+05\n", "mean 97843.000000 6453.847414 2.354363e+06\n", "std 56490.115401 5040.288373 1.125099e+05\n", "min 0.000000 0.000000 2.237445e+06\n", "25% 48921.500000 2675.000000 2.246823e+06\n", "50% 97843.000000 5401.000000 2.360808e+06\n", "75% 146764.500000 8928.000000 2.467223e+06\n", "max 195686.000000 25384.000000 2.701477e+06" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Description of the raw DataFrame, df_stratabet_key_entries_raw, showing some summary statistics for each numerical column in the DataFrame\n", "df_stratabet_key_entries_raw.describe()" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Plot visualisation of the missing values for each feature of the raw DataFrame, df_stratabet_key_entries_raw \n", "msno.matrix(df_stratabet_key_entries_raw, figsize = (30, 7))" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Series([], dtype: int64)" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Counts of missing values\n", "tm_null_value_stats = df_stratabet_key_entries_raw.isnull().sum(axis=0)\n", "tm_null_value_stats[tm_null_value_stats != 0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The visualisation and the missing-value counts (an empty Series) show that no column contains missing values, so the dataset is complete. This data is now ready for Data Engineering." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3.5.4. Export Complete DataFrame" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [], "source": [ "df_stratabet_key_entries_raw.to_csv(data_dir_stratabet + '/raw/key_entries/' + 'stratabet_key_entries_all.csv', index=None, header=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Data Engineering" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4.1. Chances" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 4.1.1. Assign Raw DataFrame to Engineered DataFrame" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [], "source": [ "# Assign a copy of the Raw DataFrame to the Engineered DataFrame, so that later changes leave the raw data untouched\n", "df_stratabet_chances = df_stratabet_chances_raw.copy()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 4.1.2. 
Create `Full_Fixture_Date` Attribute" ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " index competition gsm_id kickoffDate kickoffTime hometeam_team1 \\\n", "0 4684 GreSL 2355591 10/09/2016 13:00:00 Kerkyra \n", "1 4685 GreSL 2355591 10/09/2016 13:00:00 Kerkyra \n", "2 4686 GreSL 2355591 10/09/2016 13:00:00 Kerkyra \n", "3 4687 GreSL 2355591 10/09/2016 13:00:00 Kerkyra \n", "4 4688 GreSL 2355591 10/09/2016 13:00:00 Kerkyra \n", "\n", " awayteam_team2 icon chanceRating team type time \\\n", "0 Platanias poorchance poorchance Kerkyra Open Play 24:43:00 \n", "1 Platanias goodchance goodchance Kerkyra Open Play 45:29:00 \n", "2 Platanias poorchance poorchance Kerkyra Open Play 44:34:00 \n", "3 Platanias poorchance poorchance Platanias Open Play 42:39:00 \n", "4 Platanias goodchance goodchance Kerkyra Open Play 40:46:00 \n", "\n", " player location_x location_y bodyPart shotQuality defPressure \\\n", "0 D. Epstein 81 48 Left 3 5 \n", "1 D. Epstein 27 60 Left 2 2 \n", "2 S. Siontis 23 117 Right 2 1 \n", "3 O. Gnjatic -9 118 Left 1 1 \n", "4 D. Epstein 42 15 Left 2 5 \n", "\n", " numDefPlayers numAttPlayers outcome primaryPlayer primaryType \\\n", "0 2 0 Saved - - \n", "1 2 0 Defended Thuram Open Play Pass \n", "2 4 1 Missed - - \n", "3 3 1 Missed G. Manousos Open Play Pass \n", "4 2 0 Saved - - \n", "\n", " primaryLocation_x primaryLocation_y secondaryPlayer secondaryType \n", "0 - - - - \n", "1 -29 82 - - \n", "2 - - - - \n", "3 77 92 - - \n", "4 - - - - " ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_stratabet_chances.head()" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [], "source": [ "df_stratabet_chances['Full_Fixture_Date'] = df_stratabet_chances['kickoffDate'].astype(str) + ' ' + df_stratabet_chances['hometeam_team1'].astype(str) + ' vs. ' + df_stratabet_chances['awayteam_team2'].astype(str)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 4.1.3. 
Convert Data Types" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [], "source": [ "# Convert the numeric attributes to numbers; non-numeric placeholders such as '-' become NaN\n", "cols_numeric = ['location_x', 'location_y', 'primaryLocation_x', 'primaryLocation_y', 'shotQuality', 'defPressure', 'numDefPlayers', 'numAttPlayers']\n", "\n", "for col in cols_numeric:\n", "    df_stratabet_chances[col] = pd.to_numeric(df_stratabet_chances[col], errors='coerce')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 4.1.4. Convert X, Y Coordinates to Standardised Coordinates\n", "\n", "According to the StrataBet documentation, the data provides XY coordinates in addition to grid locations, with (0,0) representing the absolute centre of the defended goal line. 
The pitch length runs from 0 to 420, while the width runs from 136 to -136 (left to right).\n", "\n", "Some key reference points of note:\n", "* Left Goalpost: (15, 0)\n", "* Right Goalpost: (-15, 0)\n", "* 6-Yard Box Left Corner: (37, 22)\n", "* 6-Yard Box Right Corner: (-37, 22)\n", "* Penalty Spot: (0, 44)\n", "* 18-Yard Box Left Corner: (81, 66)\n", "* 18-Yard Box Right Corner: (-81, 66)\n", "* Centre Spot: (0, 210) " ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [], "source": [ "# Scale location_y (pitch length) by 120/480 and map location_x (pitch width) from [136, -136] onto [80, 0]\n", "df_stratabet_chances['location_y_120'] = ((df_stratabet_chances['location_y'] / 480) * 120).round(2)\n", "df_stratabet_chances['location_x_80'] = (((df_stratabet_chances['location_x'] + 136) / 272) * 80).round(2)\n", "df_stratabet_chances['primaryLocation_y_120'] = ((df_stratabet_chances['primaryLocation_y'] / 480) * 120).round(2)\n", "df_stratabet_chances['primaryLocation_x_80'] = (((df_stratabet_chances['primaryLocation_x'] + 136) / 272) * 80).round(2)" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " index competition gsm_id kickoffDate kickoffTime hometeam_team1 \\\n", "0 4684 GreSL 2355591 10/09/2016 13:00:00 Kerkyra \n", "1 4685 GreSL 2355591 10/09/2016 13:00:00 Kerkyra \n", "2 4686 GreSL 2355591 10/09/2016 13:00:00 Kerkyra \n", "3 4687 GreSL 2355591 10/09/2016 13:00:00 Kerkyra \n", "4 4688 GreSL 2355591 10/09/2016 13:00:00 Kerkyra \n", "\n", " awayteam_team2 icon chanceRating team type time \\\n", "0 Platanias poorchance poorchance Kerkyra Open Play 24:43:00 \n", "1 Platanias goodchance goodchance Kerkyra Open Play 45:29:00 \n", "2 Platanias poorchance poorchance Kerkyra Open Play 44:34:00 \n", "3 Platanias poorchance poorchance Platanias Open Play 42:39:00 \n", "4 Platanias goodchance goodchance Kerkyra Open Play 40:46:00 \n", "\n", " player location_x location_y bodyPart shotQuality defPressure \\\n", "0 D. Epstein 81.0 48.0 Left 3.0 5.0 \n", "1 D. Epstein 27.0 60.0 Left 2.0 2.0 \n", "2 S. Siontis 23.0 117.0 Right 2.0 1.0 \n", "3 O. Gnjatic -9.0 118.0 Left 1.0 1.0 \n", "4 D. Epstein 42.0 15.0 Left 2.0 5.0 \n", "\n", " numDefPlayers numAttPlayers outcome primaryPlayer primaryType \\\n", "0 2.0 0.0 Saved - - \n", "1 2.0 0.0 Defended Thuram Open Play Pass \n", "2 4.0 1.0 Missed - - \n", "3 3.0 1.0 Missed G. Manousos Open Play Pass \n", "4 2.0 0.0 Saved - - \n", "\n", " primaryLocation_x primaryLocation_y secondaryPlayer secondaryType \\\n", "0 NaN NaN - - \n", "1 -29.0 82.0 - - \n", "2 NaN NaN - - \n", "3 77.0 92.0 - - \n", "4 NaN NaN - - \n", "\n", " Full_Fixture_Date location_y_120 location_x_80 \\\n", "0 10/09/2016 Kerkyra vs. Platanias 12.00 63.82 \n", "1 10/09/2016 Kerkyra vs. Platanias 15.00 47.94 \n", "2 10/09/2016 Kerkyra vs. Platanias 29.25 46.76 \n", "3 10/09/2016 Kerkyra vs. Platanias 29.50 37.35 \n", "4 10/09/2016 Kerkyra vs. 
Platanias 3.75 52.35 \n", "\n", " primaryLocation_y_120 primaryLocation_x_80 \n", "0 NaN NaN \n", "1 20.5 31.47 \n", "2 NaN NaN \n", "3 23.0 62.65 \n", "4 NaN NaN " ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_stratabet_chances.head()" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [], "source": [ "df_stratabet_chances['location_y_120_inv'] = 120 - ((df_stratabet_chances['location_y'] / 480) * 120).round(2)\n", "df_stratabet_chances['location_x_80_inv'] = 80 - (((df_stratabet_chances['location_x'] + 136) / 272) * 80).round(2)\n", "df_stratabet_chances['primaryLocation_y_120_inv'] = 120 - ((df_stratabet_chances['primaryLocation_y'] / 480) * 120).round(2)\n", "df_stratabet_chances['primaryLocation_x_80_inv'] = 80 - (((df_stratabet_chances['primaryLocation_x'] + 136) / 272) * 80).round(2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 4.1.5. Renaming" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [], "source": [ "df_stratabet_chances = df_stratabet_chances.rename(columns = {'index':'id'})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 4.1.6. Assign New Attributes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Season" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [], "source": [ "df_stratabet_chances['Season'] = 'TO ADD'" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [], "source": [ "# Write code here to add seasons for each match per 'competition' and 'kickoffDate' - varies per league" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 4.1.7. Create DataFrame of Teams and Leagues" ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " competition team\n", "77137 AusAL Adelaide United\n", "77053 AusAL Brisbane Roar\n", "77104 AusAL Central Coast Mariners\n", "77066 AusAL Melbourne City\n", "77052 AusAL Melbourne Victory" ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create DataFrame of Teams and Leagues\n", "\n", "## Select columns of interest\n", "cols = ['competition', 'team']\n", "df_stratabet_teams_leagues = df_stratabet_chances[cols]\n", "\n", "## Drop duplicates\n", "df_stratabet_teams_leagues = df_stratabet_teams_leagues.drop_duplicates()\n", "\n", "## Sort rows by league and then by team\n", "df_stratabet_teams_leagues = df_stratabet_teams_leagues.sort_values(['competition', 'team'], ascending=[True, True])\n", "\n", "## Display DataFrame\n", "df_stratabet_teams_leagues.head()" ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [], "source": [ "# Export DataFrame\n", "df_stratabet_teams_leagues.to_csv(data_dir_stratabet + '/reference/teams_leagues.csv', index=None, header=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 4.1.8. Export DataFrame" ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [], "source": [ "df_stratabet_chances.to_csv(data_dir_stratabet + '/engineered/chances/stratabet_chances_all.csv', index=None, header=True)" ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [], "source": [ "df_stratabet_chances.to_csv(data_dir + '/export/stratabet_events_chances.csv', index=None, header=True)" ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " id competition gsm_id kickoffDate kickoffTime hometeam_team1 \\\n", "0 4684 GreSL 2355591 10/09/2016 13:00:00 Kerkyra \n", "1 4685 GreSL 2355591 10/09/2016 13:00:00 Kerkyra \n", "2 4686 GreSL 2355591 10/09/2016 13:00:00 Kerkyra \n", "3 4687 GreSL 2355591 10/09/2016 13:00:00 Kerkyra \n", "4 4688 GreSL 2355591 10/09/2016 13:00:00 Kerkyra \n", "\n", " awayteam_team2 icon chanceRating team type time \\\n", "0 Platanias poorchance poorchance Kerkyra Open Play 24:43:00 \n", "1 Platanias goodchance goodchance Kerkyra Open Play 45:29:00 \n", "2 Platanias poorchance poorchance Kerkyra Open Play 44:34:00 \n", "3 Platanias poorchance poorchance Platanias Open Play 42:39:00 \n", "4 Platanias goodchance goodchance Kerkyra Open Play 40:46:00 \n", "\n", " player location_x location_y bodyPart shotQuality defPressure \\\n", "0 D. Epstein 81.0 48.0 Left 3.0 5.0 \n", "1 D. Epstein 27.0 60.0 Left 2.0 2.0 \n", "2 S. Siontis 23.0 117.0 Right 2.0 1.0 \n", "3 O. Gnjatic -9.0 118.0 Left 1.0 1.0 \n", "4 D. Epstein 42.0 15.0 Left 2.0 5.0 \n", "\n", " numDefPlayers numAttPlayers outcome primaryPlayer primaryType \\\n", "0 2.0 0.0 Saved - - \n", "1 2.0 0.0 Defended Thuram Open Play Pass \n", "2 4.0 1.0 Missed - - \n", "3 3.0 1.0 Missed G. Manousos Open Play Pass \n", "4 2.0 0.0 Saved - - \n", "\n", " primaryLocation_x primaryLocation_y secondaryPlayer secondaryType \\\n", "0 NaN NaN - - \n", "1 -29.0 82.0 - - \n", "2 NaN NaN - - \n", "3 77.0 92.0 - - \n", "4 NaN NaN - - \n", "\n", " Full_Fixture_Date location_y_120 location_x_80 \\\n", "0 10/09/2016 Kerkyra vs. Platanias 12.00 63.82 \n", "1 10/09/2016 Kerkyra vs. Platanias 15.00 47.94 \n", "2 10/09/2016 Kerkyra vs. Platanias 29.25 46.76 \n", "3 10/09/2016 Kerkyra vs. Platanias 29.50 37.35 \n", "4 10/09/2016 Kerkyra vs. 
Platanias 3.75 52.35 \n", "\n", " primaryLocation_y_120 primaryLocation_x_80 location_y_120_inv \\\n", "0 NaN NaN 108.00 \n", "1 20.5 31.47 105.00 \n", "2 NaN NaN 90.75 \n", "3 23.0 62.65 90.50 \n", "4 NaN NaN 116.25 \n", "\n", " location_x_80_inv primaryLocation_y_120_inv primaryLocation_x_80_inv \\\n", "0 16.18 NaN NaN \n", "1 32.06 99.5 48.53 \n", "2 33.24 NaN NaN \n", "3 42.65 97.0 17.35 \n", "4 27.65 NaN NaN \n", "\n", " Season \n", "0 TO ADD \n", "1 TO ADD \n", "2 TO ADD \n", "3 TO ADD \n", "4 TO ADD " ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_stratabet_chances.head()" ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(119148, 37)" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_stratabet_chances.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# All code below here is old and needs to be sorted" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4.2. Key Entries" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Add code here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4.3. Match Info" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Add code here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4.4. 
Minutes Played" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Add code here" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Old code from here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4.3. String Cleaning" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Split `label` column into separate fixture, team, score, and goals columns" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Break down 'label' column into its constituent parts: fixture, score, home and away teams, and home and away goals\n", "df_stratabet['fixture'] = df_stratabet['label'].str.split(', ').str[0]\n", "df_stratabet['score_home_away'] = df_stratabet['label'].str.split(', ').str[1]\n", "df_stratabet['team_home'] = df_stratabet['fixture'].str.split(' - ').str[0]\n", "df_stratabet['team_away'] = df_stratabet['fixture'].str.split(' - ').str[1]\n", "df_stratabet['goals_home'] = df_stratabet['score_home_away'].str.split(' - ').str[0]\n", "df_stratabet['goals_away'] = df_stratabet['score_home_away'].str.split(' - ').str[1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Split `date` column into separate `date_isolated`, `time_isolated`, and `date_time_isolated` columns" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Separate the date and time parts of the 'date' string, which are joined by ' at '\n", "df_stratabet['date_isolated'] = df_stratabet['date'].str.split(' at ').str[0]\n", "df_stratabet['time_isolated'] = df_stratabet['date'].str.split(' at ').str[1]\n", "df_stratabet['date_time_isolated'] = 
df_stratabet['date'].str.split(' GMT').str[0].str.replace(' at ', ' ', regex=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4.4. Drop columns\n", "As this is a large dataset with more than 3 million rows, we remove every column that is not required at this stage." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Display columns\n", "df_stratabet.columns" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "# Pass 'columns=' explicitly: DataFrame.drop() removes rows (axis=0) by default\n", "# df_stratabet = df_stratabet.drop(columns=['tags', 'dateutc', 'wyId_x', 'label', 'date', 'referees', 'wyId_y', 'date_isolated', 'time_isolated', 'date_time_isolated'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4.5. Create New Attributes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Create `full_fixture_date` attribute from the columns split out in the String Cleaning section above" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Combine date, home team, score, and away team into one fixture identifier, e.g. '<date> <home team> <home goals> v <away goals> <away team>'\n", "df_stratabet['full_fixture_date'] = df_stratabet['date_date'].astype(str) + ' ' + df_stratabet['team_home'].astype(str) + ' ' + df_stratabet['goals_home'].astype(str) + ' v ' + df_stratabet['goals_away'].astype(str) + ' ' + df_stratabet['team_away'].astype(str)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Create 'season' attribute" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df_stratabet['season'] = '17/18'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4.5. Reorder DataFrame\n", "Sort all the rows in the DataFrame by date, time, country, league, fixture, half, and time in the match. This ordering matters whenever an event is compared with the event that follows it, e.g. is possession retained? Which player receives the pass?" 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "# Sort events chronologically within each fixture\n", "df_stratabet = df_stratabet.sort_values(['date_date', 'time_time', 'country', 'league_name', 'full_fixture_date', 'matchPeriod', 'eventSec'], ascending=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4.6. Create New Attributes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Create `teamIdNext`, `teamNameNext`, and `fullNameNext` columns\n", "Each new column holds the `teamId`, `teamName`, and `fullName` value, respectively, of the following event in the same fixture." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Shift within each fixture so that the last event of one match is not paired with the first event of the next\n", "df_stratabet['teamIdNext'] = df_stratabet.groupby('full_fixture_date')['teamId'].shift(-1)\n", "df_stratabet['teamNameNext'] = df_stratabet.groupby('full_fixture_date')['teamName'].shift(-1)\n", "df_stratabet['fullNameNext'] = df_stratabet.groupby('full_fixture_date')['fullName'].shift(-1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df_stratabet.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Create `player2player` column" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Pair each player with the player involved in the following event, e.g. passer and receiver\n", "df_stratabet['player2player'] = df_stratabet['fullName'] + ' - ' + df_stratabet['fullNameNext']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Create `isPossessionRetained` column\n", "When the `teamId` of an event differs from the `teamId` of the following event, possession has been lost. This column records whether possession is retained (`True`) or lost (`False`)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# True when the next event belongs to the same team; False otherwise, including the last event of each fixture\n", "df_stratabet['isPossessionRetained'] = df_stratabet['teamId'] == df_stratabet['teamIdNext']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Clean Positions data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# CODE HERE" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4.7. 
Export DataFrame" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Break down data into individual matches" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# List of unique fixtures in the dataset\n", "lst_results = list(df_stratabet['full_fixture_date'].unique())" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Export each fixture to its own CSV file, named after its 'full_fixture_date' value\n", "for fixture, df_fixture in df_stratabet.groupby('full_fixture_date'):\n", " df_fixture.to_csv(data_dir_wyscout + '/engineered/individual_matches/{}.csv'.format(fixture), index=False, header=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Complete dataset" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df_stratabet.to_csv(data_dir_wyscout + '/engineered/combined/wyscout_events_big5_1718.csv', index=False, header=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4.8. Aggregate Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 4.8.1. 
Fixture Level" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Select columns of interest\n", "\n", "## Define columns\n", "cols = ['season',\n", " 'date_time_timestamp',\n", " 'fixture',\n", " 'team_home',\n", " 'team_away',\n", " 'teamName',\n", " 'goals_home',\n", " 'goals_away',\n", " 'eventName',\n", " 'subEventName'\n", " ]\n", "\n", "## Streamline DataFrame with columns of interest (copy to avoid SettingWithCopyWarning when adding columns)\n", "df_stratabet_select = df_stratabet[cols].copy()\n", "\n", "## Create 'Opponent' column: the away team if the event's team is the home team, otherwise the home team\n", "df_stratabet_select['Opponent'] = np.where(df_stratabet_select['team_home'] == df_stratabet_select['teamName'], df_stratabet_select['team_away'], df_stratabet_select['team_home'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Count events of each type for each team in each fixture\n", "\n", "## Group DataFrame and aggregate on 'eventName'\n", "df_stratabet_fixture_grouped = (df_stratabet_select\n", " .groupby(['season', 'date_time_timestamp', 'fixture', 'teamName', 'Opponent', 'goals_home', 'goals_away', 'eventName'])\n", " .agg({'eventName': ['count']})\n", " )\n", "\n", "## Drop the top level of the MultiIndex columns\n", "df_stratabet_fixture_grouped.columns = df_stratabet_fixture_grouped.columns.droplevel(level=0)\n", "\n", "## Reset index\n", "df_stratabet_fixture_grouped = df_stratabet_fixture_grouped.reset_index()\n", "\n", "## Rename columns\n", "df_stratabet_fixture_grouped = df_stratabet_fixture_grouped.rename(columns={'season': 'Season',\n", " 'date_time_timestamp': 'Date',\n", " 'fixture': 'Fixture',\n", " 'teamName': 'Team',\n", " 'Opponent': 'Opponent',\n", " 'goals_home': 'Goals_Home',\n", " 'goals_away': 'Goals_Away',\n", " 'eventName': 'Event',\n", " 'count': 'Team_Value'\n", " }\n", " )\n", "\n", "## Display DataFrame\n", "df_stratabet_fixture_grouped.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Select columns of interest\n", "\n", "## Define columns\n", "cols = ['Season',\n", " 'Date',\n", " 'Fixture',\n", " 
'Team',\n", " 'Opponent',\n", " 'Event',\n", " 'Team_Value'\n", " ]\n", "\n", "## Streamline DataFrame with columns of interest\n", "df_stratabet_fixture_grouped_select = df_stratabet_fixture_grouped[cols]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Join DataFrame to itself on 'Season', 'Date', 'Fixture', 'Opponent'/'Team', and 'Event', so each row holds both the team's and the opponent's count\n", "df_stratabet_fixture_grouped = pd.merge(df_stratabet_fixture_grouped, df_stratabet_fixture_grouped, how='left', left_on=['Season', 'Date', 'Fixture', 'Opponent', 'Event'], right_on=['Season', 'Date', 'Fixture', 'Team', 'Event'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Clean Data\n", "\n", "## Drop duplicated columns from the right-hand side of the join\n", "df_stratabet_fixture_grouped = df_stratabet_fixture_grouped.drop(columns=['Team_y', 'Opponent_y', 'Goals_Home_y', 'Goals_Away_y'])\n", "\n", "## Rename columns\n", "df_stratabet_fixture_grouped = df_stratabet_fixture_grouped.rename(columns={'Season_x': 'Season',\n", " 'Team_x': 'Team',\n", " 'Opponent_x': 'Opponent',\n", " 'Goals_Home_x': 'Goals_Home',\n", " 'Goals_Away_x': 'Goals_Away',\n", " 'Team_Value_x': 'Team_Value',\n", " 'Team_Value_y': 'Opponent_Value',\n", " }\n", " )\n", "\n", "## Replace null values with zeros (an event type the opponent never recorded has no match in the join)\n", "df_stratabet_fixture_grouped['Team_Value'] = df_stratabet_fixture_grouped['Team_Value'].fillna(0)\n", "df_stratabet_fixture_grouped['Opponent_Value'] = df_stratabet_fixture_grouped['Opponent_Value'].fillna(0)\n", "\n", "## Convert 'Opponent_Value' from float64 to Int64 type\n", "df_stratabet_fixture_grouped['Opponent_Value'] = df_stratabet_fixture_grouped['Opponent_Value'].astype('Int64')\n", "\n", "## Display DataFrame\n", "df_stratabet_fixture_grouped.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Derive a gameweek number for each team\n", "\n", "## Count each team once per match date\n", "df_fixture_gw = (df_stratabet_fixture_grouped\n", " 
.groupby(['Date', 'Team'])\n", " .agg({'Team': ['nunique']})\n", " )\n", "\n", "## Drop the top level of the MultiIndex columns\n", "df_fixture_gw.columns = df_fixture_gw.columns.droplevel(level=0)\n", "\n", "## Reset index\n", "df_fixture_gw = df_fixture_gw.reset_index()\n", "\n", "## Rename 'nunique' column to 'Gameweek'\n", "df_fixture_gw = df_fixture_gw.rename(columns={'nunique': 'Gameweek'})\n", "\n", "## Cumulative sum per team over match dates gives the gameweek number. See: https://stackoverflow.com/questions/18554920/pandas-aggregate-count-distinct\n", "df_fixture_gw = (df_fixture_gw.groupby(['Team', 'Date']).sum()\n", " .groupby(level=0).cumsum().reset_index()\n", " )\n", "\n", "## Display DataFrame\n", "df_fixture_gw.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Join gameweek numbers onto the fixture-level DataFrame\n", "df_stratabet_fixture_grouped = pd.merge(df_stratabet_fixture_grouped, df_fixture_gw, how='left', on=['Date', 'Team'])\n", "\n", "# Display DataFrame\n", "df_stratabet_fixture_grouped.head(50)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Export DataFrame" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df_stratabet_fixture_grouped.to_csv(data_dir_wyscout + '/engineered/combined/wyscout_aggregated_fixtures_big5_1718.csv', index=False, header=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 4.8.2. 
Team Level" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Aggregate fixture-level counts to team level\n", "\n", "## Sum team and opponent counts for each team and event type\n", "df_stratabet_team_grouped = (df_stratabet_fixture_grouped\n", " .groupby(['Team', 'Event'])\n", " .agg({'Team_Value': ['sum'],\n", " 'Opponent_Value': ['sum']\n", " }\n", " )\n", " )\n", "\n", "## Drop the top level of the MultiIndex columns\n", "df_stratabet_team_grouped.columns = df_stratabet_team_grouped.columns.droplevel(level=0)\n", "\n", "## Reset index\n", "df_stratabet_team_grouped = df_stratabet_team_grouped.reset_index()\n", "\n", "## Rename columns\n", "df_stratabet_team_grouped.columns = ['Team', 'Event', 'Team_Value', 'Opponent_Value']\n", "\n", "## Display DataFrame\n", "df_stratabet_team_grouped.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Export DataFrame" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df_stratabet_team_grouped.to_csv(data_dir_wyscout + '/engineered/combined/wyscout_aggregated_team_big5_1718.csv', index=False, header=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. Exploratory Data Analysis\n", "..." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6. Summary\n", "This notebook engineers [StrataBet](http://www.stratagem.co/) data for football matches in a variety of European leagues during the 16/17 and 17/18 seasons, using [pandas](http://pandas.pydata.org/) for data manipulation through DataFrames." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 7. Next Steps\n", "..." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 8. References\n", "..." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "This article was written with the aid of StrataData, which is property of [Stratagem Technologies](http://www.stratagem.co/). StrataData powers the [StrataBet Sports Trading Platform](http://www.stratabet.com/), in addition to [StrataBet Premium Recommendations](http://app.stratabet.com/recommendations).\n", "\n", "***Visit my website [EddWebster.com](https://www.eddwebster.com) or my [GitHub Repository](https://github.com/eddwebster) for more projects. If you'd like to get in contact, my Twitter handle is [@eddwebster](http://www.twitter.com/eddwebster) and my email is: edd.j.webster@gmail.com.***" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[Back to the top](#top)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "oldHeight": 642, "position": { "height": "664px", "left": "1119px", "right": "20px", "top": "-7px", "width": "489px" }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "varInspector_section_display": "block", "window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }