{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Webscraping of TransferMarkt Data\n",
"##### Notebook to scrape raw data using [Beautifulsoup](https://pypi.org/project/beautifulsoup4/) from [TransferMarkt](https://www.transfermarkt.co.uk/).\n",
"\n",
"### By [Edd Webster](https://www.twitter.com/eddwebster)\n",
"Last updated: 31/08/2020"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![title](../../../img/transfermarkt-logo-banner.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Click [here](#section5) to jump straight to the Exploratory Data Analysis section and skip the [Task Brief](#section2), [Data Sources](#section3), and [Data Engineering](#section4) sections. Or click [here](#section6) to jump straight to the Conclusion."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"___\n",
"\n",
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction\n",
"This notebook scrapes data for player valuations using [Beautifulsoup](https://pypi.org/project/beautifulsoup4/) from [TransferMarkt](https://www.transfermarkt.co.uk/) using [pandas](http://pandas.pydata.org/) for data maniuplation through DataFrames, [Beautifulsoup](https://pypi.org/project/beautifulsoup4/) for webscraping.\n",
"\n",
"For more information about this notebook and the author, I'm available through all the following channels:\n",
"* [eddwebster.com](https://www.eddwebster.com/),\n",
"* edd.j.webster@gmail.com,\n",
"* [@eddwebster](https://www.twitter.com/eddwebster),\n",
"* [LinkedIn.com/in/eddwebster](https://www.linkedin.com/in/eddwebster/),\n",
"* [GitHub/eddwebster](https://github.com/eddwebster/),\n",
"* [Kaggle.com/eddwebster](https://www.kaggle.com/eddwebster), and\n",
"* [HackerRank.com/eddwebster](https://www.hackerrank.com/eddwebster)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The accompanying GitHub repository for this notebook can be found [here](https://github.com/eddwebster/fifa-league) and a static version of this notebook can be found [here](https://nbviewer.jupyter.org/github/eddwebster/fifa-league/blob/master/FIFA%2020%20Fantasy%20Football%20League%20using%20TransferMarkt%20Player%20Valuations.ipynb)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"___\n",
"\n",
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Notebook Contents\n",
"1. [Notebook Dependencies](#section1) \n",
"2. [Project Brief](#section2) \n",
"3. [Data Sources](#section3) \n",
" 1. [Introduction](#section3.1) \n",
" 2. [Data Dictionary](#section3.2) \n",
" 3. [Creating the DataFrame](#section3.3) \n",
" 4. [Initial Data Handling](#section3.4) \n",
" 5. [Export the Raw DataFrame](#section3.5) \n",
"4. [Data Engineering](#section4) \n",
" 1. [Introduction](#section4.1) \n",
" 2. [Columns of Interest](#section4.2) \n",
" 3. [String Cleaning](#section4.3) \n",
" 4. [Converting Data Types](#section4.4) \n",
" 5. [Export the Engineered DataFrame](#section4.5) \n",
"5. [Exploratory Data Analysis (EDA)](#section5) \n",
" 1. [...](#section5.1) \n",
" 2. [...](#section5.2) \n",
" 3. [...](#section5.3) \n",
"6. [Summary](#section6) \n",
"7. [Next Steps](#section7) \n",
"8. [Bibliography](#section8) "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"___\n",
"\n",
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Notebook Dependencies"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook was written using [Python 3](https://docs.python.org/3.7/) and requires the following libraries:\n",
"* [`Jupyter notebooks`](https://jupyter.org/) for this notebook environment with which this project is presented;\n",
"* [`NumPy`](http://www.numpy.org/) for multidimensional array computing;\n",
"* [`pandas`](http://pandas.pydata.org/) for data analysis and manipulation;\n",
"* `tqdm` for a clean progress bar;\n",
"* `requests` for executing HTTP requests;\n",
"* [`Beautifulsoup`](https://pypi.org/project/beautifulsoup4/) for web scraping; and\n",
"* [`matplotlib`](https://matplotlib.org/contents.html?v=20200411155018) for data visualisations;\n",
"\n",
"All packages used for this notebook except for BeautifulSoup can be obtained by downloading and installing the [Conda](https://anaconda.org/anaconda/conda) distribution, available on all platforms (Windows, Linux and Mac OSX). Step-by-step guides on how to install Anaconda can be found for Windows [here](https://medium.com/@GalarnykMichael/install-python-on-windows-anaconda-c63c7c3d1444) and Mac [here](https://medium.com/@GalarnykMichael/install-python-on-mac-anaconda-ccd9f2014072), as well as in the Anaconda documentation itself [here](https://docs.anaconda.com/anaconda/install/)."
]
},
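{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since BeautifulSoup is the one listed dependency not covered by the Conda distribution, it can be installed directly from a notebook cell. A minimal sketch (a one-off setup step):\n",
"\n",
"```python\n",
"# Install beautifulsoup4 into the kernel's environment\n",
"%pip install beautifulsoup4\n",
"```"
]
},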
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Import Libraries and Modules"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Setup Complete\n"
]
}
],
"source": [
"# Python ≥3.5 (ideally)\n",
"import platform\n",
"import sys, getopt\n",
"assert sys.version_info >= (3, 5)\n",
"import csv\n",
"\n",
"# Import Dependencies\n",
"%matplotlib inline\n",
"\n",
"# Math Operations\n",
"import numpy as np\n",
"from math import pi\n",
"\n",
"# Datetime\n",
"import datetime\n",
"from datetime import date\n",
"import time\n",
"\n",
"# Data Preprocessing\n",
"import pandas as pd # version 1.0.3\n",
"import os # used to read the csv filenames\n",
"import re\n",
"import random\n",
"from io import BytesIO\n",
"from pathlib import Path\n",
"\n",
"# Reading directories\n",
"import glob\n",
"import os\n",
"\n",
"# Working with JSON\n",
"import json\n",
"from pandas.io.json import json_normalize\n",
"\n",
"# Web Scraping\n",
"import requests\n",
"from bs4 import BeautifulSoup\n",
"import re\n",
"\n",
"# Fuzzy Matching - Record Linkage\n",
"import recordlinkage\n",
"import jellyfish\n",
"import numexpr as ne\n",
"\n",
"# Data Visualisation\n",
"import matplotlib as mpl\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"plt.style.use('seaborn-whitegrid')\n",
"import missingno as msno # visually display missing data\n",
"\n",
"# Progress Bar\n",
"from tqdm import tqdm # a clean progress bar library\n",
"\n",
"# Display in Jupyter\n",
"from IPython.display import Image, YouTubeVideo\n",
"from IPython.core.display import HTML\n",
"\n",
"# Ignore Warnings\n",
"import warnings\n",
"warnings.filterwarnings(action=\"ignore\", message=\"^internal gelsd\")\n",
"\n",
"print('Setup Complete')"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Python: 3.7.6\n",
"NumPy: 1.18.1\n",
"pandas: 1.0.1\n",
"matplotlib: 3.1.3\n",
"Seaborn: 0.10.0\n"
]
}
],
"source": [
"# Python / module versions used here for reference\n",
"print('Python: {}'.format(platform.python_version()))\n",
"print('NumPy: {}'.format(np.__version__))\n",
"print('pandas: {}'.format(pd.__version__))\n",
"print('matplotlib: {}'.format(mpl.__version__))\n",
"print('Seaborn: {}'.format(sns.__version__))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Defined Variables"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"# Define today's date\n",
"today = datetime.datetime.now().strftime('%d/%m/%Y').replace('/', '')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Defined Filepaths"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"# Set up initial paths to subfolders\n",
"base_dir = os.path.join('..', '..', )\n",
"data_dir = os.path.join(base_dir, 'data')\n",
"data_dir_fbref = os.path.join(base_dir, 'data', 'fbref')\n",
"data_dir_tm = os.path.join(base_dir, 'data', 'tm')\n",
"img_dir = os.path.join(base_dir, 'img')\n",
"fig_dir = os.path.join(base_dir, 'img', 'fig')\n",
"video_dir = os.path.join(base_dir, 'video')"
]
},
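{
"cell_type": "markdown",
"metadata": {},
"source": [
"Later cells export CSV files to `data/tm/raw/`. When running this notebook from a fresh clone of the repository, that folder may not exist yet; a minimal sketch (assuming the directory layout defined above) to create it up front:\n",
"\n",
"```python\n",
"# Ensure the TransferMarkt raw data folder exists before any CSV export\n",
"os.makedirs(os.path.join(data_dir_tm, 'raw'), exist_ok=True)\n",
"```"
]
},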
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Project Brief\n",
"This Jupyter notebook explores how to scrape football data from [TransferMarkt](https://www.transfermarkt.co.uk/), using [pandas](http://pandas.pydata.org/) for data maniuplation through DataFrames and [Beautifulsoup](https://pypi.org/project/beautifulsoup4/) for webscraping.\n",
"\n",
"The data of player values produced in this notebook is exported to CSV. This data can be further analysed in Python, joined to other datasets, or explored using Tableau, PowerBI, Microsoft Excel."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Data Sources"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3.1. Introduction\n",
"[TransferMarkt](https://www.transfermarkt.co.uk/) is a German-based website owned by [Axel Springer](https://www.axelspringer.com/en/) and is the leading website for the football transfer market. The website posts football related data, including: scores and results, football news, transfer rumours, and most usefully for us - calculated estimates ofthe market values for teams and individual players.\n",
"\n",
"To read more about how these estimations are made, [Beyond crowd judgments: Data-driven estimation of market value in association football](https://www.sciencedirect.com/science/article/pii/S0377221717304332) by Oliver Müllera, Alexander Simons, and Markus Weinmann does an excellent job of explaining how the estimations are made and their level of accuracy.\n",
"\n",
"Before conducting our EDA, the data needs to be imported as a DataFrame in the Data Sources section [Section 3](#section3) and Cleaned in the Data Engineering section [Section 4](#section4).\n",
"\n",
"We'll be using the [pandas](http://pandas.pydata.org/) library to import our data to this workbook as a DataFrame."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3.2. Data Dictionaries\n",
"The [TransferMarkt](https://www.transfermarkt.co.uk/) dataset has six features (columns) with the following definitions and data types:\n",
"\n",
"| Feature | Data type |\n",
"|------|-----|\n",
"| `position_number` | object |\n",
"| `position_description` | object |\n",
"| `name` | object |\n",
"| `dob` | object |\n",
"| `nationality` | object |\n",
"| `value` | object |"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3.3. Creating the DataFrame - scraping the data\n",
"Before scraping data from [TransferMarkt](https://www.transfermarkt.co.uk/), we need to look at the top five leagues that we wish to scrape.\n",
"\n",
"The web scraper for [TransferMarkt](https://www.transfermarkt.co.uk/) is made up of two parts:\n",
"1. In the first part, the scraper takes the webpages for each of the individual leagues e.g. The Premier League, and extract the hyperlinks to the pages of all the individual teams in the league table.\n",
"2. In the second part the script, the webscraper uses the list of invidual teams hyperlinks collected in part 1 to then collect the hyperlinks for each of the players for those teams. From this, the scraper can then extract the information we need for each of these players.\n",
"\n",
"This information collected for all the players is converted to a [pandas](http://pandas.pydata.org/) DataFrame from which we can view and manipulate the data.\n",
"\n",
"An example webpage for a football league is the following: https://www.transfermarkt.co.uk/jumplist/startseite/wettbewerb/GB1/plus/?saison_id=2019. As we can see, between the subdirectory path of `'/wettbewerb/'` and the `'/plus/'`, there is a 3 or 4 digit code. For The Premier League, the code is GB1. \n",
"\n",
"In order to scrape the webpages from [TransferMarkt](https://www.transfermarkt.co.uk/), the codes of the top five leagues need to be recorded from [TransferMarkt](https://www.transfermarkt.co.uk/), which are the following:\n",
"\n",
"| League Name on FIFA | Country | Corresponding [TransferMarkt](https://www.transfermarkt.co.uk/) League Code |\n",
"|------|-----|-----|\n",
"| LaLiga Santander | Spain | ES1 |\n",
"| Ligue 1 Conforama | France | FR1 |\n",
"| Premier League | England | GB1 |\n",
"| Serie A TIM | Italy | IT1 |\n",
"| Bundesliga | Germany | L1 |"
]
},
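{
"cell_type": "markdown",
"metadata": {},
"source": [
"To illustrate, the league overview URLs can be built from the codes in the table above. A minimal sketch, using the `saison_id` for the 2019/20 season:\n",
"\n",
"```python\n",
"# Build one league overview URL per TransferMarkt league code\n",
"league_codes = ['ES1', 'FR1', 'GB1', 'IT1', 'L1']\n",
"base_url = 'https://www.transfermarkt.co.uk/jumplist/startseite/wettbewerb/{}/plus/?saison_id={}'\n",
"league_urls = [base_url.format(code, 2019) for code in league_codes]\n",
"```"
]
},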
{
"cell_type": "markdown",
"metadata": {},
"source": [
"See: https://fcpython.com/blog/scraping-lists-transfermarkt-saving-images"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [],
"source": [
"import requests\n",
"from bs4 import BeautifulSoup\n",
"from os.path import basename"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"headers = {'User-Agent': \n",
" 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36'}"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [],
"source": [
"#Process League Table\n",
"page = 'https://www.transfermarkt.co.uk/premier-league/startseite/wettbewerb/GB1'\n",
"tree = requests.get(page, headers = headers)\n",
"soup = BeautifulSoup(tree.content, 'html.parser')\n",
"\n",
"#Create an empty list to assign these values to\n",
"teamLinks = []\n",
"\n",
"#Extract all links with the correct CSS selector\n",
"links = soup.select(\"a.vereinprofil_tooltip\")\n",
"\n",
"#We need the location that the link is pointing to, so for each link, take the link location. \n",
"#Additionally, we only need the links in locations 1,3,5,etc. of our list, so loop through those only\n",
"for i in range(1,41,2):\n",
" teamLinks.append(links[i].get(\"href\"))\n",
" \n",
"#For each location that we have taken, add the website before it - this allows us to call it later\n",
"for i in range(len(teamLinks)):\n",
" teamLinks[i] = \"https://www.transfermarkt.co.uk\"+teamLinks[i]"
]
},
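{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `range(1, 41, 2)` indexing above relies on the page listing exactly 20 teams, each linked twice. An equivalent approach that is less sensitive to the page layout is to collect every link and deduplicate while preserving order; a sketch:\n",
"\n",
"```python\n",
"# Deduplicate the team links while preserving their order on the page\n",
"teamLinks = list(dict.fromkeys(\n",
"    'https://www.transfermarkt.co.uk' + link.get('href') for link in links))\n",
"```"
]
},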
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [],
"source": [
"#Create an empty list for our player links to go into\n",
"playerLinks = []\n",
"\n",
"#Run the scraper through each of our 20 team links\n",
"for i in range(len(teamLinks)):\n",
"\n",
" #Download and process the team page\n",
" page = teamLinks[i]\n",
" tree = requests.get(page, headers = headers)\n",
" soup = BeautifulSoup(tree.content, 'html.parser')\n",
"\n",
" #Extract all links\n",
" links = soup.select(\"a.spielprofil_tooltip\")\n",
" \n",
" #For each link, extract the location that it is pointing to\n",
" for j in range(len(links)):\n",
" playerLinks.append(links[j].get(\"href\"))\n",
"\n",
" #Add the location to the end of the transfermarkt domain to make it ready to scrape\n",
" for j in range(len(playerLinks)):\n",
" playerLinks[j] = \"https://www.transfermarkt.co.uk\"+playerLinks[j]\n",
"\n",
" #The page list the players more than once - let's use list(set(XXX)) to remove the duplicates\n",
" playerLinks = list(set(playerLinks))"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"568"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(playerLinks)"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
],
"source": [
"for i in range(len(playerLinks)):\n",
"\n",
" #Take site and structure html\n",
" page = playerLinks[i]\n",
" tree = requests.get(page, headers=headers)\n",
" soup = BeautifulSoup(tree.content, 'html.parser')\n",
"\n",
"\n",
" #Find image and save it with the player's name\n",
" #Find the player's name\n",
" name = soup.find_all(\"h1\")\n",
" \n",
" #Use the name to call the image\n",
" image = soup.find_all(\"img\",{\"title\":name[0].text})\n",
" \n",
" #Extract the location of the image. We also need to strip the text after '?lm', so let's do that through '.split()'.\n",
" src = image[0].get('src').split(\"?lm\")[0]\n",
"\n",
" #Save the image under the player's name\n",
" with open(name[0].text+\".jpg\",\"wb\") as f:\n",
" f.write(requests.get(src).content)"
]
},
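{
"cell_type": "markdown",
"metadata": {},
"source": [
"When looping over several hundred player pages, it is courteous (and often more reliable) to pause between requests. A sketch of how a delay could be added to the loop above, assuming a one-second pause is acceptable:\n",
"\n",
"```python\n",
"import time\n",
"\n",
"for link in playerLinks:\n",
"    response = requests.get(link, headers=headers)\n",
"    # ... parse response.content and save the image as above ...\n",
"    time.sleep(1)  # pause between requests to avoid hammering the server\n",
"```"
]
},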
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [],
"source": [
"headers = {\n",
" 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:75.0) Gecko/20100101 Firefox/75.0'\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"# List of leagues by code for which we want to scrape player data - Big 5 European leagues\n",
"lst_leagues = ['ES1', 'FR1', 'GB1', 'IT1', 'L1']"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"# Assign season by year to season variable e.g. 2014/15 season = 2014\n",
"season = '2020' # 2020/21 season"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"from os.path import basename"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Fetching Links from ES1\n",
"Fetching Links from FR1\n",
"Fetching Links from GB1\n",
"Fetching Links from IT1\n",
"Fetching Links from L1\n",
"Collected 98 Links\n"
]
}
],
"source": [
"### Create empty list of links\n",
"playerLinks = []\n",
"\n",
"url = \"https://www.transfermarkt.co.uk/jumplist/startseite/wettbewerb/{}/plus/?saison_id=2020\"\n",
"\n",
"### For loop to iteratre through each league page to collect the team links\n",
"for league in lst_leagues:\n",
" print(f'Fetching Links from {league}')\n",
" r = requests.Session() .get(url.format(league), headers=headers)\n",
" soup = BeautifulSoup(r.content, 'html.parser')\n",
" link = [f\"{url[:31]}{item.next_element.get('href')}\" for item in soup.findAll(\n",
" \"td\", class_=\"hauptlink no-border-links hide-for-small hide-for-pad\")]\n",
" playerLinks.extend(link)\n",
"\n",
"### Print statement for the number of team links found\n",
"print(f'Collected {len(playerLinks)} Links')"
]
},
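{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `url[:31]` slice above works because the first 31 characters of the URL template are exactly the TransferMarkt domain, which is then prepended to each relative `href`:\n",
"\n",
"```python\n",
"url = 'https://www.transfermarkt.co.uk/jumplist/startseite/wettbewerb/{}/plus/?saison_id=2020'\n",
"url[:31]  # 'https://www.transfermarkt.co.uk'\n",
"```"
]
},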
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"98"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(playerLinks)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"ename": "IndexError",
"evalue": "list index out of range",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mIndexError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 15\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 16\u001b[0m \u001b[0;31m#Extract the location of the image. We also need to strip the text after '?lm', so let's do that through '.split()'.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 17\u001b[0;31m \u001b[0msrc\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mimage\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'src'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msplit\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"?lm\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 18\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 19\u001b[0m \u001b[0;31m#Save the image under the player's name\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mIndexError\u001b[0m: list index out of range"
]
}
],
"source": [
"for i in range(len(playerLinks)):\n",
"\n",
" #Take site and structure html\n",
" page = playerLinks[i]\n",
" tree = requests.get(page, headers=headers)\n",
" soup = BeautifulSoup(tree.content, 'html.parser')\n",
"\n",
"\n",
" #Find image and save it with the player's name\n",
" #Find the player's name\n",
" name = soup.find_all(\"h1\")\n",
" \n",
" #Use the name to call the image\n",
" image = soup.find_all(\"img\",{\"title\":name[0].text})\n",
" \n",
" #Extract the location of the image. We also need to strip the text after '?lm', so let's do that through '.split()'.\n",
" src = image[0].get('src').split(\"?lm\")[0]\n",
"\n",
" #Save the image under the player's name\n",
" with open(name[0].text+\".jpg\",\"wb\") as f:\n",
" f.write(requests.get(src).content)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Fetching Links from ES1\n",
"Fetching Links from FR1\n",
"Fetching Links from GB1\n",
"Fetching Links from IT1\n",
"Fetching Links from L1\n",
"Collected 98 Links\n",
"Extracting Page# 1\n",
"Extracting Page# 2\n",
"Extracting Page# 3\n",
"Extracting Page# 4\n",
"Extracting Page# 5\n",
"Extracting Page# 6\n",
"Extracting Page# 7\n",
"Extracting Page# 8\n",
"Extracting Page# 9\n",
"Extracting Page# 10\n",
"Extracting Page# 11\n",
"Extracting Page# 12\n",
"Extracting Page# 13\n",
"Extracting Page# 14\n",
"Extracting Page# 15\n",
"Extracting Page# 16\n",
"Extracting Page# 17\n",
"Extracting Page# 18\n",
"Extracting Page# 19\n",
"Extracting Page# 20\n",
"Extracting Page# 21\n",
"Extracting Page# 22\n",
"Extracting Page# 23\n",
"Extracting Page# 24\n",
"Extracting Page# 25\n",
"Extracting Page# 26\n",
"Extracting Page# 27\n",
"Extracting Page# 28\n",
"Extracting Page# 29\n",
"Extracting Page# 30\n",
"Extracting Page# 31\n",
"Extracting Page# 32\n",
"Extracting Page# 33\n",
"Extracting Page# 34\n",
"Extracting Page# 35\n",
"Extracting Page# 36\n",
"Extracting Page# 37\n",
"Extracting Page# 38\n",
"Extracting Page# 39\n",
"Extracting Page# 40\n",
"Extracting Page# 41\n",
"Extracting Page# 42\n",
"Extracting Page# 43\n",
"Extracting Page# 44\n",
"Extracting Page# 45\n",
"Extracting Page# 46\n",
"Extracting Page# 47\n",
"Extracting Page# 48\n",
"Extracting Page# 49\n",
"Extracting Page# 50\n",
"Extracting Page# 51\n",
"Extracting Page# 52\n",
"Extracting Page# 53\n",
"Extracting Page# 54\n",
"Extracting Page# 55\n",
"Extracting Page# 56\n",
"Extracting Page# 57\n",
"Extracting Page# 58\n",
"Extracting Page# 59\n",
"Extracting Page# 60\n",
"Extracting Page# 61\n",
"Extracting Page# 62\n",
"Extracting Page# 63\n",
"Extracting Page# 64\n",
"Extracting Page# 65\n",
"Extracting Page# 66\n",
"Extracting Page# 67\n",
"Extracting Page# 68\n",
"Extracting Page# 69\n",
"Extracting Page# 70\n",
"Extracting Page# 71\n",
"Extracting Page# 72\n",
"Extracting Page# 73\n",
"Extracting Page# 74\n",
"Extracting Page# 75\n",
"Extracting Page# 76\n",
"Extracting Page# 77\n",
"Extracting Page# 78\n",
"Extracting Page# 79\n",
"Extracting Page# 80\n",
"Extracting Page# 81\n",
"Extracting Page# 82\n",
"Extracting Page# 83\n",
"Extracting Page# 84\n",
"Extracting Page# 85\n",
"Extracting Page# 86\n",
"Extracting Page# 87\n",
"Extracting Page# 88\n",
"Extracting Page# 89\n",
"Extracting Page# 90\n",
"Extracting Page# 91\n",
"Extracting Page# 92\n",
"Extracting Page# 93\n",
"Extracting Page# 94\n",
"Extracting Page# 95\n",
"Extracting Page# 96\n",
"Extracting Page# 97\n",
"Extracting Page# 98\n"
]
},
{
"ename": "AttributeError",
"evalue": "'str' object has no attribute 'text'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mAttributeError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 52\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mlinks\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 53\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 54\u001b[0;31m \u001b[0mmain\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"https://www.transfermarkt.co.uk/jumplist/startseite/wettbewerb/{}/plus/?saison_id=2020\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 55\u001b[0m \u001b[0;31m#main(f'https://www.transfermarkt.co.uk/jumplist/startseite/wettbewerb/{}/plus/?saison_id={season}')\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 56\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m\u001b[0m in \u001b[0;36mmain\u001b[0;34m(url)\u001b[0m\n\u001b[1;32m 39\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 40\u001b[0m \u001b[0;31m#Use the name to call the image\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 41\u001b[0;31m \u001b[0mimage\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0msoup\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mfind_all\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"img\"\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m{\u001b[0m\u001b[0;34m\"title\"\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtext\u001b[0m\u001b[0;34m}\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 42\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 43\u001b[0m \u001b[0;31m#Extract the location of the image. We also need to strip the text after '?lm', so let's do that through '.split()'.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mAttributeError\u001b[0m: 'str' object has no attribute 'text'"
]
}
],
"source": [
"# Run this script to scrape latest version of this data from TransferMarkt\n",
"\n",
"## Start timer\n",
"tic = datetime.datetime.now()\n",
"\n",
"\n",
"## Scrape TransferMarkt data\n",
"def main(url):\n",
" with requests.Session() as req:\n",
" \n",
" ### Create empty list of links\n",
" links = []\n",
" \n",
" ### For loop to iteratre through each league page to collect the team links\n",
" for league in lst_leagues:\n",
" print(f'Fetching Links from {league}')\n",
" r = req.get(url.format(league), headers=headers)\n",
" soup = BeautifulSoup(r.content, 'html.parser')\n",
" link = [f\"{url[:31]}{item.next_element.get('href')}\" for item in soup.findAll(\n",
" \"td\", class_=\"hauptlink no-border-links hide-for-small hide-for-pad\")]\n",
" links.extend(link)\n",
" \n",
" ### Print statement for the number of team links found\n",
" print(f'Collected {len(links)} Links')\n",
" \n",
" \n",
" \n",
" \"\"\"\n",
" ### Create empty list of goals\n",
" goals = []\n",
" \n",
" ### For loop to iteratre through each goal to collect each players information and assign to a DF\n",
" for num, link in enumerate(links):\n",
" print(f\"Extracting Page# {num +1}\")\n",
" r = req.get(link, headers=headers)\n",
" soup = BeautifulSoup(r.content, 'html.parser')\n",
" target = soup.find(\"table\", class_=\"items\")\n",
" pn = [pn.text for pn in target.select(\"div.rn_nummer\")]\n",
" pos = [pos.text for pos in target.findAll(\"td\", class_=False)]\n",
" name = [name.text for name in target.select(\"td.hide\")]\n",
" dob = [date.find_next(\n",
" \"td\").text for date in target.select(\"td.hide\")]\n",
" nat = [\" / \".join([a.get(\"alt\") for a in nat.find_all_next(\"td\")[1] if a.get(\"alt\")]) for nat in target.findAll(\n",
" \"td\", itemprop=\"athlete\")]\n",
" val = [val.get_text(strip=True)\n",
" for val in target.select('td.rechts.hauptlink')]\n",
" goal = zip(pn, pos, name, dob, nat, val)\n",
" df = pd.DataFrame(goal, columns=[\n",
" 'position_number', 'position_description', 'name', 'dob', 'nationality', 'value'])\n",
" goals.append(df)\n",
" \n",
" ### Concontate the list of goals\n",
" new = pd.concat(goals)\n",
" \n",
" ### Save DataFrame to a CSV\n",
" new.to_csv(data_dir_tm + '/raw/' + f'players_big5_2021_raw_{today}.csv', index=None, header=True)\n",
" \"\"\"\n",
" \n",
"## Call defined function\n",
"main(\"https://www.transfermarkt.co.uk/jumplist/startseite/wettbewerb/{}/plus/?saison_id=2020\")\n",
"#main(f'https://www.transfermarkt.co.uk/jumplist/startseite/wettbewerb/{}/plus/?saison_id={season}')\n",
"\n",
"\n",
"## End timer\n",
"toc = datetime.datetime.now()\n",
"\n",
"\n",
"## Calculate time take\n",
"total_time = (toc-tic).total_seconds()\n",
"print(f'Time taken to scrape the data of all the players for the Big 5 leagues is: {total_time/60:0.2f} minutes.')"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"# Import data as a pandas DataFrame, df_tm_players_big5_2021_raw\n",
"\n",
"## Look for most recent CSV file\n",
"list_of_files = glob.glob(data_dir_tm + '/raw/*') # * means all if need specific format then *.csv\n",
"filepath_latest_tm = max(list_of_files, key=os.path.getctime)\n",
"\n",
"## Load in most recently parsed CSV file\n",
"df_tm_player_top5_2021_raw = pd.read_csv(filepath_latest_tm)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3.4. Preliminary Data Handling\n",
"Let's quality of the dataset by looking first and last rows in pandas using the [head()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.head.html) and [tail()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.tail.html) methods."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
position_number
\n",
"
position_description
\n",
"
name
\n",
"
dob
\n",
"
nationality
\n",
"
value
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
1
\n",
"
Goalkeeper
\n",
"
Marc-André ter Stegen
\n",
"
Apr 30, 1992 (28)
\n",
"
Germany
\n",
"
£64.80m
\n",
"
\n",
"
\n",
"
1
\n",
"
13
\n",
"
Goalkeeper
\n",
"
Neto
\n",
"
Jul 19, 1989 (31)
\n",
"
Brazil / Italy
\n",
"
£13.05m
\n",
"
\n",
"
\n",
"
2
\n",
"
26
\n",
"
Goalkeeper
\n",
"
Iñaki Peña
\n",
"
Mar 2, 1999 (21)
\n",
"
Spain
\n",
"
£2.07m
\n",
"
\n",
"
\n",
"
3
\n",
"
15
\n",
"
Centre-Back
\n",
"
Clément Lenglet
\n",
"
Jun 17, 1995 (25)
\n",
"
France
\n",
"
£43.20m
\n",
"
\n",
"
\n",
"
4
\n",
"
23
\n",
"
Centre-Back
\n",
"
Samuel Umtiti
\n",
"
Nov 14, 1993 (26)
\n",
"
France / Cameroon
\n",
"
£28.80m
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" position_number position_description name \\\n",
"0 1 Goalkeeper Marc-André ter Stegen \n",
"1 13 Goalkeeper Neto \n",
"2 26 Goalkeeper Iñaki Peña \n",
"3 15 Centre-Back Clément Lenglet \n",
"4 23 Centre-Back Samuel Umtiti \n",
"\n",
" dob nationality value \n",
"0 Apr 30, 1992 (28) Germany £64.80m \n",
"1 Jul 19, 1989 (31) Brazil / Italy £13.05m \n",
"2 Mar 2, 1999 (21) Spain £2.07m \n",
"3 Jun 17, 1995 (25) France £43.20m \n",
"4 Nov 14, 1993 (26) France / Cameroon £28.80m "
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Display the first 5 rows of the raw DataFrame, df_tm_player_top5_2021_raw\n",
"df_tm_player_top5_2021_raw.head()"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
position_number
\n",
"
position_description
\n",
"
name
\n",
"
dob
\n",
"
nationality
\n",
"
value
\n",
"
\n",
" \n",
" \n",
"
\n",
"
2885
\n",
"
18
\n",
"
Centre-Forward
\n",
"
Sergio Córdova
\n",
"
Aug 9, 1997 (23)
\n",
"
Venezuela
\n",
"
£1.44m
\n",
"
\n",
"
\n",
"
2886
\n",
"
9
\n",
"
Centre-Forward
\n",
"
Fabian Klos
\n",
"
Dec 2, 1987 (32)
\n",
"
Germany
\n",
"
£900Th.
\n",
"
\n",
"
\n",
"
2887
\n",
"
13
\n",
"
Centre-Forward
\n",
"
Sebastian Müller
\n",
"
Jan 23, 2001 (19)
\n",
"
Germany
\n",
"
£270Th.
\n",
"
\n",
"
\n",
"
2888
\n",
"
36
\n",
"
Centre-Forward
\n",
"
Sven Schipplock
\n",
"
Nov 8, 1988 (31)
\n",
"
Germany
\n",
"
£270Th.
\n",
"
\n",
"
\n",
"
2889
\n",
"
39
\n",
"
Centre-Forward
\n",
"
Prince Osei Owusu
\n",
"
Jan 7, 1997 (23)
\n",
"
Germany / Ghana
\n",
"
£225Th.
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" position_number position_description name \\\n",
"2885 18 Centre-Forward Sergio Córdova \n",
"2886 9 Centre-Forward Fabian Klos \n",
"2887 13 Centre-Forward Sebastian Müller \n",
"2888 36 Centre-Forward Sven Schipplock \n",
"2889 39 Centre-Forward Prince Osei Owusu \n",
"\n",
" dob nationality value \n",
"2885 Aug 9, 1997 (23) Venezuela £1.44m \n",
"2886 Dec 2, 1987 (32) Germany £900Th. \n",
"2887 Jan 23, 2001 (19) Germany £270Th. \n",
"2888 Nov 8, 1988 (31) Germany £270Th. \n",
"2889 Jan 7, 1997 (23) Germany / Ghana £225Th. "
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Display the last 5 rows of the raw DataFrame, df_tm_player_top5_2021_raw\n",
"df_tm_player_top5_2021_raw.tail()"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(2890, 6)\n"
]
}
],
"source": [
"# Print the shape of the raw DataFrame, df_tm_player_top5_2021_raw\n",
"print(df_tm_player_top5_2021_raw.shape)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Index(['position_number', 'position_description', 'name', 'dob', 'nationality',\n",
" 'value'],\n",
" dtype='object')\n"
]
}
],
"source": [
"# Print the column names of the raw DataFrame, df_tm_player_top5_2021_raw\n",
"print(df_tm_player_top5_2021_raw.columns)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The dataset has six features (columns). Full details of these attributes can be found in the [Data Dictionary](section3.3.1)."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"position_number object\n",
"position_description object\n",
"name object\n",
"dob object\n",
"nationality object\n",
"value object\n",
"dtype: object"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Data types of the features of the raw DataFrame, df_tm_player_top5_2021_raw\n",
"df_tm_player_top5_2021_raw.dtypes"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"All six of the columns have the object data type. Full details of these attributes and their data types can be found in the [Data Dictionary](section3.3.1)."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"RangeIndex: 2890 entries, 0 to 2889\n",
"Data columns (total 6 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 position_number 2890 non-null object\n",
" 1 position_description 2890 non-null object\n",
" 2 name 2890 non-null object\n",
" 3 dob 2890 non-null object\n",
" 4 nationality 2890 non-null object\n",
" 5 value 2854 non-null object\n",
"dtypes: object(6)\n",
"memory usage: 135.6+ KB\n"
]
}
],
"source": [
"# Info for the raw DataFrame, df_tm_player_top5_2021_raw\n",
"df_tm_player_top5_2021_raw.info()"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
],
"source": [
"# Plot visualisation of the missing values for each feature of the raw DataFrame, df_tm_player_top5_2021_raw\n",
"msno.matrix(df_tm_player_top5_2021_raw, figsize = (30, 7))"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"value 36\n",
"dtype: int64"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Counts of missing values\n",
"tm_null_value_stats = df_tm_player_top5_2021_raw.isnull().sum(axis=0)\n",
"tm_null_value_stats[tm_null_value_stats != 0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The visualisation shows us very quickly that there a few missing values in the `value` column, but otherwise the dataset is complete."
]
},
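{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before engineering the data, the rows with a missing `value` can be inspected to check whether they share a pattern; a sketch:\n",
"\n",
"```python\n",
"# Inspect the rows where the market value is missing\n",
"df_tm_player_top5_2021_raw[df_tm_player_top5_2021_raw['value'].isnull()].head()\n",
"```"
]
},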
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Data Engineering"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4.1. Introduction\n",
"Before we answer the questions in the brief through [Exploratory Data Analysis (EDA)](#section5), we'll first need to clean and wrangle the datasets to a form that meet our needs."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4.2. Assign Raw DataFrames to New Engineered DataFrames"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [],
"source": [
"# Assign Raw DataFrame to new Engineered DataFrame\n",
"df_tm_player_top5_2021 = df_tm_player_top5_2021_raw"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4.2. String Cleaning"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Name"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [],
"source": [
"df_tm_player_top5_2021['name_lower'] = df_tm_player_top5_2021['name'].str.normalize('NFKD')\\\n",
" .str.encode('ascii', errors='ignore')\\\n",
" .str.decode('utf-8')\\\n",
" .str.lower()"
]
},
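{
"cell_type": "markdown",
"metadata": {},
"source": [
"For illustration, the same NFKD-normalise / ASCII-encode / lowercase pipeline applied to a single name, as a sketch using the standard library rather than the pandas string accessor:\n",
"\n",
"```python\n",
"import unicodedata\n",
"\n",
"name = 'Marc-André ter Stegen'\n",
"cleaned = (unicodedata.normalize('NFKD', name)\n",
"           .encode('ascii', 'ignore')\n",
"           .decode('utf-8')\n",
"           .lower())\n",
"cleaned  # 'marc-andre ter stegen'\n",
"```"
]
},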
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [],
"source": [
"# First Name Lower\n",
"df_tm_player_top5_2021['firstname_lower'] = df_tm_player_top5_2021['name_lower'].str.rsplit(' ', 0).str[0]\n",
"\n",
"# Last Name Lower\n",
"df_tm_player_top5_2021['lastname_lower'] = df_tm_player_top5_2021['name_lower'].str.rsplit(' ', 1).str[-1]\n",
"\n",
"# First Initial Lower\n",
"df_tm_player_top5_2021['firstinitial_lower'] = df_tm_player_top5_2021['name_lower'].astype(str).str[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### DoB and Age\n",
"The `dob` column is messy and contains both the date of birth as a string and also the age in brackets.\n",
"\n",
"This string cleaning consists of two parts, firstly, to split this apart into their seperate components. However, once the `age` column is created, we will replaced this by determining the current age using the Python [datetime](https://docs.python.org/3/library/datetime.html) module."
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [],
"source": [
"# DoB string cleaning to create birth_date and age columns\n",
"df_tm_player_top5_2021[['birth_date', 'age']] = df_tm_player_top5_2021['dob'].str.extract(r'(.+) \\((\\d+)\\)')"
]
},
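{
"cell_type": "markdown",
"metadata": {},
"source": [
"The regular expression captures two groups: everything before the final bracketed number as the birth date, and the digits inside the brackets as the age. A quick standalone check of the pattern with Python's [re](https://docs.python.org/3/library/re.html) module:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import re\n",
"\n",
"# The same pattern as above, applied to a single sample dob string\n",
"match = re.match(r'(.+) \\((\\d+)\\)', 'Apr 30, 1992 (28)')\n",
"print(match.group(1))    # -> 'Apr 30, 1992'\n",
"print(match.group(2))    # -> '28'"
]
},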
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Nationality\n",
"For the nationality, some of the players have duel nationality.\n",
"\n",
"For example, [Claudio Pizarro](https://www.transfermarkt.co.uk/claudio-pizarro/profil/spieler/532) is a Peruvian-born player who has has made 85 appearances for Peru, scoring 20 goals. However, his citizenship according to [TransferMarkt](https://www.transfermarkt.co.uk/) is 'Peru / Italy'. For our needs, we only want to know the country the player is eligible to play for, not their full heritage which from observations is always the first part of the string. We'll therefore be discarding anything after the first space in the string to form a new `playing_country` column."
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [],
"source": [
"# Take the first nationality i.e. text before the first space, ex. 'Peru / Italy'\n",
"df_tm_player_top5_2021['playing_country'] = df_tm_player_top5_2021['nationality'].str.split(' /').str[0]"
]
},
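{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick standalone check of the split logic on sample nationality strings:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# The split on ' /' keeps only the first listed country\n",
"print('Peru / Italy'.split(' /')[0])    # -> 'Peru'\n",
"print('Germany'.split(' /')[0])    # -> 'Germany' (single nationalities pass through unchanged)"
]
},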
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Value\n",
"The values of the players have prefixes (£), commas, spaces, and suffixes (m, k) that need to cleaned and replaced before converting to a numerical value."
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [],
"source": [
"# Value string cleaning from shortened string value to full numerical value\n",
"\n",
"## Convert 'm' to '000000'\n",
"df_tm_player_top5_2021['value'] = df_tm_player_top5_2021['value'].str.replace('m','0000')\n",
"\n",
"## Convert 'k' to '000'\n",
"df_tm_player_top5_2021['value'] = df_tm_player_top5_2021['value'].str.replace('k','000')\n",
"\n",
"## Convert 'Th' to '000'\n",
"df_tm_player_top5_2021['value'] = df_tm_player_top5_2021['value'].str.replace('Th','000')\n",
"\n",
"## Remove '.'\n",
"df_tm_player_top5_2021['value'] = df_tm_player_top5_2021['value'].str.replace('.','')\n",
"\n",
"## Remove '£' sign\n",
"df_tm_player_top5_2021['value'] = df_tm_player_top5_2021['value'].str.replace('£','')\n",
"\n",
"## Remove '-'\n",
"df_tm_player_top5_2021['value'] = df_tm_player_top5_2021['value'].str.replace('-','')\n",
"\n",
"## Remove '  '\n",
"df_tm_player_top5_2021['value'] = df_tm_player_top5_2021['value'].str.replace('  ','')\n",
"\n",
"## Remove gaps\n",
"df_tm_player_top5_2021['value'] = df_tm_player_top5_2021['value'].str.replace(' ','')"
]
},
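{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an aside, the chained replacements above can also be expressed as a single-pass parser, which may be easier to extend if new suffixes appear. A minimal sketch (the helper name and the suffix assumptions are ours, based on the formats observed above):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def parse_tm_value(value_str):\n",
"    # Parse a TransferMarkt value string such as '£64.80m' or '£225Th.' into an integer number of pounds\n",
"    if not isinstance(value_str, str) or value_str.strip() in ('', '-'):\n",
"        return None\n",
"    s = value_str.replace('£', '').replace(',', '').strip()\n",
"    if s.lower().endswith('m'):\n",
"        return int(float(s[:-1]) * 1_000_000)\n",
"    if s.lower().endswith('th.'):\n",
"        return int(float(s[:-3]) * 1_000)\n",
"    if s.lower().endswith('k'):\n",
"        return int(float(s[:-1]) * 1_000)\n",
"    return int(float(s))\n",
"\n",
"print(parse_tm_value('£64.80m'))    # -> 64800000\n",
"print(parse_tm_value('£225Th.'))    # -> 225000"
]
},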
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Position"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(['Goalkeeper', 'Centre-Back', 'Left-Back', 'Right-Back',\n",
" 'Defensive Midfield', 'Central Midfield', 'Attacking Midfield',\n",
" 'Left Winger', 'Right Winger', 'Second Striker', 'Centre-Forward',\n",
" 'Right Midfield', 'Left Midfield', 'Midfielder'], dtype=object)"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# List unique values in the df_tm_player_top5_2021['position_description'] column\n",
"df_tm_player_top5_2021['position_description'].unique()"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [],
"source": [
"dict_positions = {\n",
" 'Goalkeeper': 'Goalkeeper',\n",
" 'Centre-Back': 'Defender',\n",
" 'Left-Back': 'Defender',\n",
" 'Right-Back': 'Defender',\n",
" 'Defensive Midfield': 'Midfielder',\n",
" 'Central Midfield': 'Midfielder',\n",
" 'Attacking Midfield': 'Midfielder',\n",
" 'Left Winger': 'Forward',\n",
" 'Right Winger': 'Forward',\n",
" 'Second Striker': 'Forward',\n",
" 'Centre-Forward': 'Forward',\n",
" 'Right Midfield': 'Midfielder',\n",
" 'Left Midfield': 'Midfielder',\n",
" 'Midfielder': 'Midfielder'\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [],
"source": [
"df_tm_player_top5_2021['position_cleaned'] = df_tm_player_top5_2021['position_description'].map(dict_positions)"
]
},
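{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since `.map()` returns NaN for any key missing from the mapping dictionary, it is worth confirming that every position was covered:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Any position_description not covered by dict_positions will surface here as an unmapped value\n",
"mask_unmapped = df_tm_player_top5_2021['position_cleaned'].isnull()\n",
"print(df_tm_player_top5_2021.loc[mask_unmapped, 'position_description'].unique())    # expect an empty array"
]
},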
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4.3. Converting Data Types"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### DoB\n",
"First we need to convert the `dob` column from the `object` data type to `datetime64[ns]`, again using the [.to_datetime()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html) method."
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
position_number
\n",
"
position_description
\n",
"
name
\n",
"
dob
\n",
"
nationality
\n",
"
value
\n",
"
name_lower
\n",
"
firstname_lower
\n",
"
lastname_lower
\n",
"
firstinitial_lower
\n",
"
birth_date
\n",
"
age
\n",
"
playing_country
\n",
"
position_cleaned
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
1
\n",
"
Goalkeeper
\n",
"
Marc-André ter Stegen
\n",
"
Apr 30, 1992 (28)
\n",
"
Germany
\n",
"
64800000
\n",
"
marc-andre ter stegen
\n",
"
marc-andre
\n",
"
stegen
\n",
"
m
\n",
"
Apr 30, 1992
\n",
"
28
\n",
"
Germany
\n",
"
Goalkeeper
\n",
"
\n",
"
\n",
"
1
\n",
"
13
\n",
"
Goalkeeper
\n",
"
Neto
\n",
"
Jul 19, 1989 (31)
\n",
"
Brazil / Italy
\n",
"
13050000
\n",
"
neto
\n",
"
neto
\n",
"
neto
\n",
"
n
\n",
"
Jul 19, 1989
\n",
"
31
\n",
"
Brazil
\n",
"
Goalkeeper
\n",
"
\n",
"
\n",
"
2
\n",
"
26
\n",
"
Goalkeeper
\n",
"
Iñaki Peña
\n",
"
Mar 2, 1999 (21)
\n",
"
Spain
\n",
"
2070000
\n",
"
inaki pena
\n",
"
inaki
\n",
"
pena
\n",
"
i
\n",
"
Mar 2, 1999
\n",
"
21
\n",
"
Spain
\n",
"
Goalkeeper
\n",
"
\n",
"
\n",
"
3
\n",
"
15
\n",
"
Centre-Back
\n",
"
Clément Lenglet
\n",
"
Jun 17, 1995 (25)
\n",
"
France
\n",
"
43200000
\n",
"
clement lenglet
\n",
"
clement
\n",
"
lenglet
\n",
"
c
\n",
"
Jun 17, 1995
\n",
"
25
\n",
"
France
\n",
"
Defender
\n",
"
\n",
"
\n",
"
4
\n",
"
23
\n",
"
Centre-Back
\n",
"
Samuel Umtiti
\n",
"
Nov 14, 1993 (26)
\n",
"
France / Cameroon
\n",
"
28800000
\n",
"
samuel umtiti
\n",
"
samuel
\n",
"
umtiti
\n",
"
s
\n",
"
Nov 14, 1993
\n",
"
26
\n",
"
France
\n",
"
Defender
\n",
"
\n",
"
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
\n",
"
\n",
"
2885
\n",
"
18
\n",
"
Centre-Forward
\n",
"
Sergio Córdova
\n",
"
Aug 9, 1997 (23)
\n",
"
Venezuela
\n",
"
1440000
\n",
"
sergio cordova
\n",
"
sergio
\n",
"
cordova
\n",
"
s
\n",
"
Aug 9, 1997
\n",
"
23
\n",
"
Venezuela
\n",
"
Forward
\n",
"
\n",
"
\n",
"
2886
\n",
"
9
\n",
"
Centre-Forward
\n",
"
Fabian Klos
\n",
"
Dec 2, 1987 (32)
\n",
"
Germany
\n",
"
900000
\n",
"
fabian klos
\n",
"
fabian
\n",
"
klos
\n",
"
f
\n",
"
Dec 2, 1987
\n",
"
32
\n",
"
Germany
\n",
"
Forward
\n",
"
\n",
"
\n",
"
2887
\n",
"
13
\n",
"
Centre-Forward
\n",
"
Sebastian Müller
\n",
"
Jan 23, 2001 (19)
\n",
"
Germany
\n",
"
270000
\n",
"
sebastian muller
\n",
"
sebastian
\n",
"
muller
\n",
"
s
\n",
"
Jan 23, 2001
\n",
"
19
\n",
"
Germany
\n",
"
Forward
\n",
"
\n",
"
\n",
"
2888
\n",
"
36
\n",
"
Centre-Forward
\n",
"
Sven Schipplock
\n",
"
Nov 8, 1988 (31)
\n",
"
Germany
\n",
"
270000
\n",
"
sven schipplock
\n",
"
sven
\n",
"
schipplock
\n",
"
s
\n",
"
Nov 8, 1988
\n",
"
31
\n",
"
Germany
\n",
"
Forward
\n",
"
\n",
"
\n",
"
2889
\n",
"
39
\n",
"
Centre-Forward
\n",
"
Prince Osei Owusu
\n",
"
Jan 7, 1997 (23)
\n",
"
Germany / Ghana
\n",
"
225000
\n",
"
prince osei owusu
\n",
"
prince
\n",
"
owusu
\n",
"
p
\n",
"
Jan 7, 1997
\n",
"
23
\n",
"
Germany
\n",
"
Forward
\n",
"
\n",
" \n",
"
\n",
"
2890 rows × 14 columns
\n",
"
"
],
"text/plain": [
" position_number position_description name \\\n",
"0 1 Goalkeeper Marc-André ter Stegen \n",
"1 13 Goalkeeper Neto \n",
"2 26 Goalkeeper Iñaki Peña \n",
"3 15 Centre-Back Clément Lenglet \n",
"4 23 Centre-Back Samuel Umtiti \n",
"... ... ... ... \n",
"2885 18 Centre-Forward Sergio Córdova \n",
"2886 9 Centre-Forward Fabian Klos \n",
"2887 13 Centre-Forward Sebastian Müller \n",
"2888 36 Centre-Forward Sven Schipplock \n",
"2889 39 Centre-Forward Prince Osei Owusu \n",
"\n",
" dob nationality value name_lower \\\n",
"0 Apr 30, 1992 (28) Germany 64800000 marc-andre ter stegen \n",
"1 Jul 19, 1989 (31) Brazil / Italy 13050000 neto \n",
"2 Mar 2, 1999 (21) Spain 2070000 inaki pena \n",
"3 Jun 17, 1995 (25) France 43200000 clement lenglet \n",
"4 Nov 14, 1993 (26) France / Cameroon 28800000 samuel umtiti \n",
"... ... ... ... ... \n",
"2885 Aug 9, 1997 (23) Venezuela 1440000 sergio cordova \n",
"2886 Dec 2, 1987 (32) Germany 900000 fabian klos \n",
"2887 Jan 23, 2001 (19) Germany 270000 sebastian muller \n",
"2888 Nov 8, 1988 (31) Germany 270000 sven schipplock \n",
"2889 Jan 7, 1997 (23) Germany / Ghana 225000 prince osei owusu \n",
"\n",
" firstname_lower lastname_lower firstinitial_lower birth_date age \\\n",
"0 marc-andre stegen m Apr 30, 1992 28 \n",
"1 neto neto n Jul 19, 1989 31 \n",
"2 inaki pena i Mar 2, 1999 21 \n",
"3 clement lenglet c Jun 17, 1995 25 \n",
"4 samuel umtiti s Nov 14, 1993 26 \n",
"... ... ... ... ... .. \n",
"2885 sergio cordova s Aug 9, 1997 23 \n",
"2886 fabian klos f Dec 2, 1987 32 \n",
"2887 sebastian muller s Jan 23, 2001 19 \n",
"2888 sven schipplock s Nov 8, 1988 31 \n",
"2889 prince owusu p Jan 7, 1997 23 \n",
"\n",
" playing_country position_cleaned \n",
"0 Germany Goalkeeper \n",
"1 Brazil Goalkeeper \n",
"2 Spain Goalkeeper \n",
"3 France Defender \n",
"4 France Defender \n",
"... ... ... \n",
"2885 Venezuela Forward \n",
"2886 Germany Forward \n",
"2887 Germany Forward \n",
"2888 Germany Forward \n",
"2889 Germany Forward \n",
"\n",
"[2890 rows x 14 columns]"
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_tm_player_top5_2021"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"position_number object\n",
"position_description object\n",
"name object\n",
"dob object\n",
"nationality object\n",
"value object\n",
"name_lower object\n",
"firstname_lower object\n",
"lastname_lower object\n",
"firstinitial_lower object\n",
"birth_date object\n",
"age object\n",
"playing_country object\n",
"position_cleaned object\n",
"dtype: object"
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_tm_player_top5_2021.dtypes"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [],
"source": [
"# Export the engineered DataFrame to a CSV checkpoint\n",
"df_tm_player_top5_2021.to_csv(data_dir + f'test_tm_{today}.csv', index=None, header=True)"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [],
"source": [
"# Replace any 'N/A' strings with empty strings ahead of the type conversions\n",
"df_tm_player_top5_2021 = df_tm_player_top5_2021.replace('N/A', '', regex=True)"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"# Convert birth_date from string to datetime64[ns]\n",
"df_tm_player_top5_2021['birth_date'] = pd.to_datetime(df_tm_player_top5_2021['birth_date'])"
]
},
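{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here `pd.to_datetime()` infers the 'Apr 30, 1992' format automatically. A more defensive variant passes the format explicitly and coerces anything unparseable to NaT, sketched below on sample strings:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Explicit-format parsing on sample strings; unparseable values become NaT instead of raising\n",
"sample = pd.Series(['Apr 30, 1992', 'Jul 19, 1989', 'not a date'])\n",
"print(pd.to_datetime(sample, format='%b %d, %Y', errors='coerce'))"
]
},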
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Age\n",
"The calculated `age` column needs to be converted from a float to an integer, with all null values ignored, using to [astype()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.astype.html) method."
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {},
"outputs": [],
"source": [
"# Date and time manipulation\n",
"from datetime import datetime"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [],
"source": [
"# Redetermine the age using the newly created birth_date column (after formatted to datetime data type)\n",
"\n",
"## Remove all not numeric values use to_numeric with parameter errors='coerce' - it replaces non numeric to NaNs\n",
"df_tm_player_top5_2021['age'] = pd.to_numeric(df_tm_player_top5_2021['age'], errors='coerce')\n",
"\n",
"## Convert floats to integers and leave null values\n",
"df_tm_player_top5_2021['age'] = np.nan_to_num(df_tm_player_top5_2021['age']).astype(int)\n",
"\n",
"## Calculate current age\n",
"today = datetime.today()\n",
"df_tm_player_top5_2021['age'] = df_tm_player_top5_2021['birth_date'].apply(lambda x: today.year - x.year - \n",
" ((today.month, today.day) < (x.month, x.day)) \n",
" )\n",
"\n",
"\n",
"# df_tm_player_top5_2021['age'] = pd.to_numeric(ddf_tm_player_top5_2021['age'], downcast='signed')"
]
},
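{
"cell_type": "markdown",
"metadata": {},
"source": [
"The row-wise lambda above can also be expressed as a vectorised calculation, which is faster on larger DataFrames. A sketch that assigns to a standalone Series rather than modifying the DataFrame:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Vectorised equivalent of the row-wise age calculation; missing birth dates stay NaN\n",
"now = pd.Timestamp.today()\n",
"born = df_tm_player_top5_2021['birth_date']\n",
"before_birthday = (born.dt.month > now.month) | ((born.dt.month == now.month) & (born.dt.day > now.day))\n",
"age_vectorised = now.year - born.dt.year - before_birthday.astype(int)"
]
},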
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Value\n",
"The `value` column needs to be converted from a string to an integer using to [to_numeric()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_numeric.html) method."
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {},
"outputs": [],
"source": [
"# Convert string to integer\n",
"df_tm_player_top5_2021['value'] = pd.to_numeric(df_tm_player_top5_2021['value'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Position\n",
"..."
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['Attacking Midfield',\n",
" 'Central Midfield',\n",
" 'Centre-Back',\n",
" 'Centre-Forward',\n",
" 'Defensive Midfield',\n",
" 'Goalkeeper',\n",
" 'Left Midfield',\n",
" 'Left Winger',\n",
" 'Left-Back',\n",
" 'Midfielder',\n",
" 'Right Midfield',\n",
" 'Right Winger',\n",
" 'Right-Back',\n",
" 'Second Striker']"
]
},
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sorted(df_tm_player_top5_2021['position_description'].unique())"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {},
"outputs": [],
"source": [
"dict_positions_tm = {\n",
" 'Attacking Midfield': 'Midfielder',\n",
" 'Central Midfield': 'Midfielder',\n",
" 'Centre-Back': 'Defender',\n",
" 'Centre-Forward': 'Forward',\n",
" 'Defensive Midfield': 'Midfielder',\n",
" 'Forward': 'Forward',\n",
" 'Goalkeeper': 'Goalkeeper',\n",
" 'Left Midfield': 'Midfielder',\n",
" 'Left Winger': 'Forward',\n",
" 'Left-Back': 'Defender',\n",
" 'Right Midfield': 'Midfielder',\n",
" 'Right Winger': 'Forward',\n",
" 'Right-Back': 'Defender',\n",
" 'Second Striker': 'Forward'\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {},
"outputs": [],
"source": [
"df_tm_player_top5_2021['position_description_cleaned'] = df_tm_player_top5_2021['position_description'].map(dict_positions_tm)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4.4. Create New Attributes\n",
"Create new attributes for birth month and birth year."
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {},
"outputs": [],
"source": [
"df_tm_player_top5_2021['birth_year'] = pd.DatetimeIndex(df_tm_player_top5_2021['birth_date']).year\n",
"df_tm_player_top5_2021['birth_month'] = pd.DatetimeIndex(df_tm_player_top5_2021['birth_date']).month"
]
},
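{
"cell_type": "markdown",
"metadata": {},
"source": [
"Equivalently, the same attributes can be derived with the `.dt` accessor on the datetime column, avoiding the intermediate DatetimeIndex:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Equivalent derivation using the .dt accessor (assumes birth_date is already datetime64[ns])\n",
"df_tm_player_top5_2021['birth_year'] = df_tm_player_top5_2021['birth_date'].dt.year\n",
"df_tm_player_top5_2021['birth_month'] = df_tm_player_top5_2021['birth_date'].dt.month"
]
},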
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4.5. Columns of Interest\n",
"We are interested in the following thirteen columns in the [TransferMarkt](https://www.transfermarkt.co.uk/) dataset:\n",
"* `name`\n",
"* `name_lower`\n",
"* `firstinitial_lower`\n",
"* `firstname_lower`\n",
"* `lastname_lower`\n",
"* `position_description`\n",
"* `position_description_cleaned`\n",
"* `value`\n",
"* `birth_date`\n",
"* `birth_year`\n",
"* `birth_month`\n",
"* `age`\n",
"* `playing_country`"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {},
"outputs": [],
"source": [
"# Select columns of interest\n",
"df_tm_player_top5_2021 = df_tm_player_top5_2021[['name', 'name_lower', 'firstinitial_lower', 'firstname_lower', 'lastname_lower', 'position_description', 'position_description_cleaned', 'value', 'birth_date', 'birth_year', 'birth_month', 'age', 'playing_country']]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4.6. Split Dataset into Outfielder Players and Goalkeepers"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {},
"outputs": [],
"source": [
"# Assign df_tm as a new DataFrame - df_tm_player_top5_all_2021_all, to represent all the players\n",
"df_tm_player_top5_all_2021 = df_tm_player_top5_2021\n",
"\n",
"# Filter rows for position_description is not equal to 'Goalkeeper'\n",
"df_tm_player_top5_outfield_2021 = df_tm_player_top5_all_2021[df_tm_player_top5_all_2021['position_description'] != 'Goalkeeper']\n",
"\n",
"# Filter rows for position_description are equal to 'Goalkeeper'\n",
"df_tm_player_top5_goalkeeper_2021 = df_tm_player_top5_all_2021[df_tm_player_top5_all_2021['position_description'] == 'Goalkeeper']"
]
},
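{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick sanity check that the two subsets partition the full dataset:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# The outfield and goalkeeper row counts should sum to the full player count\n",
"assert len(df_tm_player_top5_outfield_2021) + len(df_tm_player_top5_goalkeeper_2021) == len(df_tm_player_top5_all_2021)\n",
"\n",
"print(len(df_tm_player_top5_all_2021), len(df_tm_player_top5_outfield_2021), len(df_tm_player_top5_goalkeeper_2021))"
]
},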
{
"cell_type": "code",
"execution_count": 61,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"