{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## ATTEMPT AT MAKING AN EXPECTED GOALS MODEL" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### DATA EXPLORATION" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We start off by importing pandas library to handle the data" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Reading in the data into a DataFrame" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "df = pd.read_excel('xgl.xlsx')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The data has quire a few columns related to shots" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
XYa_goalsa_teamdateh_ah_goalsh_teamidlastActionmatch_idminuteplayerplayer_assistedplayer_idresultseasonshotTypesituation
00.7070.3790Hoffenheim2015-08-29 17:30:00h0Darmstadt76737Aerial104493György GaricsNaN2MissedShots2015RightFootFromCorner
10.7280.3731Darmstadt2015-09-12 17:30:00a0Bayer Leverkusen76808Pass10531György GaricsKonstantin Rausch2SavedShot2015RightFootSetPiece
20.0160.4640Darmstadt2015-11-01 18:30:00a2VfB Stuttgart78492Foul111867György GaricsNaN2OwnGoal2015HeadSetPiece
30.8750.5212Darmstadt2015-12-20 20:30:00a3Borussia M.Gladbach79876Aerial117358György GaricsNaN2MissedShots2015HeadFromCorner
00.9270.5572Werder Bremen2014-12-07 16:30:00a5Eintracht Frankfurt27374Pass532078Luca CaldirolaFin Bartels3Goal2014LeftFootOpenPlay
\n", "
" ], "text/plain": [ " X Y a_goals a_team date h_a h_goals \\\n", "0 0.707 0.379 0 Hoffenheim 2015-08-29 17:30:00 h 0 \n", "1 0.728 0.373 1 Darmstadt 2015-09-12 17:30:00 a 0 \n", "2 0.016 0.464 0 Darmstadt 2015-11-01 18:30:00 a 2 \n", "3 0.875 0.521 2 Darmstadt 2015-12-20 20:30:00 a 3 \n", "0 0.927 0.557 2 Werder Bremen 2014-12-07 16:30:00 a 5 \n", "\n", " h_team id lastAction match_id minute player \\\n", "0 Darmstadt 76737 Aerial 1044 93 György Garics \n", "1 Bayer Leverkusen 76808 Pass 1053 1 György Garics \n", "2 VfB Stuttgart 78492 Foul 1118 67 György Garics \n", "3 Borussia M.Gladbach 79876 Aerial 1173 58 György Garics \n", "0 Eintracht Frankfurt 27374 Pass 5320 78 Luca Caldirola \n", "\n", " player_assisted player_id result season shotType situation \n", "0 NaN 2 MissedShots 2015 RightFoot FromCorner \n", "1 Konstantin Rausch 2 SavedShot 2015 RightFoot SetPiece \n", "2 NaN 2 OwnGoal 2015 Head SetPiece \n", "3 NaN 2 MissedShots 2015 Head FromCorner \n", "0 Fin Bartels 3 Goal 2014 LeftFoot OpenPlay " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
XYa_goalsh_goalsidmatch_idminuteplayer_idseason
count226499.000000226499.000000226499.000000226499.000000226499.000000226499.000000226499.000000226499.000000226499.000000
mean0.8402790.5042041.2026061.564700115749.6397644913.07940048.4766162225.0268522015.729778
std0.0909280.1321191.1725511.34020668962.7919592932.81604226.5797041675.1443621.287854
min0.0030000.0010000.0000000.0000001.00000081.0000000.0000002.0000002014.000000
25%0.7750000.4100000.0000001.00000056663.5000002353.00000026.000000807.0000002015.000000
50%0.8590000.5020001.0000001.000000113368.0000004795.00000049.0000001933.0000002016.000000
75%0.9070000.5990002.0000002.000000170021.5000007515.00000071.0000003258.0000002017.000000
max0.9990000.9970009.00000010.000000243199.00000010792.000000103.0000007255.0000002018.000000
\n", "
" ], "text/plain": [ " X Y a_goals h_goals \\\n", "count 226499.000000 226499.000000 226499.000000 226499.000000 \n", "mean 0.840279 0.504204 1.202606 1.564700 \n", "std 0.090928 0.132119 1.172551 1.340206 \n", "min 0.003000 0.001000 0.000000 0.000000 \n", "25% 0.775000 0.410000 0.000000 1.000000 \n", "50% 0.859000 0.502000 1.000000 1.000000 \n", "75% 0.907000 0.599000 2.000000 2.000000 \n", "max 0.999000 0.997000 9.000000 10.000000 \n", "\n", " id match_id minute player_id \\\n", "count 226499.000000 226499.000000 226499.000000 226499.000000 \n", "mean 115749.639764 4913.079400 48.476616 2225.026852 \n", "std 68962.791959 2932.816042 26.579704 1675.144362 \n", "min 1.000000 81.000000 0.000000 2.000000 \n", "25% 56663.500000 2353.000000 26.000000 807.000000 \n", "50% 113368.000000 4795.000000 49.000000 1933.000000 \n", "75% 170021.500000 7515.000000 71.000000 3258.000000 \n", "max 243199.000000 10792.000000 103.000000 7255.000000 \n", "\n", " season \n", "count 226499.000000 \n", "mean 2015.729778 \n", "std 1.287854 \n", "min 2014.000000 \n", "25% 2015.000000 \n", "50% 2016.000000 \n", "75% 2017.000000 \n", "max 2018.000000 " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.describe()" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
a_teamh_ah_teamlastActionplayerplayer_assistedresultshotTypesituation
count226499226499226499226499226499165662226499226499226499
unique16021603948464839645
topReal MadridhReal MadridPassCristiano RonaldoDimitri PayetMissedShotsRightFootOpenPlay
freq239112541524378011088953589945118641166304
\n", "
" ], "text/plain": [ " a_team h_a h_team lastAction player \\\n", "count 226499 226499 226499 226499 226499 \n", "unique 160 2 160 39 4846 \n", "top Real Madrid h Real Madrid Pass Cristiano Ronaldo \n", "freq 2391 125415 2437 80110 889 \n", "\n", " player_assisted result shotType situation \n", "count 165662 226499 226499 226499 \n", "unique 4839 6 4 5 \n", "top Dimitri Payet MissedShots RightFoot OpenPlay \n", "freq 535 89945 118641 166304 " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.describe(include=['O'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The dataset has around 226,500 shots which aren't really a lot but enough to have a basic model running." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/plain": [ "Index(['X', 'Y', 'a_goals', 'a_team', 'date', 'h_a', 'h_goals', 'h_team', 'id',\n", " 'lastAction', 'match_id', 'minute', 'player', 'player_assisted',\n", " 'player_id', 'result', 'season', 'shotType', 'situation'],\n", " dtype='object')" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### PRE-PROCESSING" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The X & Y co-ordinates present in the data are scaled from 0 to 1 so I had to find a suitable multipliying factor. The field was considered to be of dimensions of 104*76 yards.\n", "The co-ordinates were scaled acccordingly." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "df['X'] = df['X']*104\n", "df['Y'] = df['Y']*76" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's try and plot the shot locations" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "from matplotlib.patches import Arc\n", "\n", "fig=plt.figure()\n", "#fig,ax = plt.subplots(figsize=(10.4,7.6))\n", "ax=fig.add_subplot(1,1,1)\n", "ax.axis('off')\n", "\n", "plt.plot([0,104],[0,0],color=\"black\")\n", "plt.plot([0,0],[76,0],color=\"black\")\n", "plt.plot([0,104],[76,76],color=\"black\")\n", "plt.plot([104,104],[76,0],color=\"black\")\n", "plt.plot([52,52],[0,76],color=\"black\")\n", "\n", "plt.plot([104,86.32],[60,60],color=\"black\")\n", "plt.plot([86.32,104],[16,16],color=\"black\")\n", "plt.plot([86.32,86.32],[60,16],color=\"black\")\n", "plt.plot([104,97.97],[48,48],color=\"black\")\n", "plt.plot([104,97.97],[27.968,27.968],color=\"black\")\n", "plt.plot([97.97,97.97],[48,27.968],color=\"black\")\n", "\n", "plt.plot([104,104],[34,42],color=\"red\")\n", "\n", "penaltySpot = plt.Circle((92.04,38),0.25,color=\"black\")\n", "centreSpot = plt.Circle((52,38),0.5,color=\"black\")\n", "centreCircle = plt.Circle((52,38),10,color=\"black\",fill=False)\n", "\n", "D = Arc((92.04,38),height=20,width=20,angle=0,theta1=125,theta2=235,color=\"black\")\n", "\n", "ax.add_patch(centreSpot)\n", "ax.add_patch(centreCircle)\n", "ax.add_patch(penaltySpot)\n", "ax.add_patch(D)\n", "\n", "\n", "plt.scatter(df['X'],df['Y'])\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "All the shots are when the team is attacking from right to left. The shots on the left end of the pitch are perhaps own goals that will have to be removed." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We know that distance is a major factor for shots to be converted into goals and thus I added a distance feature. The distance will be measured from the shot location to the centre of the goal." 
] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "df['distance'] = (((df['X']-104)**2 + (df['Y']-38)**2)**(1/2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another important factor is the angle of view the striker has. A smaller angle reduces the chance of a shot being converted into a goal." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We use the law of cosines to find the angle subtended by the two goal posts at the shot location.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "$$\\cos c = \\frac{A^2 + B^2 - C^2}{2AB}$$\n", "\n", "Here A and B are the distances from the shot location to the two goal posts, and C is the distance between the posts." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "import matplotlib.image as mpimg\n", "image = mpimg.imread(\"angle.jpg\")\n", "plt.imshow(image)\n", "plt.axis('off')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pardon my paint skills, but the diagram above explains the angle c we are trying to find. The Red Dot is the shot location and the rectangle on the right is the goalmouth." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Below, I find the distance between the goal posts and the shot location and use it to find the angle extended by the goal onto the shot location." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "import math\n", "temp = pd.DataFrame()\n", "temp['a'] = ((df['X']-104)**2 + (df['Y']-42)**2)\n", "temp['b'] = ((df['X']-104)**2 + (df['Y']-34)**2)\n", "df['angle'] = (temp['a']+temp['b']-144)/(2*temp['a']*temp['b'])\n", "df['angle'] = df['angle'].apply(math.acos)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Distance and Angle have been added as columns to the DataFrame." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "Index(['X', 'Y', 'a_goals', 'a_team', 'date', 'h_a', 'h_goals', 'h_team', 'id',\n", " 'lastAction', 'match_id', 'minute', 'player', 'player_assisted',\n", " 'player_id', 'result', 'season', 'shotType', 'situation', 'distance',\n", " 'angle'],\n", " dtype='object')" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since the columns in the categs list are actually categories, we convert them into categories." 
] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "categs = ['lastAction','result','shotType', 'situation']\n", "for categ in categs:\n", "    df[categ] = df[categ].astype('category')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us have a look at the categories of each of these features." ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "lastAction\n", "Index(['Aerial', 'BallRecovery', 'BallTouch', 'BlockedPass', 'Card',\n", " 'Challenge', 'ChanceMissed', 'Chipped', 'Clearance', 'CornerAwarded',\n", " 'Cross', 'CrossNotClaimed', 'Dispossessed', 'End', 'Error',\n", " 'FormationChange', 'Foul', 'Goal', 'GoodSkill', 'HeadPass',\n", " 'Interception', 'KeeperPickup', 'KeeperSweeper', 'LayOff', 'None',\n", " 'OffsidePass', 'OffsideProvoked', 'Pass', 'PenaltyFaced', 'Punch',\n", " 'Rebound', 'Save', 'ShieldBallOpp', 'Standard', 'Start',\n", " 'SubstitutionOn', 'Tackle', 'TakeOn', 'Throughball'],\n", " dtype='object')\n", "result\n", "Index(['BlockedShot', 'Goal', 'MissedShots', 'OwnGoal', 'SavedShot',\n", " 'ShotOnPost'],\n", " dtype='object')\n", "shotType\n", "Index(['Head', 'LeftFoot', 'OtherBodyPart', 'RightFoot'], dtype='object')\n", "situation\n", "Index(['DirectFreekick', 'FromCorner', 'OpenPlay', 'Penalty', 'SetPiece'], dtype='object')\n" ] } ], "source": [ "for categ in categs:\n", "    print(categ)\n", "    print(df[categ].cat.categories)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before training the model, we remove own goals, since they are not attempts on the opponent's goal and would distort the model." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "df = df[df['result'] != 'OwnGoal']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We introduce a boolean column named goal that marks whether the shot resulted in a goal." 
] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "df['goal'] = (df['result'] == 'Goal')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The situation column is one-hot encoded so that Logistic Regression does not treat the situations as ordered values with different weights.\n", "\n", "LabelBinarizer from sklearn is used to carry out the one-hot encoding." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "LabelBinarizer(neg_label=0, pos_label=1, sparse_output=False)" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.preprocessing import LabelBinarizer\n", "le = LabelBinarizer()\n", "le.fit(df['situation'])" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "temp = le.transform(df['situation'])" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(['DirectFreekick', 'FromCorner', 'OpenPlay', 'Penalty', 'SetPiece'],\n", " dtype='\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", 
" \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", 
" \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
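| | X | Y | a_goals | a_team | date | h_a | h_goals | h_team | id | lastAction | ... | goal | isDirectFreekick | isFromCorner | isOpenPlay | isPenalty | isSetPiece | fromHead | fromLeftFoot | fromOtherBodyPart | fromRightFoot |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 73.527997 | 28.804001 | 0 | Hoffenheim | 2015-08-29 17:30:00 | h | 0 | Darmstadt | 76737 | Aerial | ... | False | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 1 | 75.712003 | 28.347999 | 1 | Darmstadt | 2015-09-12 17:30:00 | a | 0 | Bayer Leverkusen | 76808 | Pass | ... | False | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| 3 | 91.000000 | 39.595999 | 2 | Darmstadt | 2015-12-20 20:30:00 | a | 3 | Borussia M.Gladbach | 79876 | Aerial | ... | False | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 0 | 96.407997 | 42.332001 | 2 | Werder Bremen | 2014-12-07 16:30:00 | a | 5 | Eintracht Frankfurt | 27374 | Pass | ... | True | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| 1 | 93.496002 | 45.447999 | 2 | Darmstadt | 2015-09-27 19:30:00 | a | 2 | Borussia Dortmund | 77444 | None | ... | False | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| 2 | 101.920000 | 35.872001 | 2 | Darmstadt | 2015-10-17 17:30:00 | a | 0 | Augsburg | 78030 | None | ... | False | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 3 | 102.336002 | 36.175999 | 2 | Darmstadt | 2015-10-17 17:30:00 | a | 0 | Augsburg | 78029 | Rebound | ... | False | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 4 | 92.040000 | 32.907999 | 0 | Darmstadt | 2015-11-01 18:30:00 | a | 2 | VfB Stuttgart | 78500 | Cross | ... | False | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| 5 | 97.967997 | 48.260000 | 0 | Darmstadt | 2015-11-01 18:30:00 | a | 2 | VfB Stuttgart | 78501 | Aerial | ... | False | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 6 | 99.527997 | 47.575999 | 1 | Hamburger SV | 2015-11-07 21:30:00 | h | 1 | Darmstadt | 78849 | Aerial | ... | False | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
| 7 | 94.640000 | 46.284001 | 1 | Darmstadt | 2015-12-06 20:30:00 | a | 0 | Eintracht Frankfurt | 79502 | Cross | ... | False | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 8 | 95.367997 | 40.280000 | 1 | Darmstadt | 2015-12-06 20:30:00 | a | 0 | Eintracht Frankfurt | 79514 | Cross | ... | False | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |
| 9 | 97.032003 | 42.787999 | 4 | Hertha Berlin | 2015-12-12 18:30:00 | h | 0 | Darmstadt | 79822 | Aerial | ... | False | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 10 | 96.927997 | 36.404001 | 2 | Darmstadt | 2015-12-20 20:30:00 | a | 3 | Borussia M.Gladbach | 79863 | Cross | ... | False | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 11 | 94.847997 | 46.512001 | 2 | Bayer Leverkusen | 2016-02-13 18:30:00 | h | 1 | Darmstadt | 80950 | Aerial | ... | False | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |
| 12 | 90.376002 | 48.868002 | 0 | Darmstadt | 2016-03-06 18:30:00 | a | 0 | Mainz 05 | 81919 | Rebound | ... | False | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
| 13 | 90.376002 | 48.868002 | 0 | Darmstadt | 2016-03-06 18:30:00 | a | 0 | Mainz 05 | 81918 | Aerial | ... | False | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| 14 | 100.463998 | 32.527999 | 2 | VfB Stuttgart | 2016-04-02 17:30:00 | h | 2 | Darmstadt | 82623 | Cross | ... | False | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 15 | 91.416002 | 39.215999 | 1 | Darmstadt | 2016-04-23 17:30:00 | a | 4 | FC Cologne | 83244 | None | ... | False | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 16 | 77.480000 | 43.700000 | 0 | Werder Bremen | 2016-08-26 22:30:00 | a | 6 | Bayern Munich | 127431 | None | ... | False | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| 17 | 103.063998 | 46.967999 | 0 | Werder Bremen | 2016-08-26 22:30:00 | a | 6 | Bayern Munich | 127447 | Cross | ... | False | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 1 | 98.383998 | 35.415999 | 1 | Darmstadt | 2015-08-22 17:30:00 | a | 1 | Schalke 04 | 76326 | Aerial | ... | False | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
| 2 | 97.032003 | 34.352001 | 1 | Darmstadt | 2015-08-22 17:30:00 | a | 1 | Schalke 04 | 76330 | Aerial | ... | False | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 3 | 99.007997 | 32.907999 | 1 | Darmstadt | 2015-09-12 17:30:00 | a | 0 | Bayer Leverkusen | 76809 | Aerial | ... | True | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
| 4 | 91.312003 | 37.467999 | 2 | Darmstadt | 2015-09-27 19:30:00 | a | 2 | Borussia Dortmund | 77459 | None | ... | True | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| 5 | 95.056002 | 44.384001 | 0 | Darmstadt | 2015-11-01 18:30:00 | a | 2 | VfB Stuttgart | 78477 | Aerial | ... | False | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
| 6 | 94.223998 | 43.547999 | 0 | Darmstadt | 2015-11-01 18:30:00 | a | 2 | VfB Stuttgart | 78498 | Chipped | ... | False | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
| 7 | 92.143998 | 41.495999 | 1 | Hamburger SV | 2015-11-07 21:30:00 | h | 1 | Darmstadt | 78845 | None | ... | False | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
| 8 | 99.320000 | 36.404001 | 1 | Darmstadt | 2015-11-22 20:40:00 | a | 3 | Ingolstadt | 79074 | Cross | ... | True | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 9 | 94.536002 | 37.315999 | 1 | Darmstadt | 2015-12-06 20:30:00 | a | 0 | Eintracht Frankfurt | 79498 | Cross | ... | True | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9 | 83.096002 | 27.207999 | 2 | Villarreal | 2018-11-11 17:30:00 | a | 2 | Rayo Vallecano | 240027 | Pass | ... | False | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| 10 | 76.543998 | 38.835999 | 1 | Real Betis | 2018-11-25 19:45:00 | h | 2 | Villarreal | 241740 | None | ... | False | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| 11 | 93.287997 | 35.112001 | 1 | Real Betis | 2018-11-25 19:45:00 | h | 2 | Villarreal | 241746 | Pass | ... | False | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| 12 | 92.767997 | 24.927999 | 1 | Real Betis | 2018-11-25 19:45:00 | h | 2 | Villarreal | 241754 | Pass | ... | True | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| 13 | 88.712003 | 24.624001 | 1 | Real Betis | 2018-11-25 19:45:00 | h | 2 | Villarreal | 241757 | Pass | ... | False | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| 14 | 95.887997 | 24.016000 | 0 | Villarreal | 2018-12-02 17:30:00 | a | 2 | Barcelona | 243092 | Rebound | ... | False | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| 15 | 81.120000 | 29.944001 | 0 | Villarreal | 2018-12-02 17:30:00 | a | 2 | Barcelona | 243094 | Pass | ... | False | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| 0 | 79.143998 | 40.127999 | 2 | Fulham | 2018-10-20 14:00:00 | h | 4 | Cardiff | 234806 | Pass | ... | False | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
| 0 | 96.303998 | 35.112001 | 2 | Udinese | 2018-10-28 13:00:00 | h | 2 | Genoa | 236059 | Cross | ... | True | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 1 | 95.887997 | 37.467999 | 0 | Genoa | 2018-11-03 14:00:00 | a | 5 | Inter | 236765 | Aerial | ... | False | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 2 | 93.912003 | 42.484001 | 2 | Napoli | 2018-11-10 19:30:00 | h | 1 | Genoa | 239498 | Cross | ... | False | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |
| 3 | 84.343998 | 35.415999 | 1 | Sampdoria | 2018-11-25 19:30:00 | h | 1 | Genoa | 241719 | Rebound | ... | False | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
| 4 | 96.927997 | 36.632001 | 1 | Sampdoria | 2018-11-25 19:30:00 | h | 1 | Genoa | 241725 | Aerial | ... | False | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 5 | 92.456002 | 40.964001 | 1 | Genoa | 2018-12-02 14:00:00 | a | 2 | Torino | 242821 | Cross | ... | False | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 6 | 83.407997 | 36.555999 | 1 | Genoa | 2018-12-02 14:00:00 | a | 2 | Torino | 242824 | Aerial | ... | False | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
| 0 | 85.383998 | 42.560000 | 0 | Toulouse | 2018-11-24 16:00:00 | a | 1 | Paris Saint Germain | 240889 | BallRecovery | ... | False | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| 0 | 92.352003 | 43.320000 | 0 | Monaco | 2018-11-03 19:00:00 | h | 1 | Reims | 237309 | Pass | ... | False | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| 1 | 84.967997 | 35.264001 | 1 | Guingamp | 2018-11-24 19:00:00 | h | 2 | Reims | 241121 | Chipped | ... | True | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| 0 | 98.487997 | 33.440000 | 0 | Real Valladolid | 2018-11-25 15:15:00 | a | 1 | Sevilla | 241542 | Aerial | ... | False | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 0 | 87.880000 | 22.800000 | 1 | Wolfsburg | 2018-11-09 19:30:00 | a | 2 | Hannover 96 | 238742 | Pass | ... | False | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| 0 | 88.296002 | 38.075999 | 4 | Paris Saint Germain | 2018-11-11 20:00:00 | h | 0 | Monaco | 240130 | None | ... | False | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 0 | 98.487997 | 40.584001 | 1 | Eintracht Frankfurt | 2018-10-28 11:30:00 | h | 1 | Nuernberg | 235992 | Chipped | ... | True | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |
| 1 | 91.103998 | 42.787999 | 2 | Nuernberg | 2018-11-03 14:30:00 | a | 2 | Augsburg | 236785 | Cross | ... | False | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| 2 | 101.192003 | 24.395999 | 2 | Nuernberg | 2018-11-03 14:30:00 | a | 2 | Augsburg | 236802 | Chipped | ... | False | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
| 3 | 96.303998 | 37.620000 | 2 | Nuernberg | 2018-11-03 14:30:00 | a | 2 | Augsburg | 236807 | Pass | ... | False | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| 4 | 100.152003 | 36.327999 | 2 | Nuernberg | 2018-11-24 17:30:00 | a | 5 | Schalke 04 | 241048 | Rebound | ... | True | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
| 0 | 88.503998 | 57.075999 | 1 | Atletico Madrid | 2018-12-02 15:15:00 | h | 1 | Girona | 242927 | Pass | ... | False | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
| 0 | 96.303998 | 34.504001 | 3 | Fortuna Duesseldorf | 2018-11-24 14:30:00 | a | 3 | Bayern Munich | 240596 | None | ... | False | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
| 1 | 78.936002 | 38.987999 | 1 | Mainz 05 | 2018-11-30 19:30:00 | h | 0 | Fortuna Duesseldorf | 241967 | Pass | ... | False | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
| 2 | 99.112003 | 49.020000 | 1 | Mainz 05 | 2018-11-30 19:30:00 | h | 0 | Fortuna Duesseldorf | 241976 | Cross | ... | False | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |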
\n", "

225797 rows × 31 columns

\n", "" ], "text/plain": [ " X Y a_goals a_team date \\\n", "0 73.527997 28.804001 0 Hoffenheim 2015-08-29 17:30:00 \n", "1 75.712003 28.347999 1 Darmstadt 2015-09-12 17:30:00 \n", "3 91.000000 39.595999 2 Darmstadt 2015-12-20 20:30:00 \n", "0 96.407997 42.332001 2 Werder Bremen 2014-12-07 16:30:00 \n", "1 93.496002 45.447999 2 Darmstadt 2015-09-27 19:30:00 \n", "2 101.920000 35.872001 2 Darmstadt 2015-10-17 17:30:00 \n", "3 102.336002 36.175999 2 Darmstadt 2015-10-17 17:30:00 \n", "4 92.040000 32.907999 0 Darmstadt 2015-11-01 18:30:00 \n", "5 97.967997 48.260000 0 Darmstadt 2015-11-01 18:30:00 \n", "6 99.527997 47.575999 1 Hamburger SV 2015-11-07 21:30:00 \n", "7 94.640000 46.284001 1 Darmstadt 2015-12-06 20:30:00 \n", "8 95.367997 40.280000 1 Darmstadt 2015-12-06 20:30:00 \n", "9 97.032003 42.787999 4 Hertha Berlin 2015-12-12 18:30:00 \n", "10 96.927997 36.404001 2 Darmstadt 2015-12-20 20:30:00 \n", "11 94.847997 46.512001 2 Bayer Leverkusen 2016-02-13 18:30:00 \n", "12 90.376002 48.868002 0 Darmstadt 2016-03-06 18:30:00 \n", "13 90.376002 48.868002 0 Darmstadt 2016-03-06 18:30:00 \n", "14 100.463998 32.527999 2 VfB Stuttgart 2016-04-02 17:30:00 \n", "15 91.416002 39.215999 1 Darmstadt 2016-04-23 17:30:00 \n", "16 77.480000 43.700000 0 Werder Bremen 2016-08-26 22:30:00 \n", "17 103.063998 46.967999 0 Werder Bremen 2016-08-26 22:30:00 \n", "1 98.383998 35.415999 1 Darmstadt 2015-08-22 17:30:00 \n", "2 97.032003 34.352001 1 Darmstadt 2015-08-22 17:30:00 \n", "3 99.007997 32.907999 1 Darmstadt 2015-09-12 17:30:00 \n", "4 91.312003 37.467999 2 Darmstadt 2015-09-27 19:30:00 \n", "5 95.056002 44.384001 0 Darmstadt 2015-11-01 18:30:00 \n", "6 94.223998 43.547999 0 Darmstadt 2015-11-01 18:30:00 \n", "7 92.143998 41.495999 1 Hamburger SV 2015-11-07 21:30:00 \n", "8 99.320000 36.404001 1 Darmstadt 2015-11-22 20:40:00 \n", "9 94.536002 37.315999 1 Darmstadt 2015-12-06 20:30:00 \n", ".. ... ... ... ... ... 
\n", "9 83.096002 27.207999 2 Villarreal 2018-11-11 17:30:00 \n", "10 76.543998 38.835999 1 Real Betis 2018-11-25 19:45:00 \n", "11 93.287997 35.112001 1 Real Betis 2018-11-25 19:45:00 \n", "12 92.767997 24.927999 1 Real Betis 2018-11-25 19:45:00 \n", "13 88.712003 24.624001 1 Real Betis 2018-11-25 19:45:00 \n", "14 95.887997 24.016000 0 Villarreal 2018-12-02 17:30:00 \n", "15 81.120000 29.944001 0 Villarreal 2018-12-02 17:30:00 \n", "0 79.143998 40.127999 2 Fulham 2018-10-20 14:00:00 \n", "0 96.303998 35.112001 2 Udinese 2018-10-28 13:00:00 \n", "1 95.887997 37.467999 0 Genoa 2018-11-03 14:00:00 \n", "2 93.912003 42.484001 2 Napoli 2018-11-10 19:30:00 \n", "3 84.343998 35.415999 1 Sampdoria 2018-11-25 19:30:00 \n", "4 96.927997 36.632001 1 Sampdoria 2018-11-25 19:30:00 \n", "5 92.456002 40.964001 1 Genoa 2018-12-02 14:00:00 \n", "6 83.407997 36.555999 1 Genoa 2018-12-02 14:00:00 \n", "0 85.383998 42.560000 0 Toulouse 2018-11-24 16:00:00 \n", "0 92.352003 43.320000 0 Monaco 2018-11-03 19:00:00 \n", "1 84.967997 35.264001 1 Guingamp 2018-11-24 19:00:00 \n", "0 98.487997 33.440000 0 Real Valladolid 2018-11-25 15:15:00 \n", "0 87.880000 22.800000 1 Wolfsburg 2018-11-09 19:30:00 \n", "0 88.296002 38.075999 4 Paris Saint Germain 2018-11-11 20:00:00 \n", "0 98.487997 40.584001 1 Eintracht Frankfurt 2018-10-28 11:30:00 \n", "1 91.103998 42.787999 2 Nuernberg 2018-11-03 14:30:00 \n", "2 101.192003 24.395999 2 Nuernberg 2018-11-03 14:30:00 \n", "3 96.303998 37.620000 2 Nuernberg 2018-11-03 14:30:00 \n", "4 100.152003 36.327999 2 Nuernberg 2018-11-24 17:30:00 \n", "0 88.503998 57.075999 1 Atletico Madrid 2018-12-02 15:15:00 \n", "0 96.303998 34.504001 3 Fortuna Duesseldorf 2018-11-24 14:30:00 \n", "1 78.936002 38.987999 1 Mainz 05 2018-11-30 19:30:00 \n", "2 99.112003 49.020000 1 Mainz 05 2018-11-30 19:30:00 \n", "\n", " h_a h_goals h_team id lastAction ... \\\n", "0 h 0 Darmstadt 76737 Aerial ... \n", "1 a 0 Bayer Leverkusen 76808 Pass ... 
\n", "3 a 3 Borussia M.Gladbach 79876 Aerial ... \n", "0 a 5 Eintracht Frankfurt 27374 Pass ... \n", "1 a 2 Borussia Dortmund 77444 None ... \n", "2 a 0 Augsburg 78030 None ... \n", "3 a 0 Augsburg 78029 Rebound ... \n", "4 a 2 VfB Stuttgart 78500 Cross ... \n", "5 a 2 VfB Stuttgart 78501 Aerial ... \n", "6 h 1 Darmstadt 78849 Aerial ... \n", "7 a 0 Eintracht Frankfurt 79502 Cross ... \n", "8 a 0 Eintracht Frankfurt 79514 Cross ... \n", "9 h 0 Darmstadt 79822 Aerial ... \n", "10 a 3 Borussia M.Gladbach 79863 Cross ... \n", "11 h 1 Darmstadt 80950 Aerial ... \n", "12 a 0 Mainz 05 81919 Rebound ... \n", "13 a 0 Mainz 05 81918 Aerial ... \n", "14 h 2 Darmstadt 82623 Cross ... \n", "15 a 4 FC Cologne 83244 None ... \n", "16 a 6 Bayern Munich 127431 None ... \n", "17 a 6 Bayern Munich 127447 Cross ... \n", "1 a 1 Schalke 04 76326 Aerial ... \n", "2 a 1 Schalke 04 76330 Aerial ... \n", "3 a 0 Bayer Leverkusen 76809 Aerial ... \n", "4 a 2 Borussia Dortmund 77459 None ... \n", "5 a 2 VfB Stuttgart 78477 Aerial ... \n", "6 a 2 VfB Stuttgart 78498 Chipped ... \n", "7 h 1 Darmstadt 78845 None ... \n", "8 a 3 Ingolstadt 79074 Cross ... \n", "9 a 0 Eintracht Frankfurt 79498 Cross ... \n", ".. .. ... ... ... ... ... \n", "9 a 2 Rayo Vallecano 240027 Pass ... \n", "10 h 2 Villarreal 241740 None ... \n", "11 h 2 Villarreal 241746 Pass ... \n", "12 h 2 Villarreal 241754 Pass ... \n", "13 h 2 Villarreal 241757 Pass ... \n", "14 a 2 Barcelona 243092 Rebound ... \n", "15 a 2 Barcelona 243094 Pass ... \n", "0 h 4 Cardiff 234806 Pass ... \n", "0 h 2 Genoa 236059 Cross ... \n", "1 a 5 Inter 236765 Aerial ... \n", "2 h 1 Genoa 239498 Cross ... \n", "3 h 1 Genoa 241719 Rebound ... \n", "4 h 1 Genoa 241725 Aerial ... \n", "5 a 2 Torino 242821 Cross ... \n", "6 a 2 Torino 242824 Aerial ... \n", "0 a 1 Paris Saint Germain 240889 BallRecovery ... \n", "0 h 1 Reims 237309 Pass ... \n", "1 h 2 Reims 241121 Chipped ... \n", "0 a 1 Sevilla 241542 Aerial ... \n", "0 a 2 Hannover 96 238742 Pass ... 
\n", "0 h 0 Monaco 240130 None ... \n", "0 h 1 Nuernberg 235992 Chipped ... \n", "1 a 2 Augsburg 236785 Cross ... \n", "2 a 2 Augsburg 236802 Chipped ... \n", "3 a 2 Augsburg 236807 Pass ... \n", "4 a 5 Schalke 04 241048 Rebound ... \n", "0 h 1 Girona 242927 Pass ... \n", "0 a 3 Bayern Munich 240596 None ... \n", "1 h 0 Fortuna Duesseldorf 241967 Pass ... \n", "2 h 0 Fortuna Duesseldorf 241976 Cross ... \n", "\n", " goal isDirectFreekick isFromCorner isOpenPlay isPenalty isSetPiece \\\n", "0 False 0 1 0 0 0 \n", "1 False 0 0 0 0 1 \n", "3 False 0 1 0 0 0 \n", "0 True 0 0 1 0 0 \n", "1 False 0 0 0 0 1 \n", "2 False 0 1 0 0 0 \n", "3 False 0 1 0 0 0 \n", "4 False 0 0 1 0 0 \n", "5 False 0 1 0 0 0 \n", "6 False 0 0 0 0 1 \n", "7 False 0 1 0 0 0 \n", "8 False 0 0 1 0 0 \n", "9 False 0 1 0 0 0 \n", "10 False 0 1 0 0 0 \n", "11 False 0 0 1 0 0 \n", "12 False 0 0 0 0 1 \n", "13 False 0 0 0 0 1 \n", "14 False 0 1 0 0 0 \n", "15 False 0 1 0 0 0 \n", "16 False 0 0 1 0 0 \n", "17 False 0 1 0 0 0 \n", "1 False 0 0 0 0 1 \n", "2 False 0 1 0 0 0 \n", "3 True 0 0 0 0 1 \n", "4 True 0 0 0 0 1 \n", "5 False 0 0 0 0 1 \n", "6 False 0 0 0 0 1 \n", "7 False 0 0 1 0 0 \n", "8 True 0 1 0 0 0 \n", "9 True 0 0 0 0 1 \n", ".. ... ... ... ... ... ... 
\n", "9 False 0 0 1 0 0 \n", "10 False 0 0 1 0 0 \n", "11 False 0 0 1 0 0 \n", "12 True 0 0 1 0 0 \n", "13 False 0 0 1 0 0 \n", "14 False 0 0 1 0 0 \n", "15 False 0 0 1 0 0 \n", "0 False 0 0 1 0 0 \n", "0 True 0 1 0 0 0 \n", "1 False 0 1 0 0 0 \n", "2 False 0 0 1 0 0 \n", "3 False 0 0 1 0 0 \n", "4 False 0 1 0 0 0 \n", "5 False 0 1 0 0 0 \n", "6 False 0 0 0 0 1 \n", "0 False 0 0 1 0 0 \n", "0 False 0 0 1 0 0 \n", "1 True 0 0 1 0 0 \n", "0 False 0 1 0 0 0 \n", "0 False 0 0 1 0 0 \n", "0 False 0 1 0 0 0 \n", "0 True 0 0 1 0 0 \n", "1 False 0 0 1 0 0 \n", "2 False 0 0 0 0 1 \n", "3 False 0 0 1 0 0 \n", "4 True 0 0 1 0 0 \n", "0 False 0 0 1 0 0 \n", "0 False 0 0 0 0 1 \n", "1 False 0 0 1 0 0 \n", "2 False 0 1 0 0 0 \n", "\n", " fromHead fromLeftFoot fromOtherBodyPart fromRightFoot \n", "0 0 0 0 1 \n", "1 0 0 0 1 \n", "3 1 0 0 0 \n", "0 0 1 0 0 \n", "1 0 0 0 1 \n", "2 0 1 0 0 \n", "3 0 1 0 0 \n", "4 0 1 0 0 \n", "5 1 0 0 0 \n", "6 1 0 0 0 \n", "7 1 0 0 0 \n", "8 1 0 0 0 \n", "9 1 0 0 0 \n", "10 1 0 0 0 \n", "11 1 0 0 0 \n", "12 0 1 0 0 \n", "13 0 0 0 1 \n", "14 1 0 0 0 \n", "15 0 1 0 0 \n", "16 0 1 0 0 \n", "17 0 1 0 0 \n", "1 1 0 0 0 \n", "2 1 0 0 0 \n", "3 1 0 0 0 \n", "4 0 0 0 1 \n", "5 1 0 0 0 \n", "6 1 0 0 0 \n", "7 0 0 0 1 \n", "8 1 0 0 0 \n", "9 1 0 0 0 \n", ".. ... ... ... ... 
\n", "9 0 1 0 0 \n", "10 0 1 0 0 \n", "11 0 1 0 0 \n", "12 0 1 0 0 \n", "13 0 1 0 0 \n", "14 0 1 0 0 \n", "15 0 1 0 0 \n", "0 0 0 0 1 \n", "0 1 0 0 0 \n", "1 1 0 0 0 \n", "2 1 0 0 0 \n", "3 0 0 0 1 \n", "4 1 0 0 0 \n", "5 1 0 0 0 \n", "6 0 1 0 0 \n", "0 0 1 0 0 \n", "0 0 1 0 0 \n", "1 0 1 0 0 \n", "0 1 0 0 0 \n", "0 0 1 0 0 \n", "0 0 0 0 1 \n", "0 1 0 0 0 \n", "1 0 1 0 0 \n", "2 1 0 0 0 \n", "3 0 1 0 0 \n", "4 0 0 0 1 \n", "0 0 0 0 1 \n", "0 1 0 0 0 \n", "1 0 0 0 1 \n", "2 0 0 0 1 \n", "\n", "[225797 rows x 31 columns]" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "Index(['X', 'Y', 'a_goals', 'a_team', 'date', 'h_a', 'h_goals', 'h_team', 'id',\n", " 'lastAction', 'match_id', 'minute', 'player', 'player_assisted',\n", " 'player_id', 'result', 'season', 'shotType', 'situation', 'distance',\n", " 'angle', 'goal', 'isDirectFreekick', 'isFromCorner', 'isOpenPlay',\n", " 'isPenalty', 'isSetPiece', 'fromHead', 'fromLeftFoot',\n", " 'fromOtherBodyPart', 'fromRightFoot'],\n", " dtype='object')" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Oh boi, that's a lot of columns. Not to worry though: we will not be using all of them, only a subset, at least for this model. This subset is stored in the cols list." ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "cols = ['distance','angle','isDirectFreekick','isFromCorner','isOpenPlay','isPenalty','isSetPiece','fromHead','fromLeftFoot','fromOtherBodyPart','fromRightFoot','goal']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We create a new DataFrame named shot to hold the data from the selected columns."
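, "\n", "\n", "A side note on two of the selected columns, distance and angle. The sketch below illustrates one way such features can be defined, assuming the pitch coordinates used by the plotting code later in this notebook (goal line at x = 104, posts at y = 34 and y = 42), with distance measured to the centre of the goal and angle taken as the angle subtended by the two posts, in degrees. The function name shot_geometry and both definitions are assumptions for illustration and may differ from how the distance and angle columns in df were actually derived.\n", "\n", "
```python
import numpy as np

# Assumed pitch geometry, read off the pitch-drawing code in this notebook:
# the goal line sits at x = 104 and the posts at y = 34 and y = 42.
GOAL_X = 104.0
POST_LOW_Y = 34.0
POST_HIGH_Y = 42.0


def shot_geometry(x, y):
    '''Return (distance to goal centre, angle subtended by the posts in degrees).

    Hypothetical helper: these definitions are an illustration only.
    '''
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    centre_y = (POST_LOW_Y + POST_HIGH_Y) / 2.0
    distance = np.hypot(GOAL_X - x, centre_y - y)

    # Vectors from the shot location to each post.
    dx = GOAL_X - x
    dy_low = POST_LOW_Y - y
    dy_high = POST_HIGH_Y - y

    # Angle between the two vectors via atan2(|cross product|, dot product).
    cross = dx * dy_high - dx * dy_low
    dot = dx * dx + dy_low * dy_high
    angle = np.degrees(np.arctan2(np.abs(cross), dot))
    return distance, angle


# Example: a central shot from x = 92.04 and a wide shot from the same depth.
central = shot_geometry(92.04, 38.0)
wide = shot_geometry(92.04, 10.0)
print('central shot:', central)
print('wide shot:   ', wide)
```
", "\n", "Under these assumptions the central shot sees a wider angle than the wide one, which is the intuition the model is later expected to pick up."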
] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "shot = df[cols]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The aim of this whole endeavour is to build a model that predicts whether a shot results in a goal or not. Thus, the goal feature is taken as a Series named Y, and the remaining independent features are stored in the DataFrame X." ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "X=shot.drop('goal',axis=1)\n", "Y=shot['goal']" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['distance', 'angle', 'isDirectFreekick', 'isFromCorner', 'isOpenPlay',\n", " 'isPenalty', 'isSetPiece', 'fromHead', 'fromLeftFoot',\n", " 'fromOtherBodyPart', 'fromRightFoot'],\n", " dtype='object')" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X.columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Y series currently holds True and False values, but we would like it to be binary, i.e. 1 or 0. Thus, we use the LabelEncoder available in the sklearn library." ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [], "source": [ "from sklearn.preprocessing import LabelEncoder\n", "le = LabelEncoder()\n", "Y=le.fit_transform(Y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will have to construct both a training set and a test set to check the validity of our model.\n", "Thus, we split our dataset into two parts in a 70:30 ratio. 70% of the shots go into the training set and the remaining 30% are used as a hold-out test set." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### MODEL FITTING & TESTING" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lo and behold, we are finally ready to train the model. 
The first model that I'm using here is Logistic Regression because it is relatively cheap computationally." ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,\n", " intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,\n", " penalty='l2', random_state=None, solver='liblinear', tol=0.0001,\n", " verbose=0, warm_start=False)" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.linear_model import LogisticRegression\n", "# sklearn.cross_validation was removed in scikit-learn 0.20; train_test_split now lives in model_selection\n", "from sklearn.model_selection import train_test_split\n", "\n", "reg = LogisticRegression()\n", "\n", "X_train,X_test,Y_train,Y_test = train_test_split(X,Y,test_size=0.3,random_state=42)\n", "\n", "reg.fit(X_train,Y_train)" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.9048863300856215" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "reg.score(X_test,Y_test)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We obtain an accuracy of about 90.5%, which looks decent, especially since we did not take many features into account."
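, "\n", "\n", "As a rough aside, the sketch below puts that accuracy figure in context using the confusion-matrix counts printed for this model further down in the notebook. The variable names (cm, baseline and so on) are illustrative assumptions and not part of the original analysis.\n", "\n", "
```python
import numpy as np

# Counts copied from the confusion matrix printed for the logistic regression.
# Rows are actual classes (no goal, goal); columns are predicted classes.
cm = np.array([[60210, 442],
               [6001, 1087]])

tn, fp, fn, tp = cm.ravel()
total = cm.sum()

accuracy = (tn + tp) / total   # the quantity reg.score reports
baseline = (tn + fp) / total   # accuracy of always predicting 'no goal'
precision = tp / (tp + fp)     # share of predicted goals that were goals
recall = tp / (tp + fn)        # share of actual goals the model found

print(f'accuracy  = {accuracy:.4f}')
print(f'baseline  = {baseline:.4f}')
print(f'precision = {precision:.4f}')
print(f'recall    = {recall:.4f}')
```
", "\n", "On these counts the model is only about one percentage point ahead of always predicting no goal, and it recovers roughly 15% of the actual goals."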
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "90% sounds good, but the task is classifying whether a shot is a goal or not, and the dataset is heavily skewed: far more shots end up as no goal than as a goal. In the test set below, only 7,088 of 67,740 shots (about 10.5%) are goals, so always predicting no goal would already be about 89.5% accurate." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Thus, judging the performance of a model based on accuracy alone is not good practice. Let's look at the predictions themselves with the help of a confusion matrix." ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[60210 442]\n", " [ 6001 1087]]\n" ] } ], "source": [ "preds = reg.predict(X_test)\n", "\n", "from sklearn.metrics import confusion_matrix\n", "# Use a separate name so the imported function is not shadowed\n", "cm = confusion_matrix(Y_test,preds)\n", "print(cm)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The confusion matrix shows exactly how the model performed in terms of raw output. The top-left value (60,210) counts shots that were correctly predicted as not goals. The value to its right (442) counts shots that were predicted as goals but were not.\n", "\n", "The second row starts with shots that were classified as not goals but were, in fact, goals (6,001). To the right of it, we have the shots that were correctly classified as goals (1,087)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another tool for looking at the performance of the model is the ROC curve. The area under the ROC curve (AUC) for our model is about 0.80, which is quite good." ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from sklearn.metrics import roc_curve\n", "from sklearn.metrics import auc\n", "probs = reg.predict_proba(X_test)\n", "preds = probs[:,1]\n", "fpr, tpr, threshold = roc_curve(Y_test, preds)\n", "roc_auc = auc(fpr, tpr)\n", "\n", "# method I: plt\n", "import matplotlib.pyplot as plt\n", "plt.title('Receiver Operating Characteristic')\n", "plt.plot(fpr, tpr, 'b', label = 'AUC = %0.2f' % roc_auc)\n", "plt.legend(loc = 'lower right')\n", "plt.plot([0, 1], [0, 1],'r--')\n", "plt.xlim([0, 1])\n", "plt.ylim([0, 1])\n", "plt.ylabel('True Positive Rate')\n", "plt.xlabel('False Positive Rate')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The same methodology is used to build a Random Forest Classification Model." ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',\n", " max_depth=None, max_features='auto', max_leaf_nodes=None,\n", " min_impurity_decrease=0.0, min_impurity_split=None,\n", " min_samples_leaf=1, min_samples_split=2,\n", " min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,\n", " oob_score=False, random_state=None, verbose=0,\n", " warm_start=False)" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.ensemble import RandomForestClassifier\n", "reg2 = RandomForestClassifier()\n", "reg2.fit(X_train,Y_train)" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.8782698553291999" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "reg2.score(X_test,Y_test)" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[57700 2952]\n", " [ 5294 1794]]\n" ] } ], "source": [ "preds = 
reg2.predict(X_test)\n", "\n", "from sklearn.metrics import confusion_matrix\n", "# Use a separate name so the imported function is not shadowed\n", "cm = confusion_matrix(Y_test,preds)\n", "print(cm)" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "scrolled": true }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from sklearn.metrics import roc_curve\n", "from sklearn.metrics import auc\n", "probs = reg2.predict_proba(X_test)\n", "preds = probs[:,1]\n", "fpr, tpr, threshold = roc_curve(Y_test, preds)\n", "roc_auc = auc(fpr, tpr)\n", "\n", "# method I: plt\n", "import matplotlib.pyplot as plt\n", "plt.title('Receiver Operating Characteristic')\n", "plt.plot(fpr, tpr, 'b', label = 'AUC = %0.2f' % roc_auc)\n", "plt.legend(loc = 'lower right')\n", "plt.plot([0, 1], [0, 1],'r--')\n", "plt.xlim([0, 1])\n", "plt.ylim([0, 1])\n", "plt.ylabel('True Positive Rate')\n", "plt.xlabel('False Positive Rate')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since both the accuracy (87.8% against 90.5%) and the area under the ROC curve are worse for the Random Forest, Logistic Regression seems to be the better choice for our Expected Goals model." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### RESULTS" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [], "source": [ "temp = df[cols]\n", "shots=shot.drop('goal',axis=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The graph below shows how the xG value changes as the distance and angle change." ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plt.scatter(shots['distance'],shots['angle'],c=reg.predict_proba(shots)[:,1])\n", "plt.xlabel('distance')\n", "plt.ylabel('angle')\n", "plt.colorbar()" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "from matplotlib.patches import Arc\n", "\n", "fig=plt.figure()\n", "#fig,ax = plt.subplots(figsize=(10.4,7.6))\n", "ax=fig.add_subplot(1,1,1)\n", "ax.axis('off')\n", "\n", "plt.plot([0,104],[0,0],color=\"black\")\n", "plt.plot([0,0],[76,0],color=\"black\")\n", "plt.plot([0,104],[76,76],color=\"black\")\n", "plt.plot([104,104],[76,0],color=\"black\")\n", "plt.plot([52,52],[0,76],color=\"black\")\n", "\n", "plt.plot([104,86.32],[60,60],color=\"black\")\n", "plt.plot([86.32,104],[16,16],color=\"black\")\n", "plt.plot([86.32,86.32],[60,16],color=\"black\")\n", "plt.plot([104,97.97],[48,48],color=\"black\")\n", "plt.plot([104,97.97],[27.968,27.968],color=\"black\")\n", "plt.plot([97.97,97.97],[48,27.968],color=\"black\")\n", "\n", "plt.plot([104,104],[34,42],color=\"red\")\n", "\n", "penaltySpot = plt.Circle((92.04,38),0.25,color=\"black\")\n", "centreSpot = plt.Circle((52,38),0.5,color=\"black\")\n", "centreCircle = plt.Circle((52,38),10,color=\"black\",fill=False)\n", "\n", "D = Arc((92.04,38),height=20,width=20,angle=0,theta1=125,theta2=235,color=\"black\")\n", "\n", "ax.add_patch(centreSpot)\n", "ax.add_patch(centreCircle)\n", "ax.add_patch(penaltySpot)\n", "ax.add_patch(D)\n", "\n", "\n", "plt.scatter(df['X'],df['Y'],c=reg.predict_proba(shots)[:,1],alpha=0.7)\n", "plt.colorbar()\n", "plt.show()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The above graph shows that our xG model at least captures the intuitive nature of shot conversion: a shot taken from a shorter distance and with a wider angle of view of the goal is more likely to go in."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Thus, we add the model's xG output to the original DataFrame." ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [], "source": [ "df['xG'] = reg.predict_proba(shots)[:,1]" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['X', 'Y', 'a_goals', 'a_team', 'date', 'h_a', 'h_goals', 'h_team', 'id',\n", " 'lastAction', 'match_id', 'minute', 'player', 'player_assisted',\n", " 'player_id', 'result', 'season', 'shotType', 'situation', 'distance',\n", " 'angle', 'goal', 'isDirectFreekick', 'isFromCorner', 'isOpenPlay',\n", " 'isPenalty', 'isSetPiece', 'fromHead', 'fromLeftFoot',\n", " 'fromOtherBodyPart', 'fromRightFoot', 'xG'],\n", " dtype='object')" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.columns" ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "Text(0,0.5,'Expected Goals')" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "Actual = df.groupby('player')['goal'].sum()\n", "Expected = df.groupby('player')['xG'].sum()\n", "\n", "plt.scatter(Actual,Expected)\n", "plt.title('Goals v Expected Goals')\n", "plt.xlabel('Actual Goals')\n", "plt.ylabel('Expected Goals')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see a clear linear relationship between each player's Expected Goals and Actual Goals, which is a very good sign for our model.\n", "Let's try fitting a line to this." ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Text(0,0.5,'Expected Goals')" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import numpy as np\n", "from sklearn.linear_model import LinearRegression\n", "lmod = LinearRegression()\n", "lmod.fit(np.array(Actual).reshape(-1,1),Expected)\n", "\n", "x=np.linspace(0,150,1000)\n", "y=lmod.predict(x.reshape(-1,1))\n", "\n", "plt.plot(x,y,color='r')\n", "plt.scatter(Actual,Expected)\n", "plt.title('Goals v Expected Goals')\n", "plt.xlabel('Actual Goals')\n", "plt.ylabel('Expected Goals')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "On a side note, that's 10 Ballons d'Or in the top right." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### CONCLUSION" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The model is rather simplistic and was built with very few features, yet it still manages to capture the intuition behind the data and is decent at predicting the number of goals given shot locations and some context data. \n", "\n", "It could be refined with more context and, more importantly, more data." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 2 }