"
],
"text/plain": [
" a_team h_a h_team lastAction player \\\n",
"count 226499 226499 226499 226499 226499 \n",
"unique 160 2 160 39 4846 \n",
"top Real Madrid h Real Madrid Pass Cristiano Ronaldo \n",
"freq 2391 125415 2437 80110 889 \n",
"\n",
" player_assisted result shotType situation \n",
"count 165662 226499 226499 226499 \n",
"unique 4839 6 4 5 \n",
"top Dimitri Payet MissedShots RightFoot OpenPlay \n",
"freq 535 89945 118641 166304 "
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.describe(include=['O'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The dataset has around 226,500 shots which aren't really a lot but enough to have a basic model running."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"text/plain": [
"Index(['X', 'Y', 'a_goals', 'a_team', 'date', 'h_a', 'h_goals', 'h_team', 'id',\n",
" 'lastAction', 'match_id', 'minute', 'player', 'player_assisted',\n",
" 'player_id', 'result', 'season', 'shotType', 'situation'],\n",
" dtype='object')"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.columns"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### PRE-PROCESSING"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The X & Y co-ordinates present in the data are scaled from 0 to 1 so I had to find a suitable multipliying factor. The field was considered to be of dimensions of 104*76 yards.\n",
"The co-ordinates were scaled acccordingly."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"df['X'] = df['X']*104\n",
"df['Y'] = df['Y']*76"
]
},
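A minimal sketch of the rescaling step as a reusable function, assuming the raw `X` and `Y` values are fractions of pitch length and width in [0, 1]. The helper name `scale_to_pitch` and its defaults are illustrative, not part of the original notebook; it returns a copy instead of modifying the frame in place.

```python
import pandas as pd


def scale_to_pitch(shots: pd.DataFrame, length: float = 104.0, width: float = 76.0) -> pd.DataFrame:
    """Hypothetical helper: convert normalized [0, 1] coordinates to yards."""
    out = shots.copy()
    out["X"] = out["X"] * length  # 0 = left goal line, length = right (attacked) goal line
    out["Y"] = out["Y"] * width   # 0 and width are the two touchlines
    return out


raw = pd.DataFrame({"X": [0.0, 0.5, 1.0], "Y": [0.0, 0.5, 1.0]})
scaled = scale_to_pitch(raw)
print(scaled)
```

Returning a copy keeps the normalized data available, which is handy if you later want to try a different pitch size.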
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's try and plot the shot locations"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"
"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import matplotlib.pyplot as plt\n",
"from matplotlib.patches import Arc\n",
"\n",
"fig=plt.figure()\n",
"#fig,ax = plt.subplots(figsize=(10.4,7.6))\n",
"ax=fig.add_subplot(1,1,1)\n",
"ax.axis('off')\n",
"\n",
"plt.plot([0,104],[0,0],color=\"black\")\n",
"plt.plot([0,0],[76,0],color=\"black\")\n",
"plt.plot([0,104],[76,76],color=\"black\")\n",
"plt.plot([104,104],[76,0],color=\"black\")\n",
"plt.plot([52,52],[0,76],color=\"black\")\n",
"\n",
"plt.plot([104,86.32],[60,60],color=\"black\")\n",
"plt.plot([86.32,104],[16,16],color=\"black\")\n",
"plt.plot([86.32,86.32],[60,16],color=\"black\")\n",
"plt.plot([104,97.97],[48,48],color=\"black\")\n",
"plt.plot([104,97.97],[27.968,27.968],color=\"black\")\n",
"plt.plot([97.97,97.97],[48,27.968],color=\"black\")\n",
"\n",
"plt.plot([104,104],[34,42],color=\"red\")\n",
"\n",
"penaltySpot = plt.Circle((92.04,38),0.25,color=\"black\")\n",
"centreSpot = plt.Circle((52,38),0.5,color=\"black\")\n",
"centreCircle = plt.Circle((52,38),10,color=\"black\",fill=False)\n",
"\n",
"D = Arc((92.04,38),height=20,width=20,angle=0,theta1=125,theta2=235,color=\"black\")\n",
"\n",
"ax.add_patch(centreSpot)\n",
"ax.add_patch(centreCircle)\n",
"ax.add_patch(penaltySpot)\n",
"ax.add_patch(D)\n",
"\n",
"\n",
"plt.scatter(df['X'],df['Y'])\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"All the shots are when the team is attacking from right to left. The shots on the left end of the pitch are perhaps own goals that will have to be removed."
]
},
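Before dropping anything, it is worth checking the own-goal hypothesis. A minimal sketch, assuming the coordinates are already in yards so the halfway line sits at X = 52: count the `result` values of shots taken from the left half. The function name `left_half_results` and the toy frame are made up for illustration; on the real data you would pass `df`.

```python
import pandas as pd


def left_half_results(shots: pd.DataFrame, halfway: float = 52.0) -> pd.Series:
    """Hypothetical helper: count results of shots taken from X < halfway."""
    left = shots[shots["X"] < halfway]
    return left["result"].value_counts()


toy = pd.DataFrame({
    "X": [10.0, 95.0, 5.0, 88.0],
    "result": ["OwnGoal", "Goal", "OwnGoal", "SavedShot"],
})
counts = left_half_results(toy)
print(counts)
```

If the real counts are dominated by `OwnGoal`, that supports filtering on `result` rather than on position.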
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We know that distance is a major factor for shots to be converted into goals and thus I added a distance feature. The distance will be measured from the shot location to the centre of the goal."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"df['distance'] = (((df['X']-104)**2 + (df['Y']-38)**2)**(1/2))"
]
},
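The distance formula above is the Euclidean distance to the goal centre at (104, 38). A minimal sketch of an equivalent version using `numpy.hypot`, with the goal-centre coordinates pulled out as named constants (the names `GOAL_X`, `GOAL_Y` and `shot_distance` are illustrative):

```python
import numpy as np
import pandas as pd

GOAL_X, GOAL_Y = 104.0, 38.0  # centre of the attacked goal, in yards


def shot_distance(x: pd.Series, y: pd.Series) -> pd.Series:
    """Euclidean distance from each shot to the centre of the goal."""
    return np.hypot(x - GOAL_X, y - GOAL_Y)


shots = pd.DataFrame({"X": [92.0, 104.0, 101.0], "Y": [38.0, 38.0, 42.0]})
shots["distance"] = shot_distance(shots["X"], shots["Y"])
print(shots)
```

`np.hypot` computes `sqrt(dx**2 + dy**2)` elementwise and returns a Series when given Series, so it drops straight into the DataFrame.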
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Another important factor to be taken into account is the angle of view the striker has. A smaller angle will obviously minimize the chance of a shot being converted into a goal."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We use trigonometry to find the angle extended by both the goal posts onto the shot location.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"cos c = (A^2+B^2-C^2)/2*A*B"
]
},
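As a worked check of the formula, consider a hypothetical shot from (92, 38), 12 yards straight out from the centre of the goal, with the posts at (104, 34) and (104, 42) as on the scaled pitch:

$$
\begin{aligned}
A^2 = B^2 &= (104 - 92)^2 + (42 - 38)^2 = 160, \qquad C = 42 - 34 = 8,\\
\cos c &= \frac{160 + 160 - 64}{2\sqrt{160}\,\sqrt{160}} = \frac{256}{320} = 0.8,\\
c &= \arccos 0.8 \approx 0.6435\ \text{rad} \approx 36.87^\circ .
\end{aligned}
$$

The same value comes out of the half-angle shortcut $2\arctan(4/12) \approx 36.87^\circ$, which only applies to shots level with the centre of the goal. Note that the denominator is $2AB$, the product of the distances themselves rather than of their squares.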
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"
"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import matplotlib.pyplot as plt\n",
"import matplotlib.image as mpimg\n",
"image = mpimg.imread(\"angle.jpg\")\n",
"plt.imshow(image)\n",
"plt.axis('off')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Pardon my paint skills, but the diagram above explains the angle c we are trying to find. The Red Dot is the shot location and the rectangle on the right is the goalmouth."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Below, I find the distance between the goal posts and the shot location and use it to find the angle extended by the goal onto the shot location."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"import math\n",
"temp = pd.DataFrame()\n",
"temp['a'] = ((df['X']-104)**2 + (df['Y']-42)**2)\n",
"temp['b'] = ((df['X']-104)**2 + (df['Y']-34)**2)\n",
"df['angle'] = (temp['a']+temp['b']-144)/(2*temp['a']*temp['b'])\n",
"df['angle'] = df['angle'].apply(math.acos)"
]
},
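An alternative way to compute the goal angle, sketched here as an option rather than the method used in this notebook, is `atan2(|v1 × v2|, v1 · v2)` for the vectors from the shot to each post. It avoids `arccos`, which loses precision for the very small angles of long-range shots. The post positions follow the scaling used here; the function names are illustrative, and the law-of-cosines version is included only as a cross-check.

```python
import numpy as np

GOAL_LINE_X = 104.0
POST_LOW_Y, POST_HIGH_Y = 34.0, 42.0  # goal posts, 8 yards apart


def goal_angle(x, y):
    """Angle (radians) subtended by the goal mouth at shot location (x, y), via atan2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = GOAL_LINE_X - x          # shared x-component of both shot-to-post vectors
    v1y = POST_LOW_Y - y          # y-component of the vector to the lower post
    v2y = POST_HIGH_Y - y         # y-component of the vector to the upper post
    cross = dx * v2y - v1y * dx   # z-component of v1 x v2
    dot = dx * dx + v1y * v2y     # v1 . v2
    return np.arctan2(np.abs(cross), dot)


def goal_angle_cosines(x, y):
    """Same angle via the law of cosines, for comparison."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    a = np.hypot(GOAL_LINE_X - x, POST_HIGH_Y - y)
    b = np.hypot(GOAL_LINE_X - x, POST_LOW_Y - y)
    c = POST_HIGH_Y - POST_LOW_Y
    cos_c = (a**2 + b**2 - c**2) / (2 * a * b)
    return np.arccos(np.clip(cos_c, -1.0, 1.0))


xs = np.array([92.0, 80.0, 100.0])
ys = np.array([38.0, 20.0, 50.0])
print(np.degrees(goal_angle(xs, ys)))
print(np.allclose(goal_angle(xs, ys), goal_angle_cosines(xs, ys)))
```

Both functions accept scalars, NumPy arrays or pandas Series, so either could be applied to `df['X']` and `df['Y']` directly.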
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Distance and Angle have been added as columns to the DataFrame."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"Index(['X', 'Y', 'a_goals', 'a_team', 'date', 'h_a', 'h_goals', 'h_team', 'id',\n",
" 'lastAction', 'match_id', 'minute', 'player', 'player_assisted',\n",
" 'player_id', 'result', 'season', 'shotType', 'situation', 'distance',\n",
" 'angle'],\n",
" dtype='object')"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.columns"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since the columns in the categs list are actually categories, we convert them into categories."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"categs = ['lastAction','result','shotType', 'situation']\n",
"for categ in categs:\n",
" df[categ] = df[categ].astype('category')"
]
},
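A small illustration of what the conversion does, on made-up values: a `category` column stores each distinct label once and keeps an integer code per row. `.cat.categories` (used in the next cells) lists the labels, and `.cat.codes` exposes the per-row codes.

```python
import pandas as pd

shots = pd.DataFrame({"shotType": ["RightFoot", "Head", "RightFoot", "LeftFoot"]})
shots["shotType"] = shots["shotType"].astype("category")

print(shots["shotType"].cat.categories)      # distinct labels, sorted alphabetically
print(shots["shotType"].cat.codes.tolist())  # integer code per row
```

The codes are just positions in the category list, which is why they should not be fed to a linear model as if they were ordered numbers.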
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let us have a look at the categories for each specific feature."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"lastAction\n",
"Index(['Aerial', 'BallRecovery', 'BallTouch', 'BlockedPass', 'Card',\n",
" 'Challenge', 'ChanceMissed', 'Chipped', 'Clearance', 'CornerAwarded',\n",
" 'Cross', 'CrossNotClaimed', 'Dispossessed', 'End', 'Error',\n",
" 'FormationChange', 'Foul', 'Goal', 'GoodSkill', 'HeadPass',\n",
" 'Interception', 'KeeperPickup', 'KeeperSweeper', 'LayOff', 'None',\n",
" 'OffsidePass', 'OffsideProvoked', 'Pass', 'PenaltyFaced', 'Punch',\n",
" 'Rebound', 'Save', 'ShieldBallOpp', 'Standard', 'Start',\n",
" 'SubstitutionOn', 'Tackle', 'TakeOn', 'Throughball'],\n",
" dtype='object')\n",
"result\n",
"Index(['BlockedShot', 'Goal', 'MissedShots', 'OwnGoal', 'SavedShot',\n",
" 'ShotOnPost'],\n",
" dtype='object')\n",
"shotType\n",
"Index(['Head', 'LeftFoot', 'OtherBodyPart', 'RightFoot'], dtype='object')\n",
"situation\n",
"Index(['DirectFreekick', 'FromCorner', 'OpenPlay', 'Penalty', 'SetPiece'], dtype='object')\n"
]
}
],
"source": [
"for categ in categs:\n",
" print(categ)\n",
" print(df[categ].cat.categories)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before we can train the model, we need to remove any OwnGoals that might spoil the model and thus, we do so."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"df = df[df['result'] != 'OwnGoal']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Introducing a binary column by the name of Goal."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"df['goal'] = (df['result'] == 'Goal')"
]
},
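Because `goal` is boolean, its mean is the share of shots that were scored, a useful baseline for any xG model: a model with no features would predict this rate for every shot. A sketch on made-up data, also computing the rate per `situation` with `groupby`:

```python
import pandas as pd

shots = pd.DataFrame({
    "result": ["Goal", "MissedShots", "SavedShot", "Goal", "BlockedShot"],
    "situation": ["Penalty", "OpenPlay", "OpenPlay", "OpenPlay", "SetPiece"],
})
shots["goal"] = shots["result"] == "Goal"

overall_rate = shots["goal"].mean()                            # fraction of shots scored
rate_by_situation = shots.groupby("situation")["goal"].mean()  # conversion rate per situation
print(overall_rate)
print(rate_by_situation)
```

On the real data, `df['goal'].mean()` and `df.groupby('situation')['goal'].mean()` give the same summaries.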
{
"cell_type": "markdown",
"metadata": {},
"source": [
"One Hot Encoding for the situation column is done in the DataFrame so that Logistic Regression might not misinterpret the situations as something of different weightages.\n",
"\n",
"LabelBinarizer from sklearn is used to carry out the one hot encoding."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"LabelBinarizer(neg_label=0, pos_label=1, sparse_output=False)"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from sklearn.preprocessing import LabelBinarizer\n",
"le = LabelBinarizer()\n",
"le.fit(df['situation'])"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"temp = le.transform(df['situation'])"
]
},
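The transformed array has one column per class, in the order of `le.classes_`. A minimal sketch of turning it into named indicator columns, assuming the `is<Situation>` naming that appears in the DataFrame output further down. Assigning column by column from the NumPy array avoids index alignment, so it also works when index labels repeat, as they do in this concatenated shot data. `pd.get_dummies` is a common alternative.

```python
import pandas as pd
from sklearn.preprocessing import LabelBinarizer

shots = pd.DataFrame(
    {"situation": ["OpenPlay", "FromCorner", "Penalty", "OpenPlay"]},
    index=[0, 1, 0, 1],  # duplicate labels, as in the concatenated shot data
)

lb = LabelBinarizer()
onehot = lb.fit_transform(shots["situation"])  # shape (n_rows, n_classes)

# One indicator column per class, named e.g. "isOpenPlay"
for i, cls in enumerate(lb.classes_):
    shots["is" + cls] = onehot[:, i]

print(shots)
```

With exactly two classes `LabelBinarizer` returns a single column, so this loop assumes three or more situations, as is the case here.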
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(['DirectFreekick', 'FromCorner', 'OpenPlay', 'Penalty', 'SetPiece'],\n",
" dtype='\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
X
\n",
"
Y
\n",
"
a_goals
\n",
"
a_team
\n",
"
date
\n",
"
h_a
\n",
"
h_goals
\n",
"
h_team
\n",
"
id
\n",
"
lastAction
\n",
"
...
\n",
"
goal
\n",
"
isDirectFreekick
\n",
"
isFromCorner
\n",
"
isOpenPlay
\n",
"
isPenalty
\n",
"
isSetPiece
\n",
"
fromHead
\n",
"
fromLeftFoot
\n",
"
fromOtherBodyPart
\n",
"
fromRightFoot
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
73.527997
\n",
"
28.804001
\n",
"
0
\n",
"
Hoffenheim
\n",
"
2015-08-29 17:30:00
\n",
"
h
\n",
"
0
\n",
"
Darmstadt
\n",
"
76737
\n",
"
Aerial
\n",
"
...
\n",
"
False
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
\n",
"
\n",
"
1
\n",
"
75.712003
\n",
"
28.347999
\n",
"
1
\n",
"
Darmstadt
\n",
"
2015-09-12 17:30:00
\n",
"
a
\n",
"
0
\n",
"
Bayer Leverkusen
\n",
"
76808
\n",
"
Pass
\n",
"
...
\n",
"
False
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
\n",
"
\n",
"
3
\n",
"
91.000000
\n",
"
39.595999
\n",
"
2
\n",
"
Darmstadt
\n",
"
2015-12-20 20:30:00
\n",
"
a
\n",
"
3
\n",
"
Borussia M.Gladbach
\n",
"
79876
\n",
"
Aerial
\n",
"
...
\n",
"
False
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
\n",
"
\n",
"
0
\n",
"
96.407997
\n",
"
42.332001
\n",
"
2
\n",
"
Werder Bremen
\n",
"
2014-12-07 16:30:00
\n",
"
a
\n",
"
5
\n",
"
Eintracht Frankfurt
\n",
"
27374
\n",
"
Pass
\n",
"
...
\n",
"
True
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
\n",
"
\n",
"
1
\n",
"
93.496002
\n",
"
45.447999
\n",
"
2
\n",
"
Darmstadt
\n",
"
2015-09-27 19:30:00
\n",
"
a
\n",
"
2
\n",
"
Borussia Dortmund
\n",
"
77444
\n",
"
None
\n",
"
...
\n",
"
False
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
\n",
"
\n",
"
2
\n",
"
101.920000
\n",
"
35.872001
\n",
"
2
\n",
"
Darmstadt
\n",
"
2015-10-17 17:30:00
\n",
"
a
\n",
"
0
\n",
"
Augsburg
\n",
"
78030
\n",
"
None
\n",
"
...
\n",
"
False
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
\n",
"
\n",
"
3
\n",
"
102.336002
\n",
"
36.175999
\n",
"
2
\n",
"
Darmstadt
\n",
"
2015-10-17 17:30:00
\n",
"
a
\n",
"
0
\n",
"
Augsburg
\n",
"
78029
\n",
"
Rebound
\n",
"
...
\n",
"
False
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
\n",
"
\n",
"
4
\n",
"
92.040000
\n",
"
32.907999
\n",
"
0
\n",
"
Darmstadt
\n",
"
2015-11-01 18:30:00
\n",
"
a
\n",
"
2
\n",
"
VfB Stuttgart
\n",
"
78500
\n",
"
Cross
\n",
"
...
\n",
"
False
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
\n",
"
\n",
"
5
\n",
"
97.967997
\n",
"
48.260000
\n",
"
0
\n",
"
Darmstadt
\n",
"
2015-11-01 18:30:00
\n",
"
a
\n",
"
2
\n",
"
VfB Stuttgart
\n",
"
78501
\n",
"
Aerial
\n",
"
...
\n",
"
False
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
\n",
"
\n",
"
6
\n",
"
99.527997
\n",
"
47.575999
\n",
"
1
\n",
"
Hamburger SV
\n",
"
2015-11-07 21:30:00
\n",
"
h
\n",
"
1
\n",
"
Darmstadt
\n",
"
78849
\n",
"
Aerial
\n",
"
...
\n",
"
False
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
\n",
"
\n",
"
7
\n",
"
94.640000
\n",
"
46.284001
\n",
"
1
\n",
"
Darmstadt
\n",
"
2015-12-06 20:30:00
\n",
"
a
\n",
"
0
\n",
"
Eintracht Frankfurt
\n",
"
79502
\n",
"
Cross
\n",
"
...
\n",
"
False
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
\n",
"
\n",
"
8
\n",
"
95.367997
\n",
"
40.280000
\n",
"
1
\n",
"
Darmstadt
\n",
"
2015-12-06 20:30:00
\n",
"
a
\n",
"
0
\n",
"
Eintracht Frankfurt
\n",
"
79514
\n",
"
Cross
\n",
"
...
\n",
"
False
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
\n",
"
\n",
"
9
\n",
"
97.032003
\n",
"
42.787999
\n",
"
4
\n",
"
Hertha Berlin
\n",
"
2015-12-12 18:30:00
\n",
"
h
\n",
"
0
\n",
"
Darmstadt
\n",
"
79822
\n",
"
Aerial
\n",
"
...
\n",
"
False
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
\n",
"
\n",
"
10
\n",
"
96.927997
\n",
"
36.404001
\n",
"
2
\n",
"
Darmstadt
\n",
"
2015-12-20 20:30:00
\n",
"
a
\n",
"
3
\n",
"
Borussia M.Gladbach
\n",
"
79863
\n",
"
Cross
\n",
"
...
\n",
"
False
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
\n",
"
\n",
"
11
\n",
"
94.847997
\n",
"
46.512001
\n",
"
2
\n",
"
Bayer Leverkusen
\n",
"
2016-02-13 18:30:00
\n",
"
h
\n",
"
1
\n",
"
Darmstadt
\n",
"
80950
\n",
"
Aerial
\n",
"
...
\n",
"
False
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
\n",
"
\n",
"
12
\n",
"
90.376002
\n",
"
48.868002
\n",
"
0
\n",
"
Darmstadt
\n",
"
2016-03-06 18:30:00
\n",
"
a
\n",
"
0
\n",
"
Mainz 05
\n",
"
81919
\n",
"
Rebound
\n",
"
...
\n",
"
False
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
\n",
"
\n",
"
13
\n",
"
90.376002
\n",
"
48.868002
\n",
"
0
\n",
"
Darmstadt
\n",
"
2016-03-06 18:30:00
\n",
"
a
\n",
"
0
\n",
"
Mainz 05
\n",
"
81918
\n",
"
Aerial
\n",
"
...
\n",
"
False
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
\n",
"
\n",
"
14
\n",
"
100.463998
\n",
"
32.527999
\n",
"
2
\n",
"
VfB Stuttgart
\n",
"
2016-04-02 17:30:00
\n",
"
h
\n",
"
2
\n",
"
Darmstadt
\n",
"
82623
\n",
"
Cross
\n",
"
...
\n",
"
False
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
\n",
"
\n",
"
15
\n",
"
91.416002
\n",
"
39.215999
\n",
"
1
\n",
"
Darmstadt
\n",
"
2016-04-23 17:30:00
\n",
"
a
\n",
"
4
\n",
"
FC Cologne
\n",
"
83244
\n",
"
None
\n",
"
...
\n",
"
False
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
\n",
"
\n",
"
16
\n",
"
77.480000
\n",
"
43.700000
\n",
"
0
\n",
"
Werder Bremen
\n",
"
2016-08-26 22:30:00
\n",
"
a
\n",
"
6
\n",
"
Bayern Munich
\n",
"
127431
\n",
"
None
\n",
"
...
\n",
"
False
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
\n",
"
\n",
"
17
\n",
"
103.063998
\n",
"
46.967999
\n",
"
0
\n",
"
Werder Bremen
\n",
"
2016-08-26 22:30:00
\n",
"
a
\n",
"
6
\n",
"
Bayern Munich
\n",
"
127447
\n",
"
Cross
\n",
"
...
\n",
"
False
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
\n",
"
\n",
"
1
\n",
"
98.383998
\n",
"
35.415999
\n",
"
1
\n",
"
Darmstadt
\n",
"
2015-08-22 17:30:00
\n",
"
a
\n",
"
1
\n",
"
Schalke 04
\n",
"
76326
\n",
"
Aerial
\n",
"
...
\n",
"
False
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
\n",
"
\n",
"
2
\n",
"
97.032003
\n",
"
34.352001
\n",
"
1
\n",
"
Darmstadt
\n",
"
2015-08-22 17:30:00
\n",
"
a
\n",
"
1
\n",
"
Schalke 04
\n",
"
76330
\n",
"
Aerial
\n",
"
...
\n",
"
False
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
\n",
"
\n",
"
3
\n",
"
99.007997
\n",
"
32.907999
\n",
"
1
\n",
"
Darmstadt
\n",
"
2015-09-12 17:30:00
\n",
"
a
\n",
"
0
\n",
"
Bayer Leverkusen
\n",
"
76809
\n",
"
Aerial
\n",
"
...
\n",
"
True
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
\n",
"
\n",
"
4
\n",
"
91.312003
\n",
"
37.467999
\n",
"
2
\n",
"
Darmstadt
\n",
"
2015-09-27 19:30:00
\n",
"
a
\n",
"
2
\n",
"
Borussia Dortmund
\n",
"
77459
\n",
"
None
\n",
"
...
\n",
"
True
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
\n",
"
\n",
"
5
\n",
"
95.056002
\n",
"
44.384001
\n",
"
0
\n",
"
Darmstadt
\n",
"
2015-11-01 18:30:00
\n",
"
a
\n",
"
2
\n",
"
VfB Stuttgart
\n",
"
78477
\n",
"
Aerial
\n",
"
...
\n",
"
False
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
\n",
"
\n",
"
6
\n",
"
94.223998
\n",
"
43.547999
\n",
"
0
\n",
"
Darmstadt
\n",
"
2015-11-01 18:30:00
\n",
"
a
\n",
"
2
\n",
"
VfB Stuttgart
\n",
"
78498
\n",
"
Chipped
\n",
"
...
\n",
"
False
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
\n",
"
\n",
"
7
\n",
"
92.143998
\n",
"
41.495999
\n",
"
1
\n",
"
Hamburger SV
\n",
"
2015-11-07 21:30:00
\n",
"
h
\n",
"
1
\n",
"
Darmstadt
\n",
"
78845
\n",
"
None
\n",
"
...
\n",
"
False
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
\n",
"
\n",
"
8
\n",
"
99.320000
\n",
"
36.404001
\n",
"
1
\n",
"
Darmstadt
\n",
"
2015-11-22 20:40:00
\n",
"
a
\n",
"
3
\n",
"
Ingolstadt
\n",
"
79074
\n",
"
Cross
\n",
"
...
\n",
"
True
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
\n",
"
\n",
"
9
\n",
"
94.536002
\n",
"
37.315999
\n",
"
1
\n",
"
Darmstadt
\n",
"
2015-12-06 20:30:00
\n",
"
a
\n",
"
0
\n",
"
Eintracht Frankfurt
\n",
"
79498
\n",
"
Cross
\n",
"
...
\n",
"
True
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
\n",
"
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
\n",
"
\n",
"
9
\n",
"
83.096002
\n",
"
27.207999
\n",
"
2
\n",
"
Villarreal
\n",
"
2018-11-11 17:30:00
\n",
"
a
\n",
"
2
\n",
"
Rayo Vallecano
\n",
"
240027
\n",
"
Pass
\n",
"
...
\n",
"
False
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
\n",
"
\n",
"
10
\n",
"
76.543998
\n",
"
38.835999
\n",
"
1
\n",
"
Real Betis
\n",
"
2018-11-25 19:45:00
\n",
"
h
\n",
"
2
\n",
"
Villarreal
\n",
"
241740
\n",
"
None
\n",
"
...
\n",
"
False
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
\n",
"
\n",
"
11
\n",
"
93.287997
\n",
"
35.112001
\n",
"
1
\n",
"
Real Betis
\n",
"
2018-11-25 19:45:00
\n",
"
h
\n",
"
2
\n",
"
Villarreal
\n",
"
241746
\n",
"
Pass
\n",
"
...
\n",
"
False
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
\n",
"
\n",
"
12
\n",
"
92.767997
\n",
"
24.927999
\n",
"
1
\n",
"
Real Betis
\n",
"
2018-11-25 19:45:00
\n",
"
h
\n",
"
2
\n",
"
Villarreal
\n",
"
241754
\n",
"
Pass
\n",
"
...
\n",
"
True
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
\n",
"
\n",
"
13
\n",
"
88.712003
\n",
"
24.624001
\n",
"
1
\n",
"
Real Betis
\n",
"
2018-11-25 19:45:00
\n",
"
h
\n",
"
2
\n",
"
Villarreal
\n",
"
241757
\n",
"
Pass
\n",
"
...
\n",
"
False
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
\n",
"
\n",
"
14
\n",
"
95.887997
\n",
"
24.016000
\n",
"
0
\n",
"
Villarreal
\n",
"
2018-12-02 17:30:00
\n",
"
a
\n",
"
2
\n",
"
Barcelona
\n",
"
243092
\n",
"
Rebound
\n",
"
...
\n",
"
False
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
\n",
"
\n",
"
15
\n",
"
81.120000
\n",
"
29.944001
\n",
"
0
\n",
"
Villarreal
\n",
"
2018-12-02 17:30:00
\n",
"
a
\n",
"
2
\n",
"
Barcelona
\n",
"
243094
\n",
"
Pass
\n",
"
...
\n",
"
False
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
\n",
"
\n",
"
0
\n",
"
79.143998
\n",
"
40.127999
\n",
"
2
\n",
"
Fulham
\n",
"
2018-10-20 14:00:00
\n",
"
h
\n",
"
4
\n",
"
Cardiff
\n",
"
234806
\n",
"
Pass
\n",
"
...
\n",
"
False
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
\n",
"
\n",
"
0
\n",
"
96.303998
\n",
"
35.112001
\n",
"
2
\n",
"
Udinese
\n",
"
2018-10-28 13:00:00
\n",
"
h
\n",
"
2
\n",
"
Genoa
\n",
"
236059
\n",
"
Cross
\n",
"
...
\n",
"
True
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
\n",
"
\n",
"
1
\n",
"
95.887997
\n",
"
37.467999
\n",
"
0
\n",
"
Genoa
\n",
"
2018-11-03 14:00:00
\n",
"
a
\n",
"
5
\n",
"
Inter
\n",
"
236765
\n",
"
Aerial
\n",
"
...
\n",
"
False
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
\n",
"
\n",
"
2
\n",
"
93.912003
\n",
"
42.484001
\n",
"
2
\n",
"
Napoli
\n",
"
2018-11-10 19:30:00
\n",
"
h
\n",
"
1
\n",
"
Genoa
\n",
"
239498
\n",
"
Cross
\n",
"
...
\n",
"
False
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
\n",
"
\n",
"
3
\n",
"
84.343998
\n",
"
35.415999
\n",
"
1
\n",
"
Sampdoria
\n",
"
2018-11-25 19:30:00
\n",
"
h
\n",
"
1
\n",
"
Genoa
\n",
"
241719
\n",
"
Rebound
\n",
"
...
\n",
"
False
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
\n",
"
\n",
"
4
\n",
"
96.927997
\n",
"
36.632001
\n",
"
1
\n",
"
Sampdoria
\n",
"
2018-11-25 19:30:00
\n",
"
h
\n",
"
1
\n",
"
Genoa
\n",
"
241725
\n",
"
Aerial
\n",
"
...
\n",
"
False
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
\n",
"
\n",
"
5
\n",
"
92.456002
\n",
"
40.964001
\n",
"
1
\n",
"
Genoa
\n",
"
2018-12-02 14:00:00
\n",
"
a
\n",
"
2
\n",
"
Torino
\n",
"
242821
\n",
"
Cross
\n",
"
...
\n",
"
False
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
\n",
"
\n",
"
6
\n",
"
83.407997
\n",
"
36.555999
\n",
"
1
\n",
"
Genoa
\n",
"
2018-12-02 14:00:00
\n",
"
a
\n",
"
2
\n",
"
Torino
\n",
"
242824
\n",
"
Aerial
\n",
"
...
\n",
"
False
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
\n",
"
\n",
"
0
\n",
"
85.383998
\n",
"
42.560000
\n",
"
0
\n",
"
Toulouse
\n",
"
2018-11-24 16:00:00
\n",
"
a
\n",
"
1
\n",
"
Paris Saint Germain
\n",
"
240889
\n",
"
BallRecovery
\n",
"
...
\n",
"
False
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
\n",
"
\n",
"
0
\n",
"
92.352003
\n",
"
43.320000
\n",
"
0
\n",
"
Monaco
\n",
"
2018-11-03 19:00:00
\n",
"
h
\n",
"
1
\n",
"
Reims
\n",
"
237309
\n",
"
Pass
\n",
"
...
\n",
"
False
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
\n",
"
\n",
"
1
\n",
"
84.967997
\n",
"
35.264001
\n",
"
1
\n",
"
Guingamp
\n",
"
2018-11-24 19:00:00
\n",
"
h
\n",
"
2
\n",
"
Reims
\n",
"
241121
\n",
"
Chipped
\n",
"
...
\n",
"
True
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
\n",
"
\n",
"
0
\n",
"
98.487997
\n",
"
33.440000
\n",
"
0
\n",
"
Real Valladolid
\n",
"
2018-11-25 15:15:00
\n",
"
a
\n",
"
1
\n",
"
Sevilla
\n",
"
241542
\n",
"
Aerial
\n",
"
...
\n",
"
False
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
\n",
"
\n",
"
0
\n",
"
87.880000
\n",
"
22.800000
\n",
"
1
\n",
"
Wolfsburg
\n",
"
2018-11-09 19:30:00
\n",
"
a
\n",
"
2
\n",
"
Hannover 96
\n",
"
238742
\n",
"
Pass
\n",
"
...
\n",
"
False
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
\n",
"
\n",
"
0
\n",
"
88.296002
\n",
"
38.075999
\n",
"
4
\n",
"
Paris Saint Germain
\n",
"
2018-11-11 20:00:00
\n",
"
h
\n",
"
0
\n",
"
Monaco
\n",
"
240130
\n",
"
None
\n",
"
...
\n",
"
False
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
\n",
"
\n",
"
0
\n",
"
98.487997
\n",
"
40.584001
\n",
"
1
\n",
"
Eintracht Frankfurt
\n",
"
2018-10-28 11:30:00
\n",
"
h
\n",
"
1
\n",
"
Nuernberg
\n",
"
235992
\n",
"
Chipped
\n",
"
...
\n",
"
True
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
\n",
"
\n",
"
1
\n",
"
91.103998
\n",
"
42.787999
\n",
"
2
\n",
"
Nuernberg
\n",
"
2018-11-03 14:30:00
\n",
"
a
\n",
"
2
\n",
"
Augsburg
\n",
"
236785
\n",
"
Cross
\n",
"
...
\n",
"
False
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
\n",
"
\n",
"
2
\n",
"
101.192003
\n",
"
24.395999
\n",
"
2
\n",
"
Nuernberg
\n",
"
2018-11-03 14:30:00
\n",
"
a
\n",
"
2
\n",
"
Augsburg
\n",
"
236802
\n",
"
Chipped
\n",
"
...
\n",
"
False
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
\n",
"
\n",
"
3
\n",
"
96.303998
\n",
"
37.620000
\n",
"
2
\n",
"
Nuernberg
\n",
"
2018-11-03 14:30:00
\n",
"
a
\n",
"
2
\n",
"
Augsburg
\n",
"
236807
\n",
"
Pass
\n",
"
...
\n",
"
False
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
\n",
"
\n",
"
4
\n",
"
100.152003
\n",
"
36.327999
\n",
"
2
\n",
"
Nuernberg
\n",
"
2018-11-24 17:30:00
\n",
"
a
\n",
"
5
\n",
"
Schalke 04
\n",
"
241048
\n",
"
Rebound
\n",
"
...
\n",
"
True
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
\n",
"
\n",
"
0
\n",
"
88.503998
\n",
"
57.075999
\n",
"
1
\n",
"
Atletico Madrid
\n",
"
2018-12-02 15:15:00
\n",
"
h
\n",
"
1
\n",
"
Girona
\n",
"
242927
\n",
"
Pass
\n",
"
...
\n",
"
False
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
\n",
"
\n",
"
0
\n",
"
96.303998
\n",
"
34.504001
\n",
"
3
\n",
"
Fortuna Duesseldorf
\n",
"
2018-11-24 14:30:00
\n",
"
a
\n",
"
3
\n",
"
Bayern Munich
\n",
"
240596
\n",
"
None
\n",
"
...
\n",
"
False
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
\n",
"
\n",
"
1
\n",
"
78.936002
\n",
"
38.987999
\n",
"
1
\n",
"
Mainz 05
\n",
"
2018-11-30 19:30:00
\n",
"
h
\n",
"
0
\n",
"
Fortuna Duesseldorf
\n",
"
241967
\n",
"
Pass
\n",
"
...
\n",
"
False
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
\n",
"
\n",
"
2
\n",
"
99.112003
\n",
"
49.020000
\n",
"
1
\n",
"
Mainz 05
\n",
"
2018-11-30 19:30:00
\n",
"
h
\n",
"
0
\n",
"
Fortuna Duesseldorf
\n",
"
241976
\n",
"
Cross
\n",
"
...
\n",
"
False
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
\n",
" \n",
"
\n",
"
225797 rows × 31 columns
\n",
""
],
"text/plain": [
" X Y a_goals a_team date \\\n",
"0 73.527997 28.804001 0 Hoffenheim 2015-08-29 17:30:00 \n",
"1 75.712003 28.347999 1 Darmstadt 2015-09-12 17:30:00 \n",
"3 91.000000 39.595999 2 Darmstadt 2015-12-20 20:30:00 \n",
"0 96.407997 42.332001 2 Werder Bremen 2014-12-07 16:30:00 \n",
"1 93.496002 45.447999 2 Darmstadt 2015-09-27 19:30:00 \n",
"2 101.920000 35.872001 2 Darmstadt 2015-10-17 17:30:00 \n",
"3 102.336002 36.175999 2 Darmstadt 2015-10-17 17:30:00 \n",
"4 92.040000 32.907999 0 Darmstadt 2015-11-01 18:30:00 \n",
"5 97.967997 48.260000 0 Darmstadt 2015-11-01 18:30:00 \n",
"6 99.527997 47.575999 1 Hamburger SV 2015-11-07 21:30:00 \n",
"7 94.640000 46.284001 1 Darmstadt 2015-12-06 20:30:00 \n",
"8 95.367997 40.280000 1 Darmstadt 2015-12-06 20:30:00 \n",
"9 97.032003 42.787999 4 Hertha Berlin 2015-12-12 18:30:00 \n",
"10 96.927997 36.404001 2 Darmstadt 2015-12-20 20:30:00 \n",
"11 94.847997 46.512001 2 Bayer Leverkusen 2016-02-13 18:30:00 \n",
"12 90.376002 48.868002 0 Darmstadt 2016-03-06 18:30:00 \n",
"13 90.376002 48.868002 0 Darmstadt 2016-03-06 18:30:00 \n",
"14 100.463998 32.527999 2 VfB Stuttgart 2016-04-02 17:30:00 \n",
"15 91.416002 39.215999 1 Darmstadt 2016-04-23 17:30:00 \n",
"16 77.480000 43.700000 0 Werder Bremen 2016-08-26 22:30:00 \n",
"17 103.063998 46.967999 0 Werder Bremen 2016-08-26 22:30:00 \n",
"1 98.383998 35.415999 1 Darmstadt 2015-08-22 17:30:00 \n",
"2 97.032003 34.352001 1 Darmstadt 2015-08-22 17:30:00 \n",
"3 99.007997 32.907999 1 Darmstadt 2015-09-12 17:30:00 \n",
"4 91.312003 37.467999 2 Darmstadt 2015-09-27 19:30:00 \n",
"5 95.056002 44.384001 0 Darmstadt 2015-11-01 18:30:00 \n",
"6 94.223998 43.547999 0 Darmstadt 2015-11-01 18:30:00 \n",
"7 92.143998 41.495999 1 Hamburger SV 2015-11-07 21:30:00 \n",
"8 99.320000 36.404001 1 Darmstadt 2015-11-22 20:40:00 \n",
"9 94.536002 37.315999 1 Darmstadt 2015-12-06 20:30:00 \n",
".. ... ... ... ... ... \n",
"9 83.096002 27.207999 2 Villarreal 2018-11-11 17:30:00 \n",
"10 76.543998 38.835999 1 Real Betis 2018-11-25 19:45:00 \n",
"11 93.287997 35.112001 1 Real Betis 2018-11-25 19:45:00 \n",
"12 92.767997 24.927999 1 Real Betis 2018-11-25 19:45:00 \n",
"13 88.712003 24.624001 1 Real Betis 2018-11-25 19:45:00 \n",
"14 95.887997 24.016000 0 Villarreal 2018-12-02 17:30:00 \n",
"15 81.120000 29.944001 0 Villarreal 2018-12-02 17:30:00 \n",
"0 79.143998 40.127999 2 Fulham 2018-10-20 14:00:00 \n",
"0 96.303998 35.112001 2 Udinese 2018-10-28 13:00:00 \n",
"1 95.887997 37.467999 0 Genoa 2018-11-03 14:00:00 \n",
"2 93.912003 42.484001 2 Napoli 2018-11-10 19:30:00 \n",
"3 84.343998 35.415999 1 Sampdoria 2018-11-25 19:30:00 \n",
"4 96.927997 36.632001 1 Sampdoria 2018-11-25 19:30:00 \n",
"5 92.456002 40.964001 1 Genoa 2018-12-02 14:00:00 \n",
"6 83.407997 36.555999 1 Genoa 2018-12-02 14:00:00 \n",
"0 85.383998 42.560000 0 Toulouse 2018-11-24 16:00:00 \n",
"0 92.352003 43.320000 0 Monaco 2018-11-03 19:00:00 \n",
"1 84.967997 35.264001 1 Guingamp 2018-11-24 19:00:00 \n",
"0 98.487997 33.440000 0 Real Valladolid 2018-11-25 15:15:00 \n",
"0 87.880000 22.800000 1 Wolfsburg 2018-11-09 19:30:00 \n",
"0 88.296002 38.075999 4 Paris Saint Germain 2018-11-11 20:00:00 \n",
"0 98.487997 40.584001 1 Eintracht Frankfurt 2018-10-28 11:30:00 \n",
"1 91.103998 42.787999 2 Nuernberg 2018-11-03 14:30:00 \n",
"2 101.192003 24.395999 2 Nuernberg 2018-11-03 14:30:00 \n",
"3 96.303998 37.620000 2 Nuernberg 2018-11-03 14:30:00 \n",
"4 100.152003 36.327999 2 Nuernberg 2018-11-24 17:30:00 \n",
"0 88.503998 57.075999 1 Atletico Madrid 2018-12-02 15:15:00 \n",
"0 96.303998 34.504001 3 Fortuna Duesseldorf 2018-11-24 14:30:00 \n",
"1 78.936002 38.987999 1 Mainz 05 2018-11-30 19:30:00 \n",
"2 99.112003 49.020000 1 Mainz 05 2018-11-30 19:30:00 \n",
"\n",
" h_a h_goals h_team id lastAction ... \\\n",
"0 h 0 Darmstadt 76737 Aerial ... \n",
"1 a 0 Bayer Leverkusen 76808 Pass ... \n",
"3 a 3 Borussia M.Gladbach 79876 Aerial ... \n",
"0 a 5 Eintracht Frankfurt 27374 Pass ... \n",
"1 a 2 Borussia Dortmund 77444 None ... \n",
"2 a 0 Augsburg 78030 None ... \n",
"3 a 0 Augsburg 78029 Rebound ... \n",
"4 a 2 VfB Stuttgart 78500 Cross ... \n",
"5 a 2 VfB Stuttgart 78501 Aerial ... \n",
"6 h 1 Darmstadt 78849 Aerial ... \n",
"7 a 0 Eintracht Frankfurt 79502 Cross ... \n",
"8 a 0 Eintracht Frankfurt 79514 Cross ... \n",
"9 h 0 Darmstadt 79822 Aerial ... \n",
"10 a 3 Borussia M.Gladbach 79863 Cross ... \n",
"11 h 1 Darmstadt 80950 Aerial ... \n",
"12 a 0 Mainz 05 81919 Rebound ... \n",
"13 a 0 Mainz 05 81918 Aerial ... \n",
"14 h 2 Darmstadt 82623 Cross ... \n",
"15 a 4 FC Cologne 83244 None ... \n",
"16 a 6 Bayern Munich 127431 None ... \n",
"17 a 6 Bayern Munich 127447 Cross ... \n",
"1 a 1 Schalke 04 76326 Aerial ... \n",
"2 a 1 Schalke 04 76330 Aerial ... \n",
"3 a 0 Bayer Leverkusen 76809 Aerial ... \n",
"4 a 2 Borussia Dortmund 77459 None ... \n",
"5 a 2 VfB Stuttgart 78477 Aerial ... \n",
"6 a 2 VfB Stuttgart 78498 Chipped ... \n",
"7 h 1 Darmstadt 78845 None ... \n",
"8 a 3 Ingolstadt 79074 Cross ... \n",
"9 a 0 Eintracht Frankfurt 79498 Cross ... \n",
".. .. ... ... ... ... ... \n",
"9 a 2 Rayo Vallecano 240027 Pass ... \n",
"10 h 2 Villarreal 241740 None ... \n",
"11 h 2 Villarreal 241746 Pass ... \n",
"12 h 2 Villarreal 241754 Pass ... \n",
"13 h 2 Villarreal 241757 Pass ... \n",
"14 a 2 Barcelona 243092 Rebound ... \n",
"15 a 2 Barcelona 243094 Pass ... \n",
"0 h 4 Cardiff 234806 Pass ... \n",
"0 h 2 Genoa 236059 Cross ... \n",
"1 a 5 Inter 236765 Aerial ... \n",
"2 h 1 Genoa 239498 Cross ... \n",
"3 h 1 Genoa 241719 Rebound ... \n",
"4 h 1 Genoa 241725 Aerial ... \n",
"5 a 2 Torino 242821 Cross ... \n",
"6 a 2 Torino 242824 Aerial ... \n",
"0 a 1 Paris Saint Germain 240889 BallRecovery ... \n",
"0 h 1 Reims 237309 Pass ... \n",
"1 h 2 Reims 241121 Chipped ... \n",
"0 a 1 Sevilla 241542 Aerial ... \n",
"0 a 2 Hannover 96 238742 Pass ... \n",
"0 h 0 Monaco 240130 None ... \n",
"0 h 1 Nuernberg 235992 Chipped ... \n",
"1 a 2 Augsburg 236785 Cross ... \n",
"2 a 2 Augsburg 236802 Chipped ... \n",
"3 a 2 Augsburg 236807 Pass ... \n",
"4 a 5 Schalke 04 241048 Rebound ... \n",
"0 h 1 Girona 242927 Pass ... \n",
"0 a 3 Bayern Munich 240596 None ... \n",
"1 h 0 Fortuna Duesseldorf 241967 Pass ... \n",
"2 h 0 Fortuna Duesseldorf 241976 Cross ... \n",
"\n",
" goal isDirectFreekick isFromCorner isOpenPlay isPenalty isSetPiece \\\n",
"0 False 0 1 0 0 0 \n",
"1 False 0 0 0 0 1 \n",
"3 False 0 1 0 0 0 \n",
"0 True 0 0 1 0 0 \n",
"1 False 0 0 0 0 1 \n",
"2 False 0 1 0 0 0 \n",
"3 False 0 1 0 0 0 \n",
"4 False 0 0 1 0 0 \n",
"5 False 0 1 0 0 0 \n",
"6 False 0 0 0 0 1 \n",
"7 False 0 1 0 0 0 \n",
"8 False 0 0 1 0 0 \n",
"9 False 0 1 0 0 0 \n",
"10 False 0 1 0 0 0 \n",
"11 False 0 0 1 0 0 \n",
"12 False 0 0 0 0 1 \n",
"13 False 0 0 0 0 1 \n",
"14 False 0 1 0 0 0 \n",
"15 False 0 1 0 0 0 \n",
"16 False 0 0 1 0 0 \n",
"17 False 0 1 0 0 0 \n",
"1 False 0 0 0 0 1 \n",
"2 False 0 1 0 0 0 \n",
"3 True 0 0 0 0 1 \n",
"4 True 0 0 0 0 1 \n",
"5 False 0 0 0 0 1 \n",
"6 False 0 0 0 0 1 \n",
"7 False 0 0 1 0 0 \n",
"8 True 0 1 0 0 0 \n",
"9 True 0 0 0 0 1 \n",
".. ... ... ... ... ... ... \n",
"9 False 0 0 1 0 0 \n",
"10 False 0 0 1 0 0 \n",
"11 False 0 0 1 0 0 \n",
"12 True 0 0 1 0 0 \n",
"13 False 0 0 1 0 0 \n",
"14 False 0 0 1 0 0 \n",
"15 False 0 0 1 0 0 \n",
"0 False 0 0 1 0 0 \n",
"0 True 0 1 0 0 0 \n",
"1 False 0 1 0 0 0 \n",
"2 False 0 0 1 0 0 \n",
"3 False 0 0 1 0 0 \n",
"4 False 0 1 0 0 0 \n",
"5 False 0 1 0 0 0 \n",
"6 False 0 0 0 0 1 \n",
"0 False 0 0 1 0 0 \n",
"0 False 0 0 1 0 0 \n",
"1 True 0 0 1 0 0 \n",
"0 False 0 1 0 0 0 \n",
"0 False 0 0 1 0 0 \n",
"0 False 0 1 0 0 0 \n",
"0 True 0 0 1 0 0 \n",
"1 False 0 0 1 0 0 \n",
"2 False 0 0 0 0 1 \n",
"3 False 0 0 1 0 0 \n",
"4 True 0 0 1 0 0 \n",
"0 False 0 0 1 0 0 \n",
"0 False 0 0 0 0 1 \n",
"1 False 0 0 1 0 0 \n",
"2 False 0 1 0 0 0 \n",
"\n",
" fromHead fromLeftFoot fromOtherBodyPart fromRightFoot \n",
"0 0 0 0 1 \n",
"1 0 0 0 1 \n",
"3 1 0 0 0 \n",
"0 0 1 0 0 \n",
"1 0 0 0 1 \n",
"2 0 1 0 0 \n",
"3 0 1 0 0 \n",
"4 0 1 0 0 \n",
"5 1 0 0 0 \n",
"6 1 0 0 0 \n",
"7 1 0 0 0 \n",
"8 1 0 0 0 \n",
"9 1 0 0 0 \n",
"10 1 0 0 0 \n",
"11 1 0 0 0 \n",
"12 0 1 0 0 \n",
"13 0 0 0 1 \n",
"14 1 0 0 0 \n",
"15 0 1 0 0 \n",
"16 0 1 0 0 \n",
"17 0 1 0 0 \n",
"1 1 0 0 0 \n",
"2 1 0 0 0 \n",
"3 1 0 0 0 \n",
"4 0 0 0 1 \n",
"5 1 0 0 0 \n",
"6 1 0 0 0 \n",
"7 0 0 0 1 \n",
"8 1 0 0 0 \n",
"9 1 0 0 0 \n",
".. ... ... ... ... \n",
"9 0 1 0 0 \n",
"10 0 1 0 0 \n",
"11 0 1 0 0 \n",
"12 0 1 0 0 \n",
"13 0 1 0 0 \n",
"14 0 1 0 0 \n",
"15 0 1 0 0 \n",
"0 0 0 0 1 \n",
"0 1 0 0 0 \n",
"1 1 0 0 0 \n",
"2 1 0 0 0 \n",
"3 0 0 0 1 \n",
"4 1 0 0 0 \n",
"5 1 0 0 0 \n",
"6 0 1 0 0 \n",
"0 0 1 0 0 \n",
"0 0 1 0 0 \n",
"1 0 1 0 0 \n",
"0 1 0 0 0 \n",
"0 0 1 0 0 \n",
"0 0 0 0 1 \n",
"0 1 0 0 0 \n",
"1 0 1 0 0 \n",
"2 1 0 0 0 \n",
"3 0 1 0 0 \n",
"4 0 0 0 1 \n",
"0 0 0 0 1 \n",
"0 1 0 0 0 \n",
"1 0 0 0 1 \n",
"2 0 0 0 1 \n",
"\n",
"[225797 rows x 31 columns]"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"Index(['X', 'Y', 'a_goals', 'a_team', 'date', 'h_a', 'h_goals', 'h_team', 'id',\n",
" 'lastAction', 'match_id', 'minute', 'player', 'player_assisted',\n",
" 'player_id', 'result', 'season', 'shotType', 'situation', 'distance',\n",
" 'angle', 'goal', 'isDirectFreekick', 'isFromCorner', 'isOpenPlay',\n",
" 'isPenalty', 'isSetPiece', 'fromHead', 'fromLeftFoot',\n",
" 'fromOtherBodyPart', 'fromRightFoot'],\n",
" dtype='object')"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.columns"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Oh boy, that's a lot of columns. Not to worry, though: we will not be using all of them, only a subset, at least for this model. That subset is stored in the `cols` list."
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [],
"source": [
"cols = ['distance','angle','isDirectFreekick','isFromCorner','isOpenPlay','isPenalty','isSetPiece','fromHead','fromLeftFoot','fromOtherBodyPart','fromRightFoot','goal']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We create a new DataFrame named `shot` that holds only the columns listed in `cols`."
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [],
"source": [
"shot = df[cols]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The aim of this whole endeavour is a model that predicts whether a shot is a goal or not. Thus, the `goal` column is taken as the target series `Y`, and the remaining independent features are stored in the DataFrame `X`."
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [],
"source": [
"X=shot.drop('goal',axis=1)\n",
"Y=shot['goal']"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Index(['distance', 'angle', 'isDirectFreekick', 'isFromCorner', 'isOpenPlay',\n",
" 'isPenalty', 'isSetPiece', 'fromHead', 'fromLeftFoot',\n",
" 'fromOtherBodyPart', 'fromRightFoot'],\n",
" dtype='object')"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X.columns"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `Y` series currently holds `True`/`False` values, but we want binary labels, i.e. 1 or 0. Thus, we use the `LabelEncoder` available in scikit-learn."
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.preprocessing import LabelEncoder\n",
"le = LabelEncoder()\n",
"Y=le.fit_transform(Y)"
]
},
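{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check, here is a small sketch on a toy boolean series (hypothetical values, not the shot data). `LabelEncoder` sorts the classes it sees, so `False` maps to 0 and `True` to 1; for a plain boolean column, `astype(int)` yields the same labels without fitting an encoder."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"from sklearn.preprocessing import LabelEncoder\n",
"\n",
"# Toy stand-in for the 'goal' column (hypothetical values)\n",
"goal = pd.Series([False, True, False, False, True])\n",
"\n",
"le_demo = LabelEncoder()\n",
"encoded = le_demo.fit_transform(goal)\n",
"\n",
"# classes_ is sorted, so False -> 0 and True -> 1\n",
"print(le_demo.classes_)\n",
"print(encoded)\n",
"\n",
"# Casting the booleans directly produces identical 0/1 labels\n",
"same = np.array_equal(encoded, goal.astype(int).to_numpy())\n",
"print(same)"
]
},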
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We need both a training set and a test set to check the validity of our model.\n",
"Thus, we split the dataset in a 70:30 ratio: 70% of the shots form the training set and the remaining 30% serve as a hold-out test set."
]
},
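{
"cell_type": "markdown",
"metadata": {},
"source": [
"The counts in the confusion matrix further down add up to 67,740 test shots, which matches a 30% hold-out of the 225,797 rows: for a float `test_size`, scikit-learn rounds the test count up and gives the rest to training. The sketch below checks this on a stand-in index array (an assumption: `np.arange` replaces the real features) and imports `train_test_split` from `sklearn.model_selection`, which superseded the deprecated `sklearn.cross_validation` module."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import math\n",
"import numpy as np\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"n_shots = 225797  # rows in df, as reported in the df output above\n",
"idx = np.arange(n_shots)  # stand-in for the feature rows\n",
"\n",
"train_idx, test_idx = train_test_split(idx, test_size=0.3, random_state=42)\n",
"\n",
"# Test size is ceil(0.3 * n); the training set gets the remainder\n",
"print(len(train_idx), len(test_idx))\n",
"print(math.ceil(0.3 * n_shots))"
]
},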
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### MODEL FITTING & TESTING"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Lo and behold, we are finally ready to train the model. The first model used here is logistic regression, because it is relatively cheap computationally."
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,\n",
" intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,\n",
" penalty='l2', random_state=None, solver='liblinear', tol=0.0001,\n",
" verbose=0, warm_start=False)"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
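"from sklearn.linear_model import LogisticRegression\n",
"# sklearn.cross_validation was removed in 0.20; train_test_split now lives in model_selection\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"reg = LogisticRegression()\n",
"\n",
"# 70:30 split with a fixed seed so the result is reproducible\n",
"X_train,X_test,Y_train,Y_test = train_test_split(X,Y,test_size=0.3,random_state=42)\n",
"\n",
"reg.fit(X_train,Y_train)"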
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.9048863300856215"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"reg.score(X_test,Y_test)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We obtain an accuracy of about 90.5%, which looks decent given the size of the data, especially since we did not use many features."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"90% sounds good, but the model is classifying whether a shot is a goal or not, and the dataset is skewed: far more shots are not goals than actually end up as goals."
]
},
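{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the imbalance concrete, compare against a trivial baseline that predicts 'no goal' for every shot. The sketch below plugs in the test-set counts from the confusion matrix printed further down (rows are actual classes, columns are predicted): the always-no-goal baseline already scores about 89.5%, so the logistic regression beats it by only about one percentage point."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Test-set confusion matrix printed by the notebook (rows: actual, cols: predicted)\n",
"cm = np.array([[60210, 442], [6001, 1087]])\n",
"\n",
"total = cm.sum()\n",
"# Predicting 'no goal' for every shot is right exactly for the actual non-goals\n",
"baseline_acc = cm[0].sum() / total\n",
"# The diagonal holds the correct predictions of the fitted model\n",
"model_acc = np.trace(cm) / total\n",
"\n",
"print(f'test shots: {total}')\n",
"print(f'always-no-goal baseline accuracy: {baseline_acc:.4f}')\n",
"print(f'logistic regression accuracy: {model_acc:.4f}')"
]
},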
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Thus, judging a model on accuracy alone is not good practice. Let's look at the predictions themselves with the help of a confusion matrix."
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[60210 442]\n",
" [ 6001 1087]]\n"
]
}
],
"source": [
"from sklearn.metrics import confusion_matrix\n",
"\n",
"preds = reg.predict(X_test)\n",
"\n",
"# rows = actual class, columns = predicted class (0 = no goal, 1 = goal)\n",
"cm = confusion_matrix(Y_test,preds)\n",
"print(cm)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The confusion matrix shows exactly how the model performed in terms of raw predictions. The top-left value counts shots correctly predicted as not goals (true negatives); the top-right value counts shots predicted as goals that were not (false positives).\n",
"\n",
"The second row contains shots classified as not goals that were in fact goals (false negatives), followed by the shots correctly classified as goals (true positives). Of the 7,088 goals in the test set, the model identifies only 1,087."
]
},
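{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same four counts yield per-class metrics that accuracy hides. This sketch unpacks the matrix in scikit-learn's layout (for a binary problem, `ravel()` returns tn, fp, fn, tp) and computes precision and recall for the goal class by hand; with the notebook's variables, `sklearn.metrics.classification_report(Y_test, preds)` should report the same figures for class 1."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Counts copied from the confusion matrix printed above\n",
"cm = np.array([[60210, 442], [6001, 1087]])\n",
"\n",
"# For a binary confusion matrix, ravel() yields tn, fp, fn, tp in that order\n",
"tn, fp, fn, tp = cm.ravel()\n",
"\n",
"precision = tp / (tp + fp)  # of shots predicted as goals, share that were goals\n",
"recall = tp / (tp + fn)  # of actual goals, share the model caught\n",
"\n",
"print(f'goals in test set: {tp + fn}')\n",
"print(f'precision (goal): {precision:.3f}')\n",
"print(f'recall (goal): {recall:.3f}')"
]
},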
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Another tool for judging the model is the ROC curve. The area under the ROC curve (AUC) for our model is about 0.80, which is quite good."
]
},
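{
"cell_type": "markdown",
"metadata": {},
"source": [
"The AUC is computed from predicted probabilities rather than hard 0/1 predictions, because the ROC curve sweeps the decision threshold. Below is a minimal sketch on a four-shot toy example (hypothetical labels and scores, not model output); with the notebook's variables, the equivalent call would presumably be `roc_auc_score(Y_test, reg.predict_proba(X_test)[:, 1])`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.metrics import auc, roc_auc_score, roc_curve\n",
"\n",
"# Hypothetical labels (1 = goal) and predicted goal probabilities\n",
"y_true = [0, 0, 1, 1]\n",
"y_score = [0.1, 0.4, 0.35, 0.8]\n",
"\n",
"# roc_curve sweeps the decision threshold over the scores\n",
"fpr, tpr, thresholds = roc_curve(y_true, y_score)\n",
"\n",
"# Area under that curve by the trapezoidal rule, and the one-call shortcut\n",
"area = auc(fpr, tpr)\n",
"score = roc_auc_score(y_true, y_score)\n",
"print(area, score)  # 0.75 0.75"
]
},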
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"