{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Ranking Algorithm\n", "\n", "In this notebook, Markov Chain is used to rank the teams that played in EPL 2016-2017.\n", "\n", "The algorithm used discussed in --\n", "\n", "```\n", "ColumbiaX: Machine Learning\n", "Lecture 20\n", "Prof. John Paisley\n", "Department of Electrical Engineering\n", "& Data Science Institute\n", "Columbia University\n", "```\n", "\n", "## Dataset \n", "\n", "Data is downloaded from [here](http://www.football-data.co.uk/englandm.php). The key for the dataset is [here](http://www.football-data.co.uk/notes.txt)." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Import the required modules\n", "from __future__ import division\n", "import numpy as np\n", "import csv" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [], "source": [ "file_name = \"../data/ranking-algorithm/epl_16_17.csv\"\n", "x = [];\n", "# Read the CSV files\n", "with open(file_name, 'rb') as csvfile:\n", " csvreader = csv.reader(csvfile, delimiter=',')\n", " next(csvreader)\n", " for row in csvreader:\n", " x.append([row[2], row[3], int(row[4]), int(row[5])])" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# The team names\n", "teams = {x: i for i, x in enumerate({match[0] for match in x})}\n", "teams_rev = {v: k for k, v in teams.items()} \n", "# Convert into nice numpy array\n", "x = np.array([[teams[match[0]], teams[match[1]], match[2], match[3]] for match in x], dtype=np.int32)\n", "no_teams = len(teams)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Prepare the Transition matrix\n", "trans = np.zeros((no_teams, no_teams), dtype=np.float32)\n", "for match_id in xrange(x.shape[0]):\n", " i, j = x[match_id][0], x[match_id][1]\n", " i_score, j_score = x[match_id][2], x[match_id][3]\n", " den = i_score + j_score\n", " if den > 0:\n", " trans[i, i] += (i_score > j_score) + i_score / den\n", " trans[j, j] += (i_score < j_score) + j_score / den\n", " trans[i, j] += (i_score < j_score) + j_score / den\n", " trans[j, i] += (i_score > j_score) + i_score / den" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Normalize the transition matrix\n", "norm = np.sum(trans, axis=1) \n", "trans_norm = trans / np.expand_dims(norm, axis=0)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Perform the eigenvalue decomposition of the transition matrix\n", "w, v = np.linalg.eig(trans_norm.T)\n", "# Normalize the 1st eigenvector that corresponds to eigenvalue = 1\n", "s_d = v[:, 0].real / np.sum(v[:, 0].real)\n", "# Sort s_d\n", "sorted_ranking = np.argsort(s_d)[::-1]\n", "# Prepare a list to display \n", "best_teams = [(teams_rev[i], s_d[i]) for i in sorted_ranking]" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The rankings of the teams are\n", "('Leicester', 0.13876668)\n", "('Arsenal', 0.098007552)\n", "('Tottenham', 0.086899467)\n", "('West Ham', 0.077981375)\n", "('Man United', 0.074728511)\n", "('Southampton', 0.064768665)\n", "('Man City', 0.062873565)\n", "('Liverpool', 0.056575444)\n", "('Chelsea', 0.044441242)\n", "('Stoke', 0.036718827)\n", "('Swansea', 0.035415944)\n", "('West Brom', 0.033552695)\n", "('Everton', 0.03319782)\n", "('Watford', 0.026719317)\n", "('Bournemouth', 0.026716128)\n", "('Newcastle', 0.026127573)\n", "('Crystal Palace', 0.024857888)\n", "('Sunderland', 0.022549277)\n", "('Norwich', 0.019476177)\n", "('Aston Villa', 0.009625854)\n" ] } ], "source": [ "print(\"The rankings of the teams are\")\n", "for team in best_teams:\n", " print team" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One of the possible reason for City below United and Southampton might be that City performed better against the good teams." ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.6" } }, "nbformat": 4, "nbformat_minor": 0 }