{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "### European Soccer Leagues, Interactive Standings Visualization\n", "\n", "Love and Live the Game!\n", "\n", "1. Scrape the current table standings of each league (La Liga, Bundesliga, Premier League, Serie A, Ligue 1).\n", "2. Visualize the standings of each league in various interactive plots (bubble_2d, bubble_3d, boxplot, density).\n", "\n", "> To reproduce the plots you need an `api_key` to sign in to [Plotly](https://plot.ly/settings/api).\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Aziz\n", "Wed Dec 2 19:10:12 EST 2015\n" ] } ], "source": [ "%%bash\n", "whoami\n", "date" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Standings labels\n", "\n", "```\n", "P: Games Played\n", "W: Games Won\n", "D: Games Drawn\n", "L: Games Lost\n", "GS: Goals Scored\n", "GA: Goals Against\n", "Diff: Goal Difference\n", "Pts: Points\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **Preview of the tables (sample data)**" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Premier League\n", "        P  W  D  L  GS  GA  Diff  Pts\n", "Team                                 \n", "1-MCI  14  9  2  3  30  14    16   29\n", "2-LEI  14  8  5  1  29  21     8   29\n", "3-MUN  14  8  4  2  20  10    10   28\n", "4-ARS  14  8  3  3  24  12    12   27\n", "5-TOT  14  6  7  1  24  11    13   25\n", "\n", "Bundesliga\n", "        P   W  D  L  GS  GA  Diff  Pts\n", "Team                                  \n", "1-BAY  14  13  1  0  42   5    37   40\n", "2-BVB  14  10  2  2  40  19    21   32\n", "3-WOB  14   7  4  3  23  15     8   25\n", "4-BMG  14   7  2  5  28  22     6   23\n", "5-HER  14   7  2  5  18  17     1   23\n", "\n", "Ligue 1\n", "        P   W  D  L  GS  GA  Diff  Pts\n", "Team                                  \n", "1-PSG  16  13  3  0  37   8    29   42\n", "2-CAE  16   9  2  5  19  16     3   29\n", "3-ANG  16   7  6  3  14   9     5   27\n", "4-LYO  16   7  5  4  21  14     7   26\n", "5-NIC  16   7  4  5  30  19    11   25\n", "\n", "Serie A\n", "        P  W  D  L  GS  GA  Diff  Pts\n", "Team                                 \n", "1-NAP  14  9  4  1  26   9    17   31\n", "2-INT  14  9  3  2  17   9     8   30\n", "3-FIO  14  9  2  3  27  12    15   29\n", "4-ROM  14  8  3  3  29  17    12   27\n", "5-JUV  14  7  3  4  20  11     9   24\n", "\n", "La Liga\n", "        P   W  D  L  GS  GA  Diff  Pts\n", "Team                                  \n", "1-FCB  13  11  0  2  33  12    21   33\n", "2-ATM  13   9  2  2  18   6    12   29\n", "3-RMA  13   8  3  2  28  11    17   27\n", "4-CEL  13   7  3  3  24  21     3   24\n", "5-DEP  13   5  6  2  20  13     7   21\n", "\n" ] } ], "source": [ "for l, df in leagues.items():\n", "    print(l)\n", "    print(df.head())\n", "    print" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Standings in an interactive `bubble_2d` plot" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 23, "metadata": {}, "output_type": 
"execute_result" } ], "source": [ "py.iplot_mpl(figs_2d[0])" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "py.iplot_mpl(figs_2d[1])" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "py.iplot_mpl(figs_2d[2])" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "py.iplot_mpl(figs_2d[3])" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "py.iplot_mpl(figs_2d[4])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Standings in an interactive `bubble_3d` plot" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "py.iplot(figs_3d[0])" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "py.iplot(figs_3d[1])" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 30, "metadata": {}, 
"output_type": "execute_result" } ], "source": [ "py.iplot(figs_3d[2])" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "py.iplot(figs_3d[3])" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "py.iplot(figs_3d[4])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### League in an interactive `boxplot` plot" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "py.iplot(figs_box[0])" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "py.iplot(figs_box[1])" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "py.iplot(figs_box[2])" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "py.iplot(figs_box[3])" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 37, "metadata": {}, 
"output_type": "execute_result" } ], "source": [ "py.iplot(figs_box[4])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### League in an interactive `density` plot" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "py.iplot_mpl(figs_kde[0])" ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "py.iplot_mpl(figs_kde[1])" ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "py.iplot_mpl(figs_kde[2])" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "py.iplot_mpl(figs_kde[3])" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "py.iplot_mpl(figs_kde[4])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "# How?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Scrape the league standings data into DataFrames" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from bs4 import BeautifulSoup as bs\n", "import requests\n", "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "urls = {\n", "    'La Liga'       : 'http://www.goal.com/en/tables/primera-divisi%C3%B3n/7',\n", "    'Bundesliga'    : 'http://www.goal.com/en/tables/bundesliga/9?ICID=SP_TN_112',\n", "    'Premier League': 'http://www.goal.com/en/tables/premier-league/8?ICID=TA',\n", "    'Serie A'       : 'http://www.goal.com/en/tables/serie-a/13?ICID=SP_TN_114',\n", "    'Ligue 1'       : 'http://www.goal.com/en/tables/ligue-1/16?ICID=SP_TN_114',\n", "}" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def scrape_table(url):\n", "    '''input: league url; return: a list with one standings list per team'''\n", "\n", "    html = requests.get(url).text\n", "    # name the parser explicitly so the result does not depend on which parsers are installed\n", "    so = bs(html, 'html.parser')\n", "    table = so.find('table', class_='short')\n", "    standings = table.findChild('tbody')\n", "    teams_html = standings.findAll('tr')\n", "\n", "    teams = []\n", "    for team in teams_html:\n", "        t = []\n", "        for d in team.findChildren('td'):\n", "            # Python 2: drop non-ASCII characters from the cell text\n", "            cell = str(d.text.strip().encode('ascii', 'ignore'))\n", "            # aggregate a team's standings\n", "            t.append(cell)\n", "        # remove empty strings from the standings list\n", "        t = [x for x in t if x]\n", "        # add the team's standings to the list\n", "        teams.append(t)\n", "\n", "    return teams" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def to_df(teams):\n", "    \"\"\"create a dataframe from the teams' standings lists\"\"\"\n", "\n", "    cols = ['pos','full_name', 'Team', 'PtsF', 'P', 'W', 'D', 'L', 'WH','DH', 'LH', 'WA','DA','LA', 'GS', 'GA', 'Diff', 'Pts']\n", "    
df = pd.DataFrame(columns=cols)\n", "\n", "    for i, team in enumerate(teams):\n", "        df.loc[i] = team\n", "\n", "    return df" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def remove_cols(df):\n", "    # remove unneeded columns (modifies df in place)\n", "    useless = ['pos', 'full_name', 'PtsF', 'WH', 'WA', 'DH', 'DA', 'LH', 'LA']\n", "    for u in useless:\n", "        del df[u]\n", "\n", "def apply_int(df):\n", "    # convert column types from str to int (for plotting)\n", "    for c in df.columns:\n", "        df[c] = df[c].apply(int)\n", "    return df" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def league_df(url):\n", "    \"\"\"return the league's standings table as a DataFrame\"\"\"\n", "\n", "    teams = scrape_table(url)\n", "\n", "    df = to_df(teams)\n", "\n", "    # concatenate 'position' and 'team'\n", "    df['Team'] = ['{}-{}'.format(p, t) for p, t in zip(df['pos'], df['Team'])]\n", "\n", "    # remove columns that are not needed\n", "    remove_cols(df)\n", "\n", "    # set team name as the df index\n", "    df = df.set_index('Team')\n", "\n", "    # set columns to int values\n", "    df = apply_int(df)\n", "\n", "    return df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Collect data in a dict as `{league : its_table_data_frame}`" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [], "source": [ "leagues = {}\n", "for league, url in urls.items():\n", "    df = league_df(url)\n", "    leagues[league] = df" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['Premier League', 'Bundesliga', 'Ligue 1', 'Serie A', 'La Liga']\n" ] } ], "source": [ "print(leagues.keys())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 2. 
Generate Interactive Plots of league tables (using Plotly)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import matplotlib\n", "matplotlib.style.use('ggplot')" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import numpy as np\n", "import plotly.plotly as py\n", "import plotly.graph_objs as go\n", "py.sign_in('username', 'api_key')  # replace with your own Plotly username and api_key" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Bubble_2D" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# ref: https://plot.ly/python/matplotlib-to-plotly-tutorial/#Bubble-Charts\n", "\n", "def bubble_2d(df, league='Soccer League'):\n", "\n", "    mpl_fig = plt.figure()  # (!) set new mpl figure object\n", "    ax = mpl_fig.add_subplot(111)  # add axis\n", "\n", "    plt.xlabel('Points')\n", "    plt.ylabel('Goals Scored')\n", "    plt.title(league)\n", "\n", "    scatter = ax.scatter(\n", "        df['Pts'],\n", "        df['GS'],\n", "        c=df['GS'],  # color each bubble by goals scored\n", "        s=np.sqrt(df['Pts']**5),  # bubble area grows as Pts ** 2.5\n", "        linewidths=2,\n", "        edgecolor='w',\n", "        alpha=0.6\n", "    )\n", "\n", "    for i_X, X in df.iterrows():\n", "        plt.text(\n", "            X['Pts'],\n", "            X['GS'],\n", "            i_X,  # team name\n", "            size=8,\n", "            horizontalalignment='center'\n", "        )\n", "    return mpl_fig\n", "\n", "# # Test\n", "# fig = bubble_2d(df, league)\n", "# py.iplot_mpl(fig)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Collect leagues' bubble_2d figures" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [], "source": [ "figs_2d = []\n", "for l, d in leagues.items():\n", "    fig = bubble_2d(d, l)\n", "    figs_2d.append(fig)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Bubble_3D" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { 
"collapsed": true }, "outputs": [], "source": [ "# https://plot.ly/~jorgesantos/402/cufflinks-bubble-3d-chart/\n", "\n", "def bubble_3d(df, league='Soccer League'):\n", " \n", " traces = []\n", "\n", " for row in df.iterrows():\n", " \n", " team, score = row\n", " \n", " trace = go.Scatter3d(\n", " x= score.GA,\n", " y= score.GS,\n", " z= score.Pts,\n", " \n", " marker= go.Marker(\n", " line=go.Line(\n", " width=0.5\n", " ),\n", " size= score.Pts * 1.5, # [bubble size],\n", " symbol='dot'\n", " ),\n", " opacity=0.7,\n", " mode='markers',\n", " name=team,\n", " text= team, # [team names]\n", " )\n", " # add team's Scatter3d trace to list of Data\n", " traces.append(trace)\n", "\n", " data = go.Data(traces)\n", "\n", " layout = go.Layout(\n", " scene=go.Scene(\n", " xaxis=go.XAxis(\n", " title='Goals Against (x)',\n", " ),\n", " yaxis=go.YAxis(\n", " title='Goals Scored (y)',\n", " ),\n", " zaxis=go.ZAxis(\n", " title='Points (z)'\n", " ),\n", " ),\n", " title=league\n", " )\n", "\n", " fig = go.Figure(data=data, layout=layout)\n", " return fig" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Collect leagues' bubble_3d figures" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [], "source": [ "figs_3d = []\n", "for l, d in leagues.items():\n", " fig = bubble_3d(d, l)\n", " figs_3d.append(fig)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Boxplot" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# ref: https://plot.ly/python/box-plots/\n", "\n", "def boxplot(df, league='Soccer League'):\n", "\n", " traces = []\n", " for c in [a for a in df.columns if a is not 'P']:\n", " trace = go.Box(\n", " y = df[c].values,\n", " name = c,\n", " )\n", " traces.append(trace)\n", " data = go.Data(traces)\n", " layout = go.Layout(\n", " title=league\n", " )\n", " fig = go.Figure(data=data, layout=layout)\n", " return fig\n", "\n", "# # 
TEST\n", "# fig = boxplot(df, 'La Liga')\n", "# py.iplot(fig)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Collect leagues' boxplot figures" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [], "source": [ "figs_box = []\n", "for l, d in leagues.items():\n", "    fig = boxplot(d, l)\n", "    figs_box.append(fig)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Densities" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def density(df, league):\n", "    fig, ax = plt.subplots()\n", "    # plot every column except 'P' (games played); compare with != rather than identity\n", "    cols = [c for c in df.columns if c != 'P']\n", "    df = df[cols]\n", "    df.plot(kind='kde', ax=ax, title=league)\n", "    return fig\n", "\n", "# # Test\n", "# fig = density(df, league)\n", "# py.iplot_mpl(fig)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Collect leagues' density figures" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": false }, "outputs": [], "source": [ "figs_kde = []\n", "for l, d in leagues.items():\n", "    fig = density(d, l)\n", "    figs_kde.append(fig)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.10" } }, "nbformat": 4, "nbformat_minor": 0 }