{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*Note: Github is having trouble rendering some of the LaTeX formulas and equation numbers. Please view in nbviewer.*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Colley's Matrix Method\n",
    "\n",
    "## Introduction\n",
    "\n",
    "Welcome! This is the first in a two part series on Colley's Matrix Method for creating a resume rating system. In this first part we will gain access to Colley's brilliantly clean way to rate the resume of every FBS team in college football. Any system that attempts to rank all 130 FBS teams will have it's problems, but I believe that this is the best system that can be created under Colley's strict rules for keeping our resume ratings unbiased. \n",
    "\n",
    "In the second part, we'll have some fun and break most of Colley's rules. A resume rating simply attempts to measure what every team has *achieved* relative to one another. It is not concerned with being predictive. In the second part we will introduce some simple priors and hyperparameters to move the resume ratings in the direction of power ratings -- ones that better match common knowledge about the wide range of team ability in college football."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Background\n",
    "\n",
    "The Colley Matrix Method is a resume rating system that was a part of the Official Bowl Championship Series Ranking from 2001 to 2013.  To be a good *resume rating system* means a few things to Colley, as he explains [here](https://www.colleyrankings.com/matrate.pdf):\n",
    "- eliminates any bias toward conference, history or tradition,\n",
    "- eliminates the need to invoke some ad hoc means of deflating runaway scores, and\n",
    "- eliminates any other ad hoc adjustments, such as home/away tweaks.\n",
    "\n",
    "\n",
    "Following these self imposed restrictions, Colley begins by giving every FBS team a starting rating of 1/2 and by taking into account only their wins and losses will arrive at their final rating. What makes his method so powerful is that he uses a simple mathematical method to account for the fact that we don't just want to look at a team's win percentage to rate them. We want to account for *strength of schedule*, the ability of teams played on a given schedule. \n",
    "\n",
    "Consider teams A,B and C. If *A* beats *B*, and *B* beats *C*, then we would say that *A* has a transitive win over *C*. It is natural to want to consider transitive wins when ranking teams, because beating a team with a winning record is better than beating a team with no wins at all. In a system that only cares about wins and losses, strength of schedule is simply a proper valuation of transitive wins and losses.  Colley found a way to account for strength of schedule by looking at the complete college picture of who beat whom and who lost to whom. What's amazing is that what sounds like a complicated spider-web of tracing these transitive wins and losses can be completly encapsulated into a simple formula."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## A Little Math\n",
    "\n",
    "Colley does an excellent job describing his system and its motivation [here](https://www.colleyrankings.com/matrate.pdf), which I will now abbreviate. This derivation will be very important to us in the second part of this series. Lets consider the following simple rating for a team,\n",
    "\n",
    "\\begin{equation} r = \\frac{1 + n_w}{2+n_{tot}} \n",
    "\\end{equation}\n",
    "\n",
    "where $n_w$ is their number of wins, and $n_{tot}$ is the number of games they have played. Notice that a team's rating must be between 0 and 1. A team that has played no games begins with a rating of $ r = \\frac{1+0}{2+0} = \\frac{1}{2}$. If they play 10 games in a season and win 7 they will have a rating of $ r = \\frac{1 + 7}{2+10} = \\frac{2}{3}$. This seems reasonable. Now it just takes a moderate amount of algebra to account for strength of schedule. Let's multiply both sides by the denominator. \n",
    "\n",
    "\\begin{equation} (2+ n_{tot}) r = 1 + n_w \n",
    "\\end{equation}\n",
    "\n",
    "It wouldn't be algebra without a clever identity, so let's add one now. \n",
    "\n",
    "\\begin{equation} n_w = \\frac{n_w - n_\\ell}{2} + \\frac{n_w + n_\\ell}{2} \n",
    "\\end{equation}\n",
    "\n",
    "We've added a new symbol, $n_\\ell$, the number of losses. Let's now replace $n_w$ in equation (2) with what we have in equation (3) \n",
    "\n",
    "\\begin{equation} (2+ n_{tot}) r = 1 + \\frac{n_w - n_\\ell}{2} + \\frac{n_w + n_\\ell}{2} \n",
    "\\end{equation}\n",
    "\n",
    "Let's move some stuff to the other side.\n",
    "\n",
    "\\begin{equation} (2+ n_{tot}) r - \\frac{n_w + n_\\ell}{2} = 1 + \\frac{n_w - n_\\ell}{2} \n",
    "\\end{equation}\n",
    "\n",
    "Notice that every game is a win or a loss. $n_{tot} = n_w + n_\\ell$.\n",
    "\n",
    "\\begin{equation} (2+ n_{tot}) r - \\frac{n_{tot}}{2} = 1 + \\frac{n_w - n_\\ell}{2} \n",
    "\\end{equation}\n",
    "\n",
    "Bring the $n_{tot}$ terms together on the left hand side.\n",
    "\n",
    "\\begin{equation} 2r + n_{tot}(r - \\frac{1}{2}) = 1 + \\frac{n_w - n_\\ell}{2} \n",
    "\\end{equation}\n",
    "\n",
    "Now remember that multiplication is simply repeated addition. \n",
    "\n",
    "\\begin{equation} 2r + \\displaystyle\\sum^{n_{tot}}(r - \\frac{1}{2}) = 1 + \\frac{n_w - n_\\ell}{2} \n",
    "\\end{equation}\n",
    "\n",
    "The $\\Sigma$ symbol means we will sum $n_{tot}$ times. \n",
    "\n",
    "*whew!* That was a lot of small steps that really added up. Lets take a step back and interpret our equation. If zero games have been played, everything goes away except for $ 2r = 1$ which recovers a rating of $\\frac{1}{2}$. As more games are played, the right hand side either increases or decreases by a half depending on if it is a win or a loss. In order to maintain equality, the rating r on the left hand side has to increase or decrease to match. \n",
    "\n",
    "Notice that the summation on the left hand side is over every game played. For every game we take the difference between the team's rating, *r*, and the average rating of an opponent, $\\frac{1}{2}$. Colley's insight was that instead of taking the difference from the *average* rating, we can actually take the difference from the rating of the teams they have played. In order to do this we need a little more notation. Adding a superscript $i$ will denote that a given symbol pertains to team *i*.\n",
    "\n",
    "\\begin{equation} 2r^i + \\displaystyle\\sum^{n^i_{tot}}(r^i - \\frac{1}{2}) = 1 + \\frac{n^i_w - n^i_\\ell}{2} \n",
    "\\end{equation}\n",
    "\n",
    "Lets use a subscript of *j* for each team played by team *i*. Then $r^i_j$ is rating of the $j^{th}$ team played by team *i*. Let's replace the $\\frac{1}{2}$ term with these $r^i_j$.\n",
    "\n",
    "\\begin{equation} 2r^i + \\displaystyle\\sum_{j=1}^{n^i_{tot}}(r^i - r^i_j) = 1 + \\frac{n^i_w - n^i_\\ell}{2} \n",
    "\\end{equation}\n",
    "\n",
    "Every team will have one of these equations, so we can package the whole system as a matrix equation.\n",
    "\n",
    "\\begin{gather}\n",
    "    \\begin{bmatrix}\n",
    "       2+n^1_{tot} & -n^{1,2} & \\ldots &  -n^{1,M} \\\\\n",
    "        -n^{2,1} & 2+n^2_{tot} & \\ldots &  -n^{2,M} \\\\ \n",
    "       \\vdots & \\vdots & \\ddots &  \\vdots \\\\\n",
    "       -n^{M,1} & -n^{M,2} & \\ldots  &2+n^2_{tot} \\\\ \n",
    "   \\end{bmatrix}\n",
    "   \\begin{bmatrix}\n",
    "       r^1 \\\\\n",
    "        r^2 \\\\ \n",
    "       \\vdots \\\\\n",
    "       r^M \\\\ \n",
    "   \\end{bmatrix}=\n",
    "   \\begin{bmatrix}\n",
    "       1 + \\frac{n^1_w - n^1_\\ell}{2} \\\\\n",
    "       1+ \\frac{n^2_w - n^2_\\ell}{2} \\\\ \n",
    "       \\vdots \\\\\n",
    "       1+ \\frac{n^M_w - n^M_\\ell}{2} \\\\ \n",
    "   \\end{bmatrix}\n",
    " \\end{gather}\n",
    " \n",
    " Here we assume that we have M teams. The diagonal counts 2 plus the number of games played by team *i*.  The off diagonal counts how many times team *i* has played team *j*. Note that $n^{i,j} = n^{j,i}$, so this matrix is symmetric. The *r* column vector has the ratings we want to calculate, and the column vector after the equals accounts for the total wins and losses. Now all we need to do is build this matrix and use a solver to get those ratings!"
   ]
  },
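  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a small sanity check on the matrix equation above, here is a hypothetical two-team example (made up for illustration, not taken from the real data). Suppose teams 1 and 2 have played each other exactly once and team 1 won, so $n^1_{tot} = n^2_{tot} = 1$ and $n^{1,2} = n^{2,1} = 1$. The system becomes\n",
    "\n",
    "$$\n",
    "\\begin{bmatrix} 3 & -1 \\\\ -1 & 3 \\end{bmatrix}\n",
    "\\begin{bmatrix} r^1 \\\\ r^2 \\end{bmatrix} =\n",
    "\\begin{bmatrix} 1 + \\frac{1 - 0}{2} \\\\ 1 + \\frac{0 - 1}{2} \\end{bmatrix} =\n",
    "\\begin{bmatrix} \\frac{3}{2} \\\\ \\frac{1}{2} \\end{bmatrix}\n",
    "$$\n",
    "\n",
    "Adding the two rows gives $2r^1 + 2r^2 = 2$, and subtracting the second from the first gives $4r^1 - 4r^2 = 1$, so $r^1 = \\frac{5}{8}$ and $r^2 = \\frac{3}{8}$. The winner moves above $\\frac{1}{2}$, the loser moves below it by the same amount, and the average rating stays at $\\frac{1}{2}$."
   ]
  },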
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Building the matrix\n",
    "Finally some python! collegefootballdata.com has an excellent API for getting all of the games we want."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>season</th>\n",
       "      <th>week</th>\n",
       "      <th>season_type</th>\n",
       "      <th>start_date</th>\n",
       "      <th>start_time_tbd</th>\n",
       "      <th>neutral_site</th>\n",
       "      <th>conference_game</th>\n",
       "      <th>attendance</th>\n",
       "      <th>venue_id</th>\n",
       "      <th>...</th>\n",
       "      <th>home_points</th>\n",
       "      <th>home_line_scores</th>\n",
       "      <th>home_post_win_prob</th>\n",
       "      <th>away_id</th>\n",
       "      <th>away_team</th>\n",
       "      <th>away_conference</th>\n",
       "      <th>away_points</th>\n",
       "      <th>away_line_scores</th>\n",
       "      <th>away_post_win_prob</th>\n",
       "      <th>excitement_index</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>401110723</td>\n",
       "      <td>2019</td>\n",
       "      <td>1</td>\n",
       "      <td>regular</td>\n",
       "      <td>2019-08-24T23:00:00.000Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>66543.0</td>\n",
       "      <td>4013</td>\n",
       "      <td>...</td>\n",
       "      <td>24</td>\n",
       "      <td>[7, 0, 10, 7]</td>\n",
       "      <td>0.905953</td>\n",
       "      <td>2390</td>\n",
       "      <td>Miami</td>\n",
       "      <td>ACC</td>\n",
       "      <td>20</td>\n",
       "      <td>[3, 10, 0, 7]</td>\n",
       "      <td>0.094047</td>\n",
       "      <td>8.767910</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>401114164</td>\n",
       "      <td>2019</td>\n",
       "      <td>1</td>\n",
       "      <td>regular</td>\n",
       "      <td>2019-08-25T02:30:00.000Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>22396.0</td>\n",
       "      <td>3610</td>\n",
       "      <td>...</td>\n",
       "      <td>45</td>\n",
       "      <td>[14, 14, 7, 10]</td>\n",
       "      <td>0.688630</td>\n",
       "      <td>12</td>\n",
       "      <td>Arizona</td>\n",
       "      <td>Pac-12</td>\n",
       "      <td>38</td>\n",
       "      <td>[0, 21, 14, 3]</td>\n",
       "      <td>0.311370</td>\n",
       "      <td>7.842417</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>401117855</td>\n",
       "      <td>2019</td>\n",
       "      <td>1</td>\n",
       "      <td>regular</td>\n",
       "      <td>2019-08-29T23:00:00.000Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>19648.0</td>\n",
       "      <td>3892</td>\n",
       "      <td>...</td>\n",
       "      <td>24</td>\n",
       "      <td>[7, 3, 14, 0]</td>\n",
       "      <td>0.728942</td>\n",
       "      <td>2681</td>\n",
       "      <td>Wagner</td>\n",
       "      <td>None</td>\n",
       "      <td>21</td>\n",
       "      <td>[0, 0, 14, 7]</td>\n",
       "      <td>0.271058</td>\n",
       "      <td>1.834351</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>401119255</td>\n",
       "      <td>2019</td>\n",
       "      <td>1</td>\n",
       "      <td>regular</td>\n",
       "      <td>2019-08-29T23:00:00.000Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>18412.0</td>\n",
       "      <td>3965</td>\n",
       "      <td>...</td>\n",
       "      <td>38</td>\n",
       "      <td>[21, 7, 10, 0]</td>\n",
       "      <td>0.999788</td>\n",
       "      <td>2523</td>\n",
       "      <td>Robert Morris</td>\n",
       "      <td>None</td>\n",
       "      <td>10</td>\n",
       "      <td>[7, 3, 0, 0]</td>\n",
       "      <td>0.000212</td>\n",
       "      <td>0.118588</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>401119254</td>\n",
       "      <td>2019</td>\n",
       "      <td>1</td>\n",
       "      <td>regular</td>\n",
       "      <td>2019-08-29T23:00:00.000Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>17620.0</td>\n",
       "      <td>3700</td>\n",
       "      <td>...</td>\n",
       "      <td>46</td>\n",
       "      <td>[13, 17, 7, 9]</td>\n",
       "      <td>0.999979</td>\n",
       "      <td>2415</td>\n",
       "      <td>Morgan State</td>\n",
       "      <td>None</td>\n",
       "      <td>3</td>\n",
       "      <td>[0, 3, 0, 0]</td>\n",
       "      <td>0.000021</td>\n",
       "      <td>0.472968</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 24 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "          id  season  week season_type                start_date  \\\n",
       "0  401110723    2019     1     regular  2019-08-24T23:00:00.000Z   \n",
       "1  401114164    2019     1     regular  2019-08-25T02:30:00.000Z   \n",
       "2  401117855    2019     1     regular  2019-08-29T23:00:00.000Z   \n",
       "3  401119255    2019     1     regular  2019-08-29T23:00:00.000Z   \n",
       "4  401119254    2019     1     regular  2019-08-29T23:00:00.000Z   \n",
       "\n",
       "   start_time_tbd  neutral_site  conference_game  attendance  venue_id  ...  \\\n",
       "0             NaN          True            False     66543.0      4013  ...   \n",
       "1             NaN         False            False     22396.0      3610  ...   \n",
       "2             NaN         False            False     19648.0      3892  ...   \n",
       "3             NaN         False            False     18412.0      3965  ...   \n",
       "4             NaN         False            False     17620.0      3700  ...   \n",
       "\n",
       "  home_points  home_line_scores home_post_win_prob away_id      away_team  \\\n",
       "0          24     [7, 0, 10, 7]           0.905953    2390          Miami   \n",
       "1          45   [14, 14, 7, 10]           0.688630      12        Arizona   \n",
       "2          24     [7, 3, 14, 0]           0.728942    2681         Wagner   \n",
       "3          38    [21, 7, 10, 0]           0.999788    2523  Robert Morris   \n",
       "4          46    [13, 17, 7, 9]           0.999979    2415   Morgan State   \n",
       "\n",
       "  away_conference  away_points  away_line_scores away_post_win_prob  \\\n",
       "0             ACC           20     [3, 10, 0, 7]           0.094047   \n",
       "1          Pac-12           38    [0, 21, 14, 3]           0.311370   \n",
       "2            None           21     [0, 0, 14, 7]           0.271058   \n",
       "3            None           10      [7, 3, 0, 0]           0.000212   \n",
       "4            None            3      [0, 3, 0, 0]           0.000021   \n",
       "\n",
       "  excitement_index  \n",
       "0         8.767910  \n",
       "1         7.842417  \n",
       "2         1.834351  \n",
       "3         0.118588  \n",
       "4         0.472968  \n",
       "\n",
       "[5 rows x 24 columns]"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "import requests\n",
    "import numpy as np\n",
    "\n",
    "year = 2019\n",
    "\n",
    "response = requests.get(r'https://api.collegefootballdata.com/games?'\n",
    "                            'year={year}&seasonType=both'.format(year = year))\n",
    "games = pd.read_json(response.text)\n",
    "\n",
    "games.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Great! Now, lets simplify. The next three lines do three things:\n",
    "1. Take just the FBS games (no FCS games)\n",
    "2. Drop any unplayed or canceled games\n",
    "3. Take just the columns we need"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>home_team</th>\n",
       "      <th>home_points</th>\n",
       "      <th>away_team</th>\n",
       "      <th>away_points</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Florida</td>\n",
       "      <td>24</td>\n",
       "      <td>Miami</td>\n",
       "      <td>20</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Hawai'i</td>\n",
       "      <td>45</td>\n",
       "      <td>Arizona</td>\n",
       "      <td>38</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Cincinnati</td>\n",
       "      <td>24</td>\n",
       "      <td>UCLA</td>\n",
       "      <td>14</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Clemson</td>\n",
       "      <td>52</td>\n",
       "      <td>Georgia Tech</td>\n",
       "      <td>14</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>Tulane</td>\n",
       "      <td>42</td>\n",
       "      <td>Florida International</td>\n",
       "      <td>14</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     home_team  home_points              away_team  away_points\n",
       "0      Florida           24                  Miami           20\n",
       "1      Hawai'i           45                Arizona           38\n",
       "5   Cincinnati           24                   UCLA           14\n",
       "9      Clemson           52           Georgia Tech           14\n",
       "11      Tulane           42  Florida International           14"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "games = games[(~games['home_conference'].isnull()) & (~games['away_conference'].isnull())]\n",
    "games = games[(games['home_points'] > 0) | (games['away_points'] > 0)]\n",
    "games = games[['home_team','home_points','away_team','away_points']]\n",
    "\n",
    "games.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "That looks better! Let's add a $\\pm1$ for whether the home or away team weans, and a column of ones."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>home_team</th>\n",
       "      <th>home_points</th>\n",
       "      <th>away_team</th>\n",
       "      <th>away_points</th>\n",
       "      <th>home_win</th>\n",
       "      <th>ones</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Florida</td>\n",
       "      <td>24</td>\n",
       "      <td>Miami</td>\n",
       "      <td>20</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Hawai'i</td>\n",
       "      <td>45</td>\n",
       "      <td>Arizona</td>\n",
       "      <td>38</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Cincinnati</td>\n",
       "      <td>24</td>\n",
       "      <td>UCLA</td>\n",
       "      <td>14</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Clemson</td>\n",
       "      <td>52</td>\n",
       "      <td>Georgia Tech</td>\n",
       "      <td>14</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>Tulane</td>\n",
       "      <td>42</td>\n",
       "      <td>Florida International</td>\n",
       "      <td>14</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     home_team  home_points              away_team  away_points  home_win  \\\n",
       "0      Florida           24                  Miami           20         1   \n",
       "1      Hawai'i           45                Arizona           38         1   \n",
       "5   Cincinnati           24                   UCLA           14         1   \n",
       "9      Clemson           52           Georgia Tech           14         1   \n",
       "11      Tulane           42  Florida International           14         1   \n",
       "\n",
       "    ones  \n",
       "0      1  \n",
       "1      1  \n",
       "5      1  \n",
       "9      1  \n",
       "11     1  "
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "games['home_win'] = -1+ 2*(games['home_points'] > games['away_points']).astype(int)\n",
    "games['ones'] = 1\n",
    "\n",
    "games.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It will be useful to have a list of the teams so lets get that now."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>team</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Air Force</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Akron</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Alabama</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Appalachian State</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Arizona</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                team\n",
       "0          Air Force\n",
       "1              Akron\n",
       "2            Alabama\n",
       "3  Appalachian State\n",
       "4            Arizona"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "teams = pd.DataFrame(games['home_team'].append(games['away_team']).unique(),columns = ['team'])\n",
    "teams = teams.sort_values(by = ['team']).reset_index(drop = True)\n",
    "\n",
    "teams.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Okay! Now lets get the vector on the right hand side of the matrix equation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>str_of_rec</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>home_team</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Air Force</th>\n",
       "      <td>5.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Akron</th>\n",
       "      <td>-5.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Alabama</th>\n",
       "      <td>5.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Appalachian State</th>\n",
       "      <td>6.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Arizona</th>\n",
       "      <td>-1.5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                   str_of_rec\n",
       "home_team                    \n",
       "Air Force                 5.0\n",
       "Akron                    -5.0\n",
       "Alabama                   5.0\n",
       "Appalachian State         6.5\n",
       "Arizona                  -1.5"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "colley_vec = 1+(games[['home_team','home_win']].groupby('home_team').sum()\\\n",
    "         -games[['away_team','home_win']].groupby('away_team').sum())/2\n",
    "colley_vec = colley_vec.rename(columns = {'home_win':'str_of_rec'})\n",
    "\n",
    "colley_vec.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Creating the matrix takes a couple clever moves. First we will make a vector that counts games played and use that to create the diagonal of the colley matrix. We'll only look at a few teams since this matrix is 130x130."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>team</th>\n",
       "      <th>Michigan</th>\n",
       "      <th>Wisconsin</th>\n",
       "      <th>Ohio State</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>team</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Michigan</th>\n",
       "      <td>15.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Wisconsin</th>\n",
       "      <td>0.0</td>\n",
       "      <td>16.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Ohio State</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>16.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "team        Michigan  Wisconsin  Ohio State\n",
       "team                                       \n",
       "Michigan        15.0        0.0         0.0\n",
       "Wisconsin        0.0       16.0         0.0\n",
       "Ohio State       0.0        0.0        16.0"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "games_played = (games[['home_team','ones']].groupby('home_team').sum()+games[['away_team','ones']].groupby('away_team').sum())\n",
    "diag = pd.DataFrame(2*np.identity(len(colley_vec))+np.diag(games_played['ones']),teams['team'],teams['team'])\n",
    "\n",
    "diag.loc[['Michigan','Wisconsin','Ohio State'],['Michigan','Wisconsin','Ohio State']]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In order to create the off-diagonal entries, we will pivot on our dataframe twice, once for counting games for the home team, and once more for the away team. Adding this to our diagonal gives the Colley Matrix."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>team</th>\n",
       "      <th>Michigan</th>\n",
       "      <th>Wisconsin</th>\n",
       "      <th>Ohio State</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>team</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Michigan</th>\n",
       "      <td>15.0</td>\n",
       "      <td>-1.0</td>\n",
       "      <td>-1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Wisconsin</th>\n",
       "      <td>-1.0</td>\n",
       "      <td>16.0</td>\n",
       "      <td>-2.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Ohio State</th>\n",
       "      <td>-1.0</td>\n",
       "      <td>-2.0</td>\n",
       "      <td>16.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "team        Michigan  Wisconsin  Ohio State\n",
       "team                                       \n",
       "Michigan        15.0       -1.0        -1.0\n",
       "Wisconsin       -1.0       16.0        -2.0\n",
       "Ohio State      -1.0       -2.0        16.0"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "piv1 = pd.pivot_table(games,values = 'ones',index = 'home_team', \\\n",
    "                      columns = 'away_team', aggfunc = np.sum).fillna(0)\n",
    "\n",
    "piv2 = pd.pivot_table(games,values = 'ones',index = 'away_team', \\\n",
    "                      columns = 'home_team', aggfunc = np.sum).fillna(0)\n",
    "    \n",
    "colley_mat = diag - piv1 - piv2\n",
    "\n",
    "colley_mat.loc[['Michigan','Wisconsin','Ohio State'],['Michigan','Wisconsin','Ohio State']]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Great! We can see that each team played one another at least once, and Wisconsin and Ohio State played each other twice.\n",
    "\n",
    "We just run a matrix solver at this point and we'll have our ratings!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>rating</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>team</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>LSU</th>\n",
       "      <td>1.064182</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Ohio State</th>\n",
       "      <td>0.986428</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Clemson</th>\n",
       "      <td>0.943394</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Georgia</th>\n",
       "      <td>0.926277</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Penn State</th>\n",
       "      <td>0.891403</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Florida</th>\n",
       "      <td>0.876903</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Oregon</th>\n",
       "      <td>0.869208</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Notre Dame</th>\n",
       "      <td>0.850672</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              rating\n",
       "team                \n",
       "LSU         1.064182\n",
       "Ohio State  0.986428\n",
       "Clemson     0.943394\n",
       "Georgia     0.926277\n",
       "Penn State  0.891403\n",
       "Florida     0.876903\n",
       "Oregon      0.869208\n",
       "Notre Dame  0.850672"
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "colley_inv = pd.DataFrame(np.linalg.pinv(colley_mat.values),colley_mat.columns,colley_mat.index)\n",
    "ratings = colley_inv.dot(colley_vec)\n",
    "ratings.rename(columns={'str_of_rec':'rating'},inplace=True)\n",
    "\n",
    "ratings = ratings.sort_values(by = ['rating'], ascending = False)\n",
    "\n",
    "ratings.head(8)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Awesome! It looks reasonable too! We can compare this to Colley's ratings to see if we're right. As of 2007, Colley added in a roundabout way of including FCS teams, but our ratings should be very close to his. We can check 2006 and see that they agree up to four decimal places, which is good enough for me!\n",
    "\n",
    "Next time, we'll take this resume ranking system and see what we can do to make it more representative of team's power. Colley's Matrix Method is a compelling way for accounting for strength of schedule. If we can find a way to add in more information than simply wins and losses, we may be able to create some pretty reliable power ratings!"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}