{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Offensive Tackles in Football\n", "\n", "My brother is 16 and plays high school football as an offensive tackle. He is 6 feet 2 inches tall and weighs 250 pounds. Now that is a big kid. His dream is to play professional football one day.\n", "\n", "The first step to playing professional football is to play college football, hopefully for a good team. Not knowing much about what it takes to get recruited by a top college football team, I thought I would look for some data. Fortunately, ESPN has height and weight data on the [top 100 offensive tackles](http://espn.go.com/college-sports/football/recruiting/playerrankings/_/position/offensive-tackle/class/2015/view/position) being recruited out of high school. This little project looks at the height and weight of the top recruited offensive tackles and how those values are associated with a player's rank.\n", "\n", "# Get and Clean the Data" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# __future__ imports must come first (Python 2: true division)\n", "from __future__ import division\n", "\n", "import urllib2\n", "\n", "from bs4 import BeautifulSoup\n", "import pandas as pd\n", "from pandas import DataFrame, Series\n", "from matplotlib import pyplot as plt\n", "import seaborn as sns\n", "import statsmodels.api as sm\n", "\n", "# render plots inside the notebook\n", "%matplotlib inline\n", "sns.set(style='ticks', palette='Set2')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's get the data from ESPN." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# download the rankings page and parse it\n", "html = urllib2.urlopen('http://espn.go.com/college-sports/football/recruiting/playerrankings/_/view/position/order/true/position/offensive-guard')\n", "text = html.read()\n", "soup = BeautifulSoup(text.replace('ISO-8859-1', 'utf-8'))" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# collect [height, weight, grade] for each player row, skipping the header row\n", "ht_wgt = []\n", "for tr in soup.find_all('tr')[1:]:\n", "    tds = tr.find_all('td')\n", "    height = tds[4].text\n", "    weight = tds[5].text\n", "    grade = tds[7].text\n", "    ht_wgt.append([height, weight, grade])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A quick sanity check to make sure we got 100 players." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "100" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# should have 100\n", "len(ht_wgt)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's drop our data into a pandas DataFrame and take a look." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table>\n", " <thead>\n", " <tr><th></th><th>height</th><th>weight</th><th>grade</th></tr>\n", " </thead>\n", " <tbody>\n",
" <tr><th>0</th><td>6'5''</td><td>330</td><td>87</td></tr>\n",
" <tr><th>1</th><td>6'5''</td><td>280</td><td>85</td></tr>\n",
" <tr><th>2</th><td>6'3''</td><td>340</td><td>84</td></tr>\n",
" <tr><th>3</th><td>6'3''</td><td>312</td><td>84</td></tr>\n",
" <tr><th>4</th><td>6'5''</td><td>315</td><td>83</td></tr>\n",
" </tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " height weight grade\n", "0 6'5'' 330 87\n", "1 6'5'' 280 85\n", "2 6'3'' 340 84\n", "3 6'3'' 312 84\n", "4 6'5'' 315 83" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = DataFrame(ht_wgt, columns=['height', 'weight', 'grade'])\n", "data.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lets clean up the data to get the values as integers and convert the height to inches. I also created a mean zero grade just to bring the grades closer to zero." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "