{ "metadata": { "name": "", "signature": "sha256:f812e99ef46774caf5996f42fab95867ac2bbdc6ff780236e360b3c4d4645bab" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 2015 Phillies Games Broadcast on National Television\n", "\n", "I like watching the Phillies. I do not have cable. Some Phillies games are broadcast on national television. This is how I made a list of those games.\n", "\n", "## [Pandas](http://pandas.pydata.org/)\n", "\n", "Pandas is a data analysis tool for the [Python](https://www.python.org/) programming language. It can do a tremendous amount of really powerful data analysis and visualization. It's a gun in this CSV knife fight." ] }, { "cell_type": "code", "collapsed": false, "input": [ "import pandas as pd" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "markdown", "metadata": {}, "source": [ "A downloadable [CSV schedule](http://philadelphia.phillies.mlb.com/schedule/downloadable.jsp?c_id=phi&year=2015) is available from [mlb.com](http://mlb.com). Here is a [direct link](http://mlb.mlb.com/soa/ical/schedule.csv?team_id=143&season=2015&game_type=%27R%27) to the Phillies schedule.\n", "\n", "The CSV schedule will be used to instantiate a Pandas [DataFrame](http://pandas.pydata.org/pandas-docs/version/0.13.1/generated/pandas.DataFrame.html) object." ] }, { "cell_type": "code", "collapsed": false, "input": [ "schedule = pd.DataFrame.from_csv(\"phillies-2015.csv\")" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 48 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## What does the schedule metadata look like?" ] }, { "cell_type": "code", "collapsed": false, "input": [ "schedule.info()" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "\n", "DatetimeIndex: 197 entries, 2015-03-01 00:00:00 to 2015-10-04 00:00:00\n", "Data columns (total 16 columns):\n", "START_TIME 197 non-null object\n", "START_TIME_ET 197 non-null object\n", "SUBJECT 197 non-null object\n", "LOCATION 197 non-null object\n", "DESCRIPTION 193 non-null object\n", "END_DATE 197 non-null object\n", "END_DATE_ET 197 non-null object\n", "END_TIME 197 non-null object\n", "END_TIME_ET 197 non-null object\n", "REMINDER_OFF 197 non-null bool\n", "REMINDER_ON 197 non-null bool\n", "REMINDER_DATE 197 non-null object\n", "REMINDER_TIME 197 non-null object\n", "REMINDER_TIME_ET 197 non-null object\n", "SHOWTIMEAS_FREE 197 non-null object\n", "SHOWTIMEAS_BUSY 197 non-null object\n", "dtypes: bool(2), object(14)" ] } ], "prompt_number": 49 }, { "cell_type": "markdown", "metadata": {}, "source": [ "197 games and 16 columns of data for each game. \n", "\n", "## What does the schedule data itself look like?" ] }, { "cell_type": "code", "collapsed": false, "input": [ "schedule.head()" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
START_TIMESTART_TIME_ETSUBJECTLOCATIONDESCRIPTIONEND_DATEEND_DATE_ETEND_TIMEEND_TIME_ETREMINDER_OFFREMINDER_ONREMINDER_DATEREMINDER_TIMEREMINDER_TIME_ETSHOWTIMEAS_FREESHOWTIMEAS_BUSY
START_DATE
2015-03-01 01:05 PM 01:05 PM Spartans at Phillies Bright House Field NaN 03/01/15 03/01/15 04:05 PM 04:05 PM False True 03/01/15 12:05 PM 12:05 PM FREE BUSY
2015-03-03 01:05 PM 01:05 PM Yankees at Phillies Bright House Field Local TV: MLBN (delay) -- CSN ----- Local Radi... 03/03/15 03/03/15 04:05 PM 04:05 PM False True 03/03/15 12:05 PM 12:05 PM FREE BUSY
2015-03-04 01:05 PM 01:05 PM Phillies at Yankees George M. Steinbrenner Field Local TV: MLBN ----- Local Radio: MLB.com 03/04/15 03/04/15 04:05 PM 04:05 PM False True 03/04/15 12:05 PM 12:05 PM FREE BUSY
2015-03-05 01:05 PM 01:05 PM Phillies at Astros Osceola County Stadium Local Radio: MLB.com 03/05/15 03/05/15 04:05 PM 04:05 PM False True 03/05/15 12:05 PM 12:05 PM FREE BUSY
2015-03-06 01:05 PM 01:05 PM Yankees at Phillies Bright House Field Local TV: TCN -- MLBN 03/06/15 03/06/15 04:05 PM 04:05 PM False True 03/06/15 12:05 PM 12:05 PM FREE BUSY
\n", "

5 rows \u00d7 16 columns

\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 50, "text": [ " START_TIME START_TIME_ET SUBJECT \\\n", "START_DATE \n", "2015-03-01 01:05 PM 01:05 PM Spartans at Phillies \n", "2015-03-03 01:05 PM 01:05 PM Yankees at Phillies \n", "2015-03-04 01:05 PM 01:05 PM Phillies at Yankees \n", "2015-03-05 01:05 PM 01:05 PM Phillies at Astros \n", "2015-03-06 01:05 PM 01:05 PM Yankees at Phillies \n", "\n", " LOCATION \\\n", "START_DATE \n", "2015-03-01 Bright House Field \n", "2015-03-03 Bright House Field \n", "2015-03-04 George M. Steinbrenner Field \n", "2015-03-05 Osceola County Stadium \n", "2015-03-06 Bright House Field \n", "\n", " DESCRIPTION END_DATE \\\n", "START_DATE \n", "2015-03-01 NaN 03/01/15 \n", "2015-03-03 Local TV: MLBN (delay) -- CSN ----- Local Radi... 03/03/15 \n", "2015-03-04 Local TV: MLBN ----- Local Radio: MLB.com 03/04/15 \n", "2015-03-05 Local Radio: MLB.com 03/05/15 \n", "2015-03-06 Local TV: TCN -- MLBN 03/06/15 \n", "\n", " END_DATE_ET END_TIME END_TIME_ET REMINDER_OFF REMINDER_ON \\\n", "START_DATE \n", "2015-03-01 03/01/15 04:05 PM 04:05 PM False True \n", "2015-03-03 03/03/15 04:05 PM 04:05 PM False True \n", "2015-03-04 03/04/15 04:05 PM 04:05 PM False True \n", "2015-03-05 03/05/15 04:05 PM 04:05 PM False True \n", "2015-03-06 03/06/15 04:05 PM 04:05 PM False True \n", "\n", " REMINDER_DATE REMINDER_TIME REMINDER_TIME_ET SHOWTIMEAS_FREE \\\n", "START_DATE \n", "2015-03-01 03/01/15 12:05 PM 12:05 PM FREE \n", "2015-03-03 03/03/15 12:05 PM 12:05 PM FREE \n", "2015-03-04 03/04/15 12:05 PM 12:05 PM FREE \n", "2015-03-05 03/05/15 12:05 PM 12:05 PM FREE \n", "2015-03-06 03/06/15 12:05 PM 12:05 PM FREE \n", "\n", " SHOWTIMEAS_BUSY \n", "START_DATE \n", "2015-03-01 BUSY \n", "2015-03-03 BUSY \n", "2015-03-04 BUSY \n", "2015-03-05 BUSY \n", "2015-03-06 BUSY \n", "\n", "[5 rows x 16 columns]" ] } ], "prompt_number": 50 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Cleaning up the schedule\n", "\n", "The `DESCRIPTION` column contains the broadcast information. Less interesting columns can be removed." ] }, { "cell_type": "code", "collapsed": false, "input": [ "schedule.drop([\"REMINDER_OFF\", \n", " \"REMINDER_ON\",\n", " \"START_TIME_ET\",\n", " \"END_DATE\",\n", " \"END_DATE_ET\",\n", " \"END_TIME\",\n", " \"END_TIME_ET\",\n", " \"REMINDER_TIME\",\n", " \"REMINDER_TIME_ET\",\n", " \"SHOWTIMEAS_FREE\",\n", " \"SHOWTIMEAS_BUSY\",\n", " \"REMINDER_DATE\"], axis=1, inplace=True)\n", "schedule.head()" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
START_TIMESUBJECTLOCATIONDESCRIPTION
START_DATE
2015-03-01 01:05 PM Spartans at Phillies Bright House Field NaN
2015-03-03 01:05 PM Yankees at Phillies Bright House Field Local TV: MLBN (delay) -- CSN ----- Local Radi...
2015-03-04 01:05 PM Phillies at Yankees George M. Steinbrenner Field Local TV: MLBN ----- Local Radio: MLB.com
2015-03-05 01:05 PM Phillies at Astros Osceola County Stadium Local Radio: MLB.com
2015-03-06 01:05 PM Yankees at Phillies Bright House Field Local TV: TCN -- MLBN
\n", "

5 rows \u00d7 4 columns

\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 51, "text": [ " START_TIME SUBJECT LOCATION \\\n", "START_DATE \n", "2015-03-01 01:05 PM Spartans at Phillies Bright House Field \n", "2015-03-03 01:05 PM Yankees at Phillies Bright House Field \n", "2015-03-04 01:05 PM Phillies at Yankees George M. Steinbrenner Field \n", "2015-03-05 01:05 PM Phillies at Astros Osceola County Stadium \n", "2015-03-06 01:05 PM Yankees at Phillies Bright House Field \n", "\n", " DESCRIPTION \n", "START_DATE \n", "2015-03-01 NaN \n", "2015-03-03 Local TV: MLBN (delay) -- CSN ----- Local Radi... \n", "2015-03-04 Local TV: MLBN ----- Local Radio: MLB.com \n", "2015-03-05 Local Radio: MLB.com \n", "2015-03-06 Local TV: TCN -- MLBN \n", "\n", "[5 rows x 4 columns]" ] } ], "prompt_number": 51 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## What are all of the stations that games are broadcast on this season?\n", "\n", "The `DESCRIPTION` column is nice because it mentions the stations that games are broadcast on. Sometimes a game is broadcast on two channels at once. There is also radio broadcast information that I'm not interested in right now." ] }, { "cell_type": "code", "collapsed": false, "input": [ "schedule.DESCRIPTION.head()" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 52, "text": [ "START_DATE\n", "2015-03-01 NaN\n", "2015-03-03 Local TV: MLBN (delay) -- CSN ----- Local Radi...\n", "2015-03-04 Local TV: MLBN ----- Local Radio: MLB.com\n", "2015-03-05 Local Radio: MLB.com\n", "2015-03-06 Local TV: TCN -- MLBN\n", "Name: DESCRIPTION, dtype: object" ] } ], "prompt_number": 52 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Parse television station broadcast channels from `DESCRIPTION`\n", "\n", "Thankfully, the `DESCRIPTION` column data is parseable. Getting a list of television broadcast stations for each game is not too difficult." ] }, { "cell_type": "code", "collapsed": false, "input": [ "description = schedule.DESCRIPTION[1]\n", "print description" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Local TV: MLBN (delay) -- CSN ----- Local Radio: 94 WIP -- 1210 WPHT\n" ] } ], "prompt_number": 54 }, { "cell_type": "code", "collapsed": false, "input": [ "def tv_stations_from_description(description):\n", " \"\"\"Return a list of television stations embedded in the given description.\"\"\"\n", " return [station.strip() for station in description.split(\":\")[1].split(\"-----\")[0].split(\"--\")] if str(description) != \"nan\" else []\n", "\n", "result = tv_stations_from_description(description)\n", "\n", "print result\n", "assert(len(result) == 2)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "['MLBN (delay)', 'CSN']\n" ] } ], "prompt_number": 60 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Picking a game broadcast on a single channel to test the parsing function." ] }, { "cell_type": "code", "collapsed": false, "input": [ "description = schedule.DESCRIPTION[1]\n", "print description\n", "result = tv_stations_from_description(description)\n", "print result\n", "assert(len(result) == 2)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Local TV: MLBN (delay) -- CSN ----- Local Radio: 94 WIP -- 1210 WPHT\n", "['MLBN (delay)', 'CSN']\n" ] } ], "prompt_number": 62 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Applying this function to the DataFrame yields a [`Series`](http://pandas.pydata.org/pandas-docs/stable/dsintro.html#series) of all television stations on which the Phillies are broadcast this season." ] }, { "cell_type": "code", "collapsed": false, "input": [ "stations_series = schedule.DESCRIPTION.apply(\n", " lambda description: [station.strip() for station in description.split(\":\")[1].split(\"-----\")[0].split(\"--\")] if str(description) != \"nan\" else [])\n", "stations_series" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 63, "text": [ "START_DATE\n", "2015-03-01 []\n", "2015-03-03 [MLBN (delay), CSN]\n", "2015-03-04 [MLBN]\n", "2015-03-05 [MLB.com]\n", "2015-03-06 [TCN, MLBN]\n", "2015-03-07 [TCN, MLBN]\n", "2015-03-08 [94 WIP, 1210 WPHT]\n", "2015-03-09 [MLB.com]\n", "2015-03-10 [TCN, MLBN]\n", "2015-03-11 [TCN]\n", "2015-03-12 [MLB.com]\n", "2015-03-13 [MLBN (delay), TCN]\n", "2015-03-14 [94 WIP, 1210 WPHT]\n", "2015-03-15 []\n", "2015-03-15 [CSN]\n", "...\n", "2015-09-18 [CSN]\n", "2015-09-19 [CSN]\n", "2015-09-20 [CSN]\n", "2015-09-22 [CSN]\n", "2015-09-23 [CSN]\n", "2015-09-24 [CSN]\n", "2015-09-25 [CSN]\n", "2015-09-26 [CSN]\n", "2015-09-27 [CSN]\n", "2015-09-29 [CSN]\n", "2015-09-30 [CSN]\n", "2015-10-01 [CSN]\n", "2015-10-02 [CSN]\n", "2015-10-03 [CSN]\n", "2015-10-04 [CSN]\n", "Name: DESCRIPTION, Length: 197" ] } ], "prompt_number": 63 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Creating a `set` of stations from that `Series` will yield a concise list of distinct television broadcast stations." ] }, { "cell_type": "code", "collapsed": false, "input": [ "set([station for stations in stations_series.values for station in stations])" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 64, "text": [ "{'1210 WPHT',\n", " '94 WIP',\n", " 'CSN',\n", " 'ESPN',\n", " 'MLB.com',\n", " 'MLBN',\n", " 'MLBN (delay)',\n", " 'NBC 10',\n", " 'SBP 1480',\n", " 'TCN'}" ] } ], "prompt_number": 64 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The 197 Phillies games are broadcast on 10 television channels. Unfortunately only 2 of those 7 stations are available without a cable television subscription. This means that I can only watch games on NBC and FOX.\n", "\n", "## The Phillies national television broadcast schedule\n", "\n", "Filtering the `DESCRIPTION` column to national television broadcast stations yields only the games which I can watch over the air with my [HD antenna](http://amzn.to/1r5eZmQ)." ] }, { "cell_type": "code", "collapsed": false, "input": [ "schedule[(schedule.DESCRIPTION.str.contains(\"NBC 10\")) | \n", " (schedule.DESCRIPTION.str.contains(\"FOX\"))]" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
START_TIMESUBJECTLOCATIONDESCRIPTION
START_DATE
2015-04-13 01:10 PM Phillies at Mets Citi Field Local TV: NBC 10 ----- Local Radio: SBP 1480
2015-05-16 07:05 PM D-backs at Phillies Citizens Bank Park Local TV: NBC 10 ----- Local Radio: SBP 1480
2015-05-22 07:05 PM Phillies at Nationals Nationals Park Local TV: NBC 10 ----- Local Radio: SBP 1480
2015-05-29 07:05 PM Rockies at Phillies Citizens Bank Park Local TV: NBC 10 ----- Local Radio: SBP 1480
2015-06-05 07:05 PM Giants at Phillies Citizens Bank Park Local TV: NBC 10 ----- Local Radio: SBP 1480
2015-06-19 07:05 PM Cardinals at Phillies Citizens Bank Park Local TV: NBC 10 ----- Local Radio: SBP 1480
2015-06-24 01:05 PM Phillies at Yankees Yankee Stadium Local TV: NBC 10 ----- Local Radio: SBP 1480
2015-07-18 07:05 PM Marlins at Phillies Citizens Bank Park Local TV: NBC 10 ----- Local Radio: SBP 1480
2015-08-01 07:05 PM Braves at Phillies Citizens Bank Park Local TV: NBC 10 ----- Local Radio: SBP 1480
2015-08-15 07:10 PM Phillies at Brewers Miller Park Local TV: NBC 10 ----- Local Radio: SBP 1480
2015-08-29 07:05 PM Padres at Phillies Citizens Bank Park Local TV: NBC 10 ----- Local Radio: SBP 1480
\n", "

11 rows \u00d7 4 columns

\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 65, "text": [ " START_TIME SUBJECT LOCATION \\\n", "START_DATE \n", "2015-04-13 01:10 PM Phillies at Mets Citi Field \n", "2015-05-16 07:05 PM D-backs at Phillies Citizens Bank Park \n", "2015-05-22 07:05 PM Phillies at Nationals Nationals Park \n", "2015-05-29 07:05 PM Rockies at Phillies Citizens Bank Park \n", "2015-06-05 07:05 PM Giants at Phillies Citizens Bank Park \n", "2015-06-19 07:05 PM Cardinals at Phillies Citizens Bank Park \n", "2015-06-24 01:05 PM Phillies at Yankees Yankee Stadium \n", "2015-07-18 07:05 PM Marlins at Phillies Citizens Bank Park \n", "2015-08-01 07:05 PM Braves at Phillies Citizens Bank Park \n", "2015-08-15 07:10 PM Phillies at Brewers Miller Park \n", "2015-08-29 07:05 PM Padres at Phillies Citizens Bank Park \n", "\n", " DESCRIPTION \n", "START_DATE \n", "2015-04-13 Local TV: NBC 10 ----- Local Radio: SBP 1480 \n", "2015-05-16 Local TV: NBC 10 ----- Local Radio: SBP 1480 \n", "2015-05-22 Local TV: NBC 10 ----- Local Radio: SBP 1480 \n", "2015-05-29 Local TV: NBC 10 ----- Local Radio: SBP 1480 \n", "2015-06-05 Local TV: NBC 10 ----- Local Radio: SBP 1480 \n", "2015-06-19 Local TV: NBC 10 ----- Local Radio: SBP 1480 \n", "2015-06-24 Local TV: NBC 10 ----- Local Radio: SBP 1480 \n", "2015-07-18 Local TV: NBC 10 ----- Local Radio: SBP 1480 \n", "2015-08-01 Local TV: NBC 10 ----- Local Radio: SBP 1480 \n", "2015-08-15 Local TV: NBC 10 ----- Local Radio: SBP 1480 \n", "2015-08-29 Local TV: NBC 10 ----- Local Radio: SBP 1480 \n", "\n", "[11 rows x 4 columns]" ] } ], "prompt_number": 65 }, { "cell_type": "markdown", "metadata": {}, "source": [ "This means that I have the possibility to watch 13 out of 197 Phillies games this season which is roughly 7%." ] } ], "metadata": {} } ] }