# 2016 Phillies Games Broadcast on National Television

I like watching the Phillies.  I do not have cable.  Some Phillies games are broadcast on national television.  This is how I made a list of those games.

## [Pandas](http://pandas.pydata.org/)

Pandas is a data analysis tool for the [Python](https://www.python.org/) programming language.  It can do a tremendous amount of really powerful data analysis and visualization.  It's a gun in this CSV knife fight.

In [1]:
import pandas as pd

A downloadable [CSV schedule](http://philadelphia.phillies.mlb.com/schedule/downloadable.jsp#csv-format) is available from [mlb.com](http://mlb.com).  Here is a [direct link](http://mlb.mlb.com/ticketing-client/csv/EventTicketPromotionPrice.tiksrv?team_id=108&display_in=singlegame&ticket_category=Tickets&site_section=Default&sub_category=Default&leave_empty_games=true&event_type=T&event_type=Y) to the Phillies schedule.

The CSV schedule will be used to instantiate a Pandas [DataFrame](http://pandas.pydata.org/pandas-docs/version/0.13.1/generated/pandas.DataFrame.html) object.

In [2]:
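# DataFrame.from_csv was removed in pandas 1.0; read_csv with these arguments matches its defaults.
schedule = pd.read_csv("phillies-2016.csv", index_col=0, parse_dates=True)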

## What does the schedule metadata look like?

In [3]:
schedule.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 190 entries, 2016-03-07 to 2016-10-02
Data columns (total 16 columns):
START TIME          189 non-null object
START TIME ET       189 non-null object
SUBJECT             190 non-null object
LOCATION            190 non-null object
DESCRIPTION         187 non-null object
END DATE            190 non-null object
END DATE ET         190 non-null object
END TIME            189 non-null object
END TIME ET         189 non-null object
REMINDER OFF        190 non-null bool
REMINDER ON         190 non-null bool
REMINDER DATE       190 non-null object
REMINDER TIME       189 non-null object
REMINDER TIME ET    189 non-null object
SHOWTIMEAS FREE     190 non-null object
SHOWTIMEAS BUSY     190 non-null object
dtypes: bool(2), object(14)
memory usage: 22.6+ KB


There are 190 games, with 16 columns of data for each game.

## What does the schedule data itself look like?

In [4]:
schedule.head()

START DATE,START TIME,START TIME ET,SUBJECT,LOCATION,DESCRIPTION,END DATE,END DATE ET,END TIME,END TIME ET,REMINDER OFF,REMINDER ON,REMINDER DATE,REMINDER TIME,REMINDER TIME ET,SHOWTIMEAS FREE,SHOWTIMEAS BUSY
2016-03-07,01:05 PM,01:05 PM,Phillies at Pirates,McKechnie Field - Bradenton,Local TV: MLB.TV ----- Local Radio: MLB.com,03/07/16,03/07/16,04:05 PM,04:05 PM,False,True,03/07/16,12:05 PM,12:05 PM,FREE,BUSY
2016-03-08,01:05 PM,01:05 PM,Pirates at Phillies,Bright House Field - Clearwater,Local TV: TCN- MLB.TV,03/08/16,03/08/16,04:05 PM,04:05 PM,False,True,03/08/16,12:05 PM,12:05 PM,FREE,BUSY
2016-03-09,01:05 PM,01:05 PM,Phillies at Twins,CenturyLink Sports Complex - Fort Myers,,03/09/16,03/09/16,04:05 PM,04:05 PM,False,True,03/09/16,12:05 PM,12:05 PM,FREE,BUSY
2016-03-09,01:05 PM,01:05 PM,Orioles at Phillies,Bright House Field - Clearwater,Local TV: TCN- MLB.TV,03/09/16,03/09/16,04:05 PM,04:05 PM,False,True,03/09/16,12:05 PM,12:05 PM,FREE,BUSY
2016-03-10,01:05 PM,01:05 PM,Tigers at Phillies,Bright House Field - Clearwater,Local TV: TCN- MLBN- MLB.TV,03/10/16,03/10/16,04:05 PM,04:05 PM,False,True,03/10/16,12:05 PM,12:05 PM,FREE,BUSY


## Cleaning up the schedule

The `DESCRIPTION` column contains the broadcast information.  Less interesting columns can be removed.

In [6]:
schedule.drop(["REMINDER OFF", 
             "REMINDER ON",
             "START TIME ET",
             "END DATE",
             "END DATE ET",
             "END TIME",
             "END TIME ET",
             "REMINDER TIME",
             "REMINDER TIME ET",
             "SHOWTIMEAS FREE",
             "SHOWTIMEAS BUSY",
             "REMINDER DATE"], axis=1, inplace=True)
schedule.head()

START DATE,START TIME,SUBJECT,LOCATION,DESCRIPTION
2016-03-07,01:05 PM,Phillies at Pirates,McKechnie Field - Bradenton,Local TV: MLB.TV ----- Local Radio: MLB.com
2016-03-08,01:05 PM,Pirates at Phillies,Bright House Field - Clearwater,Local TV: TCN- MLB.TV
2016-03-09,01:05 PM,Phillies at Twins,CenturyLink Sports Complex - Fort Myers,
2016-03-09,01:05 PM,Orioles at Phillies,Bright House Field - Clearwater,Local TV: TCN- MLB.TV
2016-03-10,01:05 PM,Tigers at Phillies,Bright House Field - Clearwater,Local TV: TCN- MLBN- MLB.TV
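
When only a few columns are wanted, selecting them by name can be shorter than listing everything to drop. The following is a minimal sketch of that alternative, not what the notebook does. The `sample` frame is an illustrative stand-in built from two rows of the output above, and `keep` and `trimmed` are hypothetical names.

```python
import pandas as pd

# Illustrative stand-in for the schedule: two rows and a subset of the columns.
sample = pd.DataFrame(
    {
        "START TIME": ["01:05 PM", "01:05 PM"],
        "SUBJECT": ["Phillies at Pirates", "Pirates at Phillies"],
        "LOCATION": ["McKechnie Field - Bradenton", "Bright House Field - Clearwater"],
        "DESCRIPTION": [
            "Local TV: MLB.TV ----- Local Radio: MLB.com",
            "Local TV: TCN- MLB.TV",
        ],
        "REMINDER OFF": [False, False],
        "SHOWTIMEAS BUSY": ["BUSY", "BUSY"],
    },
    index=pd.to_datetime(["2016-03-07", "2016-03-08"]),
)
sample.index.name = "START DATE"

# Keep only the interesting columns; this returns a new frame and leaves `sample` unchanged.
keep = ["START TIME", "SUBJECT", "LOCATION", "DESCRIPTION"]
trimmed = sample[keep]
print(trimmed.columns.tolist())
```

Unlike `drop(..., inplace=True)`, the selection does not modify the original frame, so the result has to be assigned to a name.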


## What are all of the stations that games are broadcast on this season?

The `DESCRIPTION` column is nice because it mentions the stations that games are broadcast on.  Sometimes a game is broadcast on more than one channel at once.  There is also radio broadcast information that I'm not interested in right now.

In [11]:
schedule.DESCRIPTION.head(50)

START DATE
2016-03-07        Local TV: MLB.TV ----- Local Radio: MLB.com
2016-03-08                              Local TV: TCN- MLB.TV
2016-03-09                                                NaN
2016-03-09                              Local TV: TCN- MLB.TV
2016-03-10                        Local TV: TCN- MLBN- MLB.TV
2016-03-11                               Local Radio: MLB.com
2016-03-12    Local TV: CSN- MLB.TV ----- Local Radio: 94 WIP
2016-03-13         Local TV: MLB.TV ----- Local Radio: 94 WIP
2016-03-14                               Local Radio: MLB.com
2016-03-15                               Local Radio: MLB.com
2016-03-17                              Local TV: TCN- MLB.TV
2016-03-18                              Local TV: TCN- MLB.TV
2016-03-19         Local TV: MLB.TV ----- Local Radio: 94 WIP
2016-03-20    Local TV: CSN- MLB.TV ----- Local Radio: 94 WIP
2016-03-21                               Local Radio: MLB.com
2016-03-22                              Local TV: TCN- MLB.

### Parse television station broadcast channels from `DESCRIPTION`

Thankfully, the `DESCRIPTION` column data is parseable.  Getting a list of television broadcast stations for each game is not _too_ difficult.

In [73]:
description = schedule.DESCRIPTION.iloc[6]  # positional lookup; the index holds dates
print(description)

Local TV: CSN- MLB.TV ----- Local Radio: 94 WIP


Grab the rough station string with a regular expression.

In [123]:
import re

TV_STATION_RE = re.compile(r"""Local\s+TV:\s+    # TV token
                               (?P<stations>.*)  # Greedily capture the rest of the line as stations
                               """, re.X)

Use that to pull them out and do some text wrangling.

In [None]:
def tv_stations_from_description(description):
    """Return a list of television stations embedded in the given description."""
    tv_stations = []
    result = re.search(TV_STATION_RE, str(description))
    if result:
        media_delimiter = "-----"
        tv_station_str = result.group("stations").split(media_delimiter)[0]
        tv_stations = tv_station_str.split("- ")
        tv_stations = [s.strip() for s in tv_stations]
    return tv_stations
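
To make the wrangling steps concrete, here is a small trace of the same idea on one description from the schedule. It is a sketch that restates the pattern in non-verbose form so the snippet runs on its own. The names `pattern`, `captured`, and `tv_part` are illustrative and not part of the notebook.

```python
import re

# Same idea as TV_STATION_RE, restated so this snippet is self-contained.
pattern = re.compile(r"Local\s+TV:\s+(?P<stations>.*)")

description = "Local TV: CSN- MLB.TV ----- Local Radio: 94 WIP"

match = pattern.search(description)
captured = match.group("stations")    # greedy: still includes the radio part
tv_part = captured.split("-----")[0]  # keep only the text before the media delimiter
stations = [s.strip() for s in tv_part.split("- ")]

print(captured)  # CSN- MLB.TV ----- Local Radio: 94 WIP
print(stations)  # ['CSN', 'MLB.TV']

# A radio-only description has no "Local TV:" token, so nothing matches.
print(pattern.search("Local Radio: MLB.com"))  # None
```

Because the capture group takes everything after the TV token, splitting on the `-----` delimiter is what actually separates television stations from radio stations.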

Test it out on all of the descriptions.

In [126]:
tv_stations = set()
for d in schedule.DESCRIPTION:
    tv_stations |= set(tv_stations_from_description(d))
tv_stations    

{'CSN', 'ESPN2', 'MLB.TV', 'MLBN', 'NBC 10', 'TCN'}

Applying this function to the `DESCRIPTION` column yields a [`Series`](http://pandas.pydata.org/pandas-docs/stable/dsintro.html#series) that holds, for each game, the list of television stations broadcasting it.

In [127]:
stations_series = schedule.DESCRIPTION.apply(tv_stations_from_description)
stations_series

START DATE
2016-03-07               [MLB.TV]
2016-03-08          [TCN, MLB.TV]
2016-03-09                     []
2016-03-09          [TCN, MLB.TV]
2016-03-10    [TCN, MLBN, MLB.TV]
2016-03-11                     []
2016-03-12          [CSN, MLB.TV]
2016-03-13               [MLB.TV]
2016-03-14                     []
2016-03-15                     []
2016-03-17          [TCN, MLB.TV]
2016-03-18          [TCN, MLB.TV]
2016-03-19               [MLB.TV]
2016-03-20          [CSN, MLB.TV]
2016-03-21                     []
2016-03-22          [TCN, MLB.TV]
2016-03-23                     []
2016-03-24               [MLB.TV]
2016-03-25          [CSN, MLB.TV]
2016-03-26          [CSN, MLB.TV]
2016-03-27               [MLB.TV]
2016-03-28                     []
2016-03-29    [TCN, MLBN, MLB.TV]
2016-03-30          [TCN, MLB.TV]
2016-03-31          [TCN, MLB.TV]
2016-04-01          [TCN, MLB.TV]
2016-04-02          [TCN, MLB.TV]
2016-04-04                  [CSN]
2016-04-06           [CSN, ESPN2]
...

Double check the `set` of stations from that `Series`.

In [129]:
{station for stations in stations_series.values for station in stations}

{'CSN', 'ESPN2', 'MLB.TV', 'MLBN', 'NBC 10', 'TCN'}

The 190 Phillies games are broadcast on 6 television channels.  Unfortunately, only 1 of those 6 stations is available without a cable television subscription.  This means that I can only watch games on NBC 10.
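
The same flattening of per-game station lists can also count how many games each station carries, not only which stations exist. The sketch below uses `collections.Counter` on a handful of station lists copied from the output above. `sample_lists` is an illustrative name, and the counts describe only this sample, not the full season.

```python
from collections import Counter

# A few per-game station lists taken from the Series output shown earlier.
sample_lists = [
    ["MLB.TV"],
    ["TCN", "MLB.TV"],
    [],
    ["TCN", "MLBN", "MLB.TV"],
    ["CSN", "ESPN2"],
]

# Flatten the lists and tally each station.
counts = Counter(station for stations in sample_lists for station in stations)
print(counts["MLB.TV"])  # 3
print(sorted(counts))    # ['CSN', 'ESPN2', 'MLB.TV', 'MLBN', 'TCN']
```

Run against the real `stations_series.values`, the same expression would give the per-station game counts for the whole schedule.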

## The Phillies national television broadcast schedule

Filtering the `DESCRIPTION` column to national television broadcast stations yields only the games which I can watch over the air with my [HD antenna](http://amzn.to/1r5eZmQ).

In [117]:
# na=False treats games with no description as non-matches.
national_broadcast_schedule = schedule[schedule.DESCRIPTION.str.contains("NBC 10", na=False)]
national_broadcast_schedule

START DATE,START TIME,SUBJECT,LOCATION,DESCRIPTION
2016-04-11,03:05 PM,Padres at Phillies,Citizens Bank Park - Philadelphia,Local TV: NBC 10
2016-06-03,07:05 PM,Brewers at Phillies,Citizens Bank Park - Philadelphia,Local TV: NBC 10
2016-06-10,07:05 PM,Phillies at Nationals,Nationals Park - Washington,Local TV: NBC 10
2016-06-17,07:05 PM,D-backs at Phillies,Citizens Bank Park - Philadelphia,Local TV: NBC 10
2016-06-23,01:10 PM,Phillies at Twins,Target Field - Minneapolis,Local TV: NBC 10
2016-07-15,07:05 PM,Mets at Phillies,Citizens Bank Park - Philadelphia,Local TV: NBC 10
2016-07-16,07:05 PM,Mets at Phillies,Citizens Bank Park - Philadelphia,Local TV: NBC 10
2016-07-22,07:05 PM,Phillies at Pirates,PNC Park - Pittsburgh,Local TV: NBC 10
2016-07-30,07:10 PM,Phillies at Braves,Turner Field - Atlanta,Local TV: NBC 10
2016-08-04,01:05 PM,Giants at Phillies,Citizens Bank Park - Philadelphia,Local TV: NBC 10


In [118]:
national_broadcast_schedule.describe()

,START TIME,SUBJECT,LOCATION,DESCRIPTION
count,10,10,10,10
unique,5,9,5,1
top,07:05 PM,Mets at Phillies,Citizens Bank Park - Philadelphia,Local TV: NBC 10
freq,6,2,6,10


This means that I can watch 10 of the 190 Phillies games this season, which is roughly 5%.
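
For completeness, the share can be computed directly from the two counts reported above. This is a small sketch with illustrative variable names.

```python
watchable_games = 10  # rows in the filtered schedule
total_games = 190     # entries in the full schedule

# float() keeps the division exact under Python 2 as well as Python 3.
share = watchable_games / float(total_games)
print("{:.1%}".format(share))  # 5.3%
```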