{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# *Designing and Creating Database*\n",
"\n",
"We will be working with a file of Major League Baseball games from [Retrosheet](http://www.retrosheet.org/). Retrosheet compiles detailed statistics on baseball games from the 1800s through to today. \n",
"\n",
"### Aim\n",
"\n",
"***Here, we'll create a Database of Major League Baseball games by compiling data from various sources.***"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data Overview"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The main file we will be working from `game_log.csv`, has been produced by combining 127 separate CSV files from retrosheet, and has been pre-cleaned to remove some inconsistencies. The game log has hundreds of data points on each game which we will normalize into several separate tables using SQL, providing a robust database of game-level statistics.\n",
"\n",
"In addition to the main file, we have also included three 'helper' files, also sourced from Retrosheet:\n",
"\n",
">park_codes.csv
\n",
"person_codes.csv
\n",
"team_codes.csv\n",
"\n",
"These three helper files in some cases contain extra data, but will also make things easier as they will form the basis for three of our normalized tables.\n",
"\n",
"Information regarding the columns of `game_log.csv` can be found in `game_log_fields.txt`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Getting to know Data\n",
"\n",
"- Using pandas, we will read in each of the four CSV files: \n",
" - game_log.csv\n",
" - park_codes.csv \n",
" - person_codes.csv\n",
" - team_codes.csv. \n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"pd.set_option('max_columns', 180)\n",
"pd.set_option('max_rows', 200000)\n",
"pd.set_option('max_colwidth', 5000)\n",
"\n",
"games = pd.read_csv('game_log.csv',low_memory=False)\n",
"parks = pd.read_csv('park_codes.csv')\n",
"persons = pd.read_csv('person_codes.csv')\n",
"teams = pd.read_csv('team_codes.csv')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Exploratory Analysis\n",
"\n",
"Now, we will do some exploratory analysis, to understand the Data and it's columns."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`games DataFrame`"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(171907, 161)\n"
]
},
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" date | \n",
" number_of_game | \n",
" day_of_week | \n",
" v_name | \n",
" v_league | \n",
" v_game_number | \n",
" h_name | \n",
" h_league | \n",
" h_game_number | \n",
" v_score | \n",
" h_score | \n",
" length_outs | \n",
" day_night | \n",
" completion | \n",
" forefeit | \n",
" protest | \n",
" park_id | \n",
" attendance | \n",
" length_minutes | \n",
" v_line_score | \n",
" h_line_score | \n",
" v_at_bats | \n",
" v_hits | \n",
" v_doubles | \n",
" v_triples | \n",
" v_homeruns | \n",
" v_rbi | \n",
" v_sacrifice_hits | \n",
" v_sacrifice_flies | \n",
" v_hit_by_pitch | \n",
" v_walks | \n",
" v_intentional_walks | \n",
" v_strikeouts | \n",
" v_stolen_bases | \n",
" v_caught_stealing | \n",
" v_grounded_into_double | \n",
" v_first_catcher_interference | \n",
" v_left_on_base | \n",
" v_pitchers_used | \n",
" v_individual_earned_runs | \n",
" v_team_earned_runs | \n",
" v_wild_pitches | \n",
" v_balks | \n",
" v_putouts | \n",
" v_assists | \n",
" v_errors | \n",
" v_passed_balls | \n",
" v_double_plays | \n",
" v_triple_plays | \n",
" h_at_bats | \n",
" h_hits | \n",
" h_doubles | \n",
" h_triples | \n",
" h_homeruns | \n",
" h_rbi | \n",
" h_sacrifice_hits | \n",
" h_sacrifice_flies | \n",
" h_hit_by_pitch | \n",
" h_walks | \n",
" h_intentional_walks | \n",
" h_strikeouts | \n",
" h_stolen_bases | \n",
" h_caught_stealing | \n",
" h_grounded_into_double | \n",
" h_first_catcher_interference | \n",
" h_left_on_base | \n",
" h_pitchers_used | \n",
" h_individual_earned_runs | \n",
" h_team_earned_runs | \n",
" h_wild_pitches | \n",
" h_balks | \n",
" h_putouts | \n",
" h_assists | \n",
" h_errors | \n",
" h_passed_balls | \n",
" h_double_plays | \n",
" h_triple_plays | \n",
" hp_umpire_id | \n",
" hp_umpire_name | \n",
" 1b_umpire_id | \n",
" 1b_umpire_name | \n",
" 2b_umpire_id | \n",
" 2b_umpire_name | \n",
" 3b_umpire_id | \n",
" 3b_umpire_name | \n",
" lf_umpire_id | \n",
" lf_umpire_name | \n",
" rf_umpire_id | \n",
" rf_umpire_name | \n",
" v_manager_id | \n",
" v_manager_name | \n",
" h_manager_id | \n",
" h_manager_name | \n",
" winning_pitcher_id | \n",
" winning_pitcher_name | \n",
" losing_pitcher_id | \n",
" losing_pitcher_name | \n",
" saving_pitcher_id | \n",
" saving_pitcher_name | \n",
" winning_rbi_batter_id | \n",
" winning_rbi_batter_id_name | \n",
" v_starting_pitcher_id | \n",
" v_starting_pitcher_name | \n",
" h_starting_pitcher_id | \n",
" h_starting_pitcher_name | \n",
" v_player_1_id | \n",
" v_player_1_name | \n",
" v_player_1_def_pos | \n",
" v_player_2_id | \n",
" v_player_2_name | \n",
" v_player_2_def_pos | \n",
" v_player_3_id | \n",
" v_player_3_name | \n",
" v_player_3_def_pos | \n",
" v_player_4_id | \n",
" v_player_4_name | \n",
" v_player_4_def_pos | \n",
" v_player_5_id | \n",
" v_player_5_name | \n",
" v_player_5_def_pos | \n",
" v_player_6_id | \n",
" v_player_6_name | \n",
" v_player_6_def_pos | \n",
" v_player_7_id | \n",
" v_player_7_name | \n",
" v_player_7_def_pos | \n",
" v_player_8_id | \n",
" v_player_8_name | \n",
" v_player_8_def_pos | \n",
" v_player_9_id | \n",
" v_player_9_name | \n",
" v_player_9_def_pos | \n",
" h_player_1_id | \n",
" h_player_1_name | \n",
" h_player_1_def_pos | \n",
" h_player_2_id | \n",
" h_player_2_name | \n",
" h_player_2_def_pos | \n",
" h_player_3_id | \n",
" h_player_3_name | \n",
" h_player_3_def_pos | \n",
" h_player_4_id | \n",
" h_player_4_name | \n",
" h_player_4_def_pos | \n",
" h_player_5_id | \n",
" h_player_5_name | \n",
" h_player_5_def_pos | \n",
" h_player_6_id | \n",
" h_player_6_name | \n",
" h_player_6_def_pos | \n",
" h_player_7_id | \n",
" h_player_7_name | \n",
" h_player_7_def_pos | \n",
" h_player_8_id | \n",
" h_player_8_name | \n",
" h_player_8_def_pos | \n",
" h_player_9_id | \n",
" h_player_9_name | \n",
" h_player_9_def_pos | \n",
" additional_info | \n",
" acquisition_info | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 18710504 | \n",
" 0 | \n",
" Thu | \n",
" CL1 | \n",
" NaN | \n",
" 1 | \n",
" FW1 | \n",
" NaN | \n",
" 1 | \n",
" 0 | \n",
" 2 | \n",
" 54.0 | \n",
" D | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" FOR01 | \n",
" 200.0 | \n",
" 120.0 | \n",
" 000000000 | \n",
" 010010000 | \n",
" 30.0 | \n",
" 4.0 | \n",
" 1.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 1.0 | \n",
" NaN | \n",
" 6.0 | \n",
" 1.0 | \n",
" NaN | \n",
" -1.0 | \n",
" NaN | \n",
" 4.0 | \n",
" 1.0 | \n",
" 1.0 | \n",
" 1.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 27.0 | \n",
" 9.0 | \n",
" 0.0 | \n",
" 3.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 31.0 | \n",
" 4.0 | \n",
" 1.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 2.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 1.0 | \n",
" NaN | \n",
" 0.0 | \n",
" 0.0 | \n",
" NaN | \n",
" -1.0 | \n",
" NaN | \n",
" 3.0 | \n",
" 1.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 27.0 | \n",
" 3.0 | \n",
" 3.0 | \n",
" 1.0 | \n",
" 1.0 | \n",
" 0.0 | \n",
" boakj901 | \n",
" John Boake | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" paboc101 | \n",
" Charlie Pabor | \n",
" lennb101 | \n",
" Bill Lennon | \n",
" mathb101 | \n",
" Bobby Mathews | \n",
" prata101 | \n",
" Al Pratt | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" prata101 | \n",
" Al Pratt | \n",
" mathb101 | \n",
" Bobby Mathews | \n",
" whitd102 | \n",
" Deacon White | \n",
" 2.0 | \n",
" kimbg101 | \n",
" Gene Kimball | \n",
" 4.0 | \n",
" paboc101 | \n",
" Charlie Pabor | \n",
" 7.0 | \n",
" allia101 | \n",
" Art Allison | \n",
" 8.0 | \n",
" white104 | \n",
" Elmer White | \n",
" 9.0 | \n",
" prata101 | \n",
" Al Pratt | \n",
" 1.0 | \n",
" sutte101 | \n",
" Ezra Sutton | \n",
" 5.0 | \n",
" carlj102 | \n",
" Jim Carleton | \n",
" 3.0 | \n",
" bassj101 | \n",
" John Bass | \n",
" 6.0 | \n",
" selmf101 | \n",
" Frank Sellman | \n",
" 5.0 | \n",
" mathb101 | \n",
" Bobby Mathews | \n",
" 1.0 | \n",
" foraj101 | \n",
" Jim Foran | \n",
" 3.0 | \n",
" goldw101 | \n",
" Wally Goldsmith | \n",
" 6.0 | \n",
" lennb101 | \n",
" Bill Lennon | \n",
" 2.0 | \n",
" caret101 | \n",
" Tom Carey | \n",
" 4.0 | \n",
" mince101 | \n",
" Ed Mincher | \n",
" 7.0 | \n",
" mcdej101 | \n",
" James McDermott | \n",
" 8.0 | \n",
" kellb105 | \n",
" Bill Kelly | \n",
" 9.0 | \n",
" NaN | \n",
" Y | \n",
"
\n",
" \n",
" 1 | \n",
" 18710505 | \n",
" 0 | \n",
" Fri | \n",
" BS1 | \n",
" NaN | \n",
" 1 | \n",
" WS3 | \n",
" NaN | \n",
" 1 | \n",
" 20 | \n",
" 18 | \n",
" 54.0 | \n",
" D | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" WAS01 | \n",
" 5000.0 | \n",
" 145.0 | \n",
" 107000435 | \n",
" 640113030 | \n",
" 41.0 | \n",
" 13.0 | \n",
" 1.0 | \n",
" 2.0 | \n",
" 0.0 | \n",
" 13.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 18.0 | \n",
" NaN | \n",
" 5.0 | \n",
" 3.0 | \n",
" NaN | \n",
" -1.0 | \n",
" NaN | \n",
" 12.0 | \n",
" 1.0 | \n",
" 6.0 | \n",
" 6.0 | \n",
" 1.0 | \n",
" 0.0 | \n",
" 27.0 | \n",
" 13.0 | \n",
" 10.0 | \n",
" 1.0 | \n",
" 2.0 | \n",
" 0.0 | \n",
" 49.0 | \n",
" 14.0 | \n",
" 2.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 11.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 10.0 | \n",
" NaN | \n",
" 2.0 | \n",
" 1.0 | \n",
" NaN | \n",
" -1.0 | \n",
" NaN | \n",
" 14.0 | \n",
" 1.0 | \n",
" 7.0 | \n",
" 7.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 27.0 | \n",
" 20.0 | \n",
" 10.0 | \n",
" 2.0 | \n",
" 3.0 | \n",
" 0.0 | \n",
" dobsh901 | \n",
" Henry Dobson | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" wrigh101 | \n",
" Harry Wright | \n",
" younn801 | \n",
" Nick Young | \n",
" spala101 | \n",
" Al Spalding | \n",
" braia102 | \n",
" Asa Brainard | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" spala101 | \n",
" Al Spalding | \n",
" braia102 | \n",
" Asa Brainard | \n",
" wrigg101 | \n",
" George Wright | \n",
" 6.0 | \n",
" barnr102 | \n",
" Ross Barnes | \n",
" 4.0 | \n",
" birdd102 | \n",
" Dave Birdsall | \n",
" 9.0 | \n",
" mcvec101 | \n",
" Cal McVey | \n",
" 2.0 | \n",
" wrigh101 | \n",
" Harry Wright | \n",
" 8.0 | \n",
" goulc101 | \n",
" Charlie Gould | \n",
" 3.0 | \n",
" schah101 | \n",
" Harry Schafer | \n",
" 5.0 | \n",
" conef101 | \n",
" Fred Cone | \n",
" 7.0 | \n",
" spala101 | \n",
" Al Spalding | \n",
" 1.0 | \n",
" watef102 | \n",
" Fred Waterman | \n",
" 5.0 | \n",
" forcd101 | \n",
" Davy Force | \n",
" 6.0 | \n",
" mille105 | \n",
" Everett Mills | \n",
" 3.0 | \n",
" allid101 | \n",
" Doug Allison | \n",
" 2.0 | \n",
" hallg101 | \n",
" George Hall | \n",
" 7.0 | \n",
" leona101 | \n",
" Andy Leonard | \n",
" 4.0 | \n",
" braia102 | \n",
" Asa Brainard | \n",
" 1.0 | \n",
" burrh101 | \n",
" Henry Burroughs | \n",
" 9.0 | \n",
" berth101 | \n",
" Henry Berthrong | \n",
" 8.0 | \n",
" HTBF | \n",
" Y | \n",
"
\n",
" \n",
" 2 | \n",
" 18710506 | \n",
" 0 | \n",
" Sat | \n",
" CL1 | \n",
" NaN | \n",
" 2 | \n",
" RC1 | \n",
" NaN | \n",
" 1 | \n",
" 12 | \n",
" 4 | \n",
" 54.0 | \n",
" D | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" RCK01 | \n",
" 1000.0 | \n",
" 140.0 | \n",
" 610020003 | \n",
" 010020100 | \n",
" 49.0 | \n",
" 11.0 | \n",
" 1.0 | \n",
" 1.0 | \n",
" 0.0 | \n",
" 8.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" NaN | \n",
" 1.0 | \n",
" 0.0 | \n",
" NaN | \n",
" -1.0 | \n",
" NaN | \n",
" 10.0 | \n",
" 1.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 2.0 | \n",
" 0.0 | \n",
" 27.0 | \n",
" 12.0 | \n",
" 8.0 | \n",
" 5.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 36.0 | \n",
" 7.0 | \n",
" 2.0 | \n",
" 1.0 | \n",
" 0.0 | \n",
" 2.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" NaN | \n",
" 3.0 | \n",
" 5.0 | \n",
" NaN | \n",
" -1.0 | \n",
" NaN | \n",
" 5.0 | \n",
" 1.0 | \n",
" 3.0 | \n",
" 3.0 | \n",
" 1.0 | \n",
" 0.0 | \n",
" 27.0 | \n",
" 12.0 | \n",
" 13.0 | \n",
" 3.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" mawnj901 | \n",
" J.H. Manny | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" paboc101 | \n",
" Charlie Pabor | \n",
" hasts101 | \n",
" Scott Hastings | \n",
" prata101 | \n",
" Al Pratt | \n",
" fishc102 | \n",
" Cherokee Fisher | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" prata101 | \n",
" Al Pratt | \n",
" fishc102 | \n",
" Cherokee Fisher | \n",
" whitd102 | \n",
" Deacon White | \n",
" 2.0 | \n",
" kimbg101 | \n",
" Gene Kimball | \n",
" 4.0 | \n",
" paboc101 | \n",
" Charlie Pabor | \n",
" 7.0 | \n",
" allia101 | \n",
" Art Allison | \n",
" 8.0 | \n",
" white104 | \n",
" Elmer White | \n",
" 9.0 | \n",
" prata101 | \n",
" Al Pratt | \n",
" 1.0 | \n",
" sutte101 | \n",
" Ezra Sutton | \n",
" 5.0 | \n",
" carlj102 | \n",
" Jim Carleton | \n",
" 3.0 | \n",
" bassj101 | \n",
" John Bass | \n",
" 6.0 | \n",
" mackd101 | \n",
" Denny Mack | \n",
" 3.0 | \n",
" addyb101 | \n",
" Bob Addy | \n",
" 4.0 | \n",
" fishc102 | \n",
" Cherokee Fisher | \n",
" 1.0 | \n",
" hasts101 | \n",
" Scott Hastings | \n",
" 8.0 | \n",
" ham-r101 | \n",
" Ralph Ham | \n",
" 5.0 | \n",
" ansoc101 | \n",
" Cap Anson | \n",
" 2.0 | \n",
" sagep101 | \n",
" Pony Sager | \n",
" 6.0 | \n",
" birdg101 | \n",
" George Bird | \n",
" 7.0 | \n",
" stirg101 | \n",
" Gat Stires | \n",
" 9.0 | \n",
" NaN | \n",
" Y | \n",
"
\n",
" \n",
" 3 | \n",
" 18710508 | \n",
" 0 | \n",
" Mon | \n",
" CL1 | \n",
" NaN | \n",
" 3 | \n",
" CH1 | \n",
" NaN | \n",
" 1 | \n",
" 12 | \n",
" 14 | \n",
" 54.0 | \n",
" D | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" CHI01 | \n",
" 5000.0 | \n",
" 150.0 | \n",
" 101403111 | \n",
" 077000000 | \n",
" 46.0 | \n",
" 15.0 | \n",
" 2.0 | \n",
" 1.0 | \n",
" 2.0 | \n",
" 10.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" NaN | \n",
" 1.0 | \n",
" 0.0 | \n",
" NaN | \n",
" -1.0 | \n",
" NaN | \n",
" 7.0 | \n",
" 1.0 | \n",
" 6.0 | \n",
" 6.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 27.0 | \n",
" 15.0 | \n",
" 11.0 | \n",
" 6.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 43.0 | \n",
" 11.0 | \n",
" 2.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 8.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 4.0 | \n",
" NaN | \n",
" 2.0 | \n",
" 1.0 | \n",
" NaN | \n",
" -1.0 | \n",
" NaN | \n",
" 6.0 | \n",
" 1.0 | \n",
" 4.0 | \n",
" 4.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 27.0 | \n",
" 14.0 | \n",
" 7.0 | \n",
" 2.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" willg901 | \n",
" Gardner Willard | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" paboc101 | \n",
" Charlie Pabor | \n",
" woodj106 | \n",
" Jimmy Wood | \n",
" zettg101 | \n",
" George Zettlein | \n",
" prata101 | \n",
" Al Pratt | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" prata101 | \n",
" Al Pratt | \n",
" zettg101 | \n",
" George Zettlein | \n",
" whitd102 | \n",
" Deacon White | \n",
" 2.0 | \n",
" kimbg101 | \n",
" Gene Kimball | \n",
" 4.0 | \n",
" paboc101 | \n",
" Charlie Pabor | \n",
" 7.0 | \n",
" allia101 | \n",
" Art Allison | \n",
" 8.0 | \n",
" white104 | \n",
" Elmer White | \n",
" 9.0 | \n",
" prata101 | \n",
" Al Pratt | \n",
" 1.0 | \n",
" sutte101 | \n",
" Ezra Sutton | \n",
" 5.0 | \n",
" carlj102 | \n",
" Jim Carleton | \n",
" 3.0 | \n",
" bassj101 | \n",
" John Bass | \n",
" 6.0 | \n",
" mcatb101 | \n",
" Bub McAtee | \n",
" 3.0 | \n",
" kingm101 | \n",
" Marshall King | \n",
" 8.0 | \n",
" hodec101 | \n",
" Charlie Hodes | \n",
" 2.0 | \n",
" woodj106 | \n",
" Jimmy Wood | \n",
" 4.0 | \n",
" simmj101 | \n",
" Joe Simmons | \n",
" 9.0 | \n",
" folet101 | \n",
" Tom Foley | \n",
" 7.0 | \n",
" duffe101 | \n",
" Ed Duffy | \n",
" 6.0 | \n",
" pinke101 | \n",
" Ed Pinkham | \n",
" 5.0 | \n",
" zettg101 | \n",
" George Zettlein | \n",
" 1.0 | \n",
" NaN | \n",
" Y | \n",
"
\n",
" \n",
" 4 | \n",
" 18710509 | \n",
" 0 | \n",
" Tue | \n",
" BS1 | \n",
" NaN | \n",
" 2 | \n",
" TRO | \n",
" NaN | \n",
" 1 | \n",
" 9 | \n",
" 5 | \n",
" 54.0 | \n",
" D | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" TRO01 | \n",
" 3250.0 | \n",
" 145.0 | \n",
" 000002232 | \n",
" 101003000 | \n",
" 46.0 | \n",
" 17.0 | \n",
" 4.0 | \n",
" 1.0 | \n",
" 0.0 | \n",
" 6.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 2.0 | \n",
" NaN | \n",
" 0.0 | \n",
" 1.0 | \n",
" NaN | \n",
" -1.0 | \n",
" NaN | \n",
" 12.0 | \n",
" 1.0 | \n",
" 2.0 | \n",
" 2.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 27.0 | \n",
" 12.0 | \n",
" 5.0 | \n",
" 0.0 | \n",
" 1.0 | \n",
" 0.0 | \n",
" 36.0 | \n",
" 9.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 2.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 3.0 | \n",
" NaN | \n",
" 0.0 | \n",
" 2.0 | \n",
" NaN | \n",
" -1.0 | \n",
" NaN | \n",
" 7.0 | \n",
" 1.0 | \n",
" 3.0 | \n",
" 3.0 | \n",
" 1.0 | \n",
" 0.0 | \n",
" 27.0 | \n",
" 11.0 | \n",
" 7.0 | \n",
" 3.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" leroi901 | \n",
" Isaac Leroy | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" wrigh101 | \n",
" Harry Wright | \n",
" pikel101 | \n",
" Lip Pike | \n",
" spala101 | \n",
" Al Spalding | \n",
" mcmuj101 | \n",
" John McMullin | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" spala101 | \n",
" Al Spalding | \n",
" mcmuj101 | \n",
" John McMullin | \n",
" wrigg101 | \n",
" George Wright | \n",
" 6.0 | \n",
" barnr102 | \n",
" Ross Barnes | \n",
" 4.0 | \n",
" birdd102 | \n",
" Dave Birdsall | \n",
" 9.0 | \n",
" mcvec101 | \n",
" Cal McVey | \n",
" 2.0 | \n",
" wrigh101 | \n",
" Harry Wright | \n",
" 8.0 | \n",
" goulc101 | \n",
" Charlie Gould | \n",
" 3.0 | \n",
" schah101 | \n",
" Harry Schafer | \n",
" 5.0 | \n",
" conef101 | \n",
" Fred Cone | \n",
" 7.0 | \n",
" spala101 | \n",
" Al Spalding | \n",
" 1.0 | \n",
" flync101 | \n",
" Clipper Flynn | \n",
" 9.0 | \n",
" mcgem101 | \n",
" Mike McGeary | \n",
" 2.0 | \n",
" yorkt101 | \n",
" Tom York | \n",
" 8.0 | \n",
" mcmuj101 | \n",
" John McMullin | \n",
" 1.0 | \n",
" kings101 | \n",
" Steve King | \n",
" 7.0 | \n",
" beave101 | \n",
" Edward Beavens | \n",
" 4.0 | \n",
" bells101 | \n",
" Steve Bellan | \n",
" 5.0 | \n",
" pikel101 | \n",
" Lip Pike | \n",
" 3.0 | \n",
" cravb101 | \n",
" Bill Craver | \n",
" 6.0 | \n",
" HTBF | \n",
" Y | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" date number_of_game day_of_week v_name v_league v_game_number h_name \\\n",
"0 18710504 0 Thu CL1 NaN 1 FW1 \n",
"1 18710505 0 Fri BS1 NaN 1 WS3 \n",
"2 18710506 0 Sat CL1 NaN 2 RC1 \n",
"3 18710508 0 Mon CL1 NaN 3 CH1 \n",
"4 18710509 0 Tue BS1 NaN 2 TRO \n",
"\n",
" h_league h_game_number v_score h_score length_outs day_night completion \\\n",
"0 NaN 1 0 2 54.0 D NaN \n",
"1 NaN 1 20 18 54.0 D NaN \n",
"2 NaN 1 12 4 54.0 D NaN \n",
"3 NaN 1 12 14 54.0 D NaN \n",
"4 NaN 1 9 5 54.0 D NaN \n",
"\n",
" forefeit protest park_id attendance length_minutes v_line_score \\\n",
"0 NaN NaN FOR01 200.0 120.0 000000000 \n",
"1 NaN NaN WAS01 5000.0 145.0 107000435 \n",
"2 NaN NaN RCK01 1000.0 140.0 610020003 \n",
"3 NaN NaN CHI01 5000.0 150.0 101403111 \n",
"4 NaN NaN TRO01 3250.0 145.0 000002232 \n",
"\n",
" h_line_score v_at_bats v_hits v_doubles v_triples v_homeruns v_rbi \\\n",
"0 010010000 30.0 4.0 1.0 0.0 0.0 0.0 \n",
"1 640113030 41.0 13.0 1.0 2.0 0.0 13.0 \n",
"2 010020100 49.0 11.0 1.0 1.0 0.0 8.0 \n",
"3 077000000 46.0 15.0 2.0 1.0 2.0 10.0 \n",
"4 101003000 46.0 17.0 4.0 1.0 0.0 6.0 \n",
"\n",
" v_sacrifice_hits v_sacrifice_flies v_hit_by_pitch v_walks \\\n",
"0 0.0 0.0 0.0 1.0 \n",
"1 0.0 0.0 0.0 18.0 \n",
"2 0.0 0.0 0.0 0.0 \n",
"3 0.0 0.0 0.0 0.0 \n",
"4 0.0 0.0 0.0 2.0 \n",
"\n",
" v_intentional_walks v_strikeouts v_stolen_bases v_caught_stealing \\\n",
"0 NaN 6.0 1.0 NaN \n",
"1 NaN 5.0 3.0 NaN \n",
"2 NaN 1.0 0.0 NaN \n",
"3 NaN 1.0 0.0 NaN \n",
"4 NaN 0.0 1.0 NaN \n",
"\n",
" v_grounded_into_double v_first_catcher_interference v_left_on_base \\\n",
"0 -1.0 NaN 4.0 \n",
"1 -1.0 NaN 12.0 \n",
"2 -1.0 NaN 10.0 \n",
"3 -1.0 NaN 7.0 \n",
"4 -1.0 NaN 12.0 \n",
"\n",
" v_pitchers_used v_individual_earned_runs v_team_earned_runs \\\n",
"0 1.0 1.0 1.0 \n",
"1 1.0 6.0 6.0 \n",
"2 1.0 0.0 0.0 \n",
"3 1.0 6.0 6.0 \n",
"4 1.0 2.0 2.0 \n",
"\n",
" v_wild_pitches v_balks v_putouts v_assists v_errors v_passed_balls \\\n",
"0 0.0 0.0 27.0 9.0 0.0 3.0 \n",
"1 1.0 0.0 27.0 13.0 10.0 1.0 \n",
"2 2.0 0.0 27.0 12.0 8.0 5.0 \n",
"3 0.0 0.0 27.0 15.0 11.0 6.0 \n",
"4 0.0 0.0 27.0 12.0 5.0 0.0 \n",
"\n",
" v_double_plays v_triple_plays h_at_bats h_hits h_doubles h_triples \\\n",
"0 0.0 0.0 31.0 4.0 1.0 0.0 \n",
"1 2.0 0.0 49.0 14.0 2.0 0.0 \n",
"2 0.0 0.0 36.0 7.0 2.0 1.0 \n",
"3 0.0 0.0 43.0 11.0 2.0 0.0 \n",
"4 1.0 0.0 36.0 9.0 0.0 0.0 \n",
"\n",
" h_homeruns h_rbi h_sacrifice_hits h_sacrifice_flies h_hit_by_pitch \\\n",
"0 0.0 2.0 0.0 0.0 0.0 \n",
"1 0.0 11.0 0.0 0.0 0.0 \n",
"2 0.0 2.0 0.0 0.0 0.0 \n",
"3 0.0 8.0 0.0 0.0 0.0 \n",
"4 0.0 2.0 0.0 0.0 0.0 \n",
"\n",
" h_walks h_intentional_walks h_strikeouts h_stolen_bases \\\n",
"0 1.0 NaN 0.0 0.0 \n",
"1 10.0 NaN 2.0 1.0 \n",
"2 0.0 NaN 3.0 5.0 \n",
"3 4.0 NaN 2.0 1.0 \n",
"4 3.0 NaN 0.0 2.0 \n",
"\n",
" h_caught_stealing h_grounded_into_double h_first_catcher_interference \\\n",
"0 NaN -1.0 NaN \n",
"1 NaN -1.0 NaN \n",
"2 NaN -1.0 NaN \n",
"3 NaN -1.0 NaN \n",
"4 NaN -1.0 NaN \n",
"\n",
" h_left_on_base h_pitchers_used h_individual_earned_runs \\\n",
"0 3.0 1.0 0.0 \n",
"1 14.0 1.0 7.0 \n",
"2 5.0 1.0 3.0 \n",
"3 6.0 1.0 4.0 \n",
"4 7.0 1.0 3.0 \n",
"\n",
" h_team_earned_runs h_wild_pitches h_balks h_putouts h_assists \\\n",
"0 0.0 0.0 0.0 27.0 3.0 \n",
"1 7.0 0.0 0.0 27.0 20.0 \n",
"2 3.0 1.0 0.0 27.0 12.0 \n",
"3 4.0 0.0 0.0 27.0 14.0 \n",
"4 3.0 1.0 0.0 27.0 11.0 \n",
"\n",
" h_errors h_passed_balls h_double_plays h_triple_plays hp_umpire_id \\\n",
"0 3.0 1.0 1.0 0.0 boakj901 \n",
"1 10.0 2.0 3.0 0.0 dobsh901 \n",
"2 13.0 3.0 0.0 0.0 mawnj901 \n",
"3 7.0 2.0 0.0 0.0 willg901 \n",
"4 7.0 3.0 0.0 0.0 leroi901 \n",
"\n",
" hp_umpire_name 1b_umpire_id 1b_umpire_name 2b_umpire_id 2b_umpire_name \\\n",
"0 John Boake NaN NaN NaN NaN \n",
"1 Henry Dobson NaN NaN NaN NaN \n",
"2 J.H. Manny NaN NaN NaN NaN \n",
"3 Gardner Willard NaN NaN NaN NaN \n",
"4 Isaac Leroy NaN NaN NaN NaN \n",
"\n",
" 3b_umpire_id 3b_umpire_name lf_umpire_id lf_umpire_name rf_umpire_id \\\n",
"0 NaN NaN NaN NaN NaN \n",
"1 NaN NaN NaN NaN NaN \n",
"2 NaN NaN NaN NaN NaN \n",
"3 NaN NaN NaN NaN NaN \n",
"4 NaN NaN NaN NaN NaN \n",
"\n",
" rf_umpire_name v_manager_id v_manager_name h_manager_id h_manager_name \\\n",
"0 NaN paboc101 Charlie Pabor lennb101 Bill Lennon \n",
"1 NaN wrigh101 Harry Wright younn801 Nick Young \n",
"2 NaN paboc101 Charlie Pabor hasts101 Scott Hastings \n",
"3 NaN paboc101 Charlie Pabor woodj106 Jimmy Wood \n",
"4 NaN wrigh101 Harry Wright pikel101 Lip Pike \n",
"\n",
" winning_pitcher_id winning_pitcher_name losing_pitcher_id \\\n",
"0 mathb101 Bobby Mathews prata101 \n",
"1 spala101 Al Spalding braia102 \n",
"2 prata101 Al Pratt fishc102 \n",
"3 zettg101 George Zettlein prata101 \n",
"4 spala101 Al Spalding mcmuj101 \n",
"\n",
" losing_pitcher_name saving_pitcher_id saving_pitcher_name \\\n",
"0 Al Pratt NaN NaN \n",
"1 Asa Brainard NaN NaN \n",
"2 Cherokee Fisher NaN NaN \n",
"3 Al Pratt NaN NaN \n",
"4 John McMullin NaN NaN \n",
"\n",
" winning_rbi_batter_id winning_rbi_batter_id_name v_starting_pitcher_id \\\n",
"0 NaN NaN prata101 \n",
"1 NaN NaN spala101 \n",
"2 NaN NaN prata101 \n",
"3 NaN NaN prata101 \n",
"4 NaN NaN spala101 \n",
"\n",
" v_starting_pitcher_name h_starting_pitcher_id h_starting_pitcher_name \\\n",
"0 Al Pratt mathb101 Bobby Mathews \n",
"1 Al Spalding braia102 Asa Brainard \n",
"2 Al Pratt fishc102 Cherokee Fisher \n",
"3 Al Pratt zettg101 George Zettlein \n",
"4 Al Spalding mcmuj101 John McMullin \n",
"\n",
" v_player_1_id v_player_1_name v_player_1_def_pos v_player_2_id \\\n",
"0 whitd102 Deacon White 2.0 kimbg101 \n",
"1 wrigg101 George Wright 6.0 barnr102 \n",
"2 whitd102 Deacon White 2.0 kimbg101 \n",
"3 whitd102 Deacon White 2.0 kimbg101 \n",
"4 wrigg101 George Wright 6.0 barnr102 \n",
"\n",
" v_player_2_name v_player_2_def_pos v_player_3_id v_player_3_name \\\n",
"0 Gene Kimball 4.0 paboc101 Charlie Pabor \n",
"1 Ross Barnes 4.0 birdd102 Dave Birdsall \n",
"2 Gene Kimball 4.0 paboc101 Charlie Pabor \n",
"3 Gene Kimball 4.0 paboc101 Charlie Pabor \n",
"4 Ross Barnes 4.0 birdd102 Dave Birdsall \n",
"\n",
" v_player_3_def_pos v_player_4_id v_player_4_name v_player_4_def_pos \\\n",
"0 7.0 allia101 Art Allison 8.0 \n",
"1 9.0 mcvec101 Cal McVey 2.0 \n",
"2 7.0 allia101 Art Allison 8.0 \n",
"3 7.0 allia101 Art Allison 8.0 \n",
"4 9.0 mcvec101 Cal McVey 2.0 \n",
"\n",
" v_player_5_id v_player_5_name v_player_5_def_pos v_player_6_id \\\n",
"0 white104 Elmer White 9.0 prata101 \n",
"1 wrigh101 Harry Wright 8.0 goulc101 \n",
"2 white104 Elmer White 9.0 prata101 \n",
"3 white104 Elmer White 9.0 prata101 \n",
"4 wrigh101 Harry Wright 8.0 goulc101 \n",
"\n",
" v_player_6_name v_player_6_def_pos v_player_7_id v_player_7_name \\\n",
"0 Al Pratt 1.0 sutte101 Ezra Sutton \n",
"1 Charlie Gould 3.0 schah101 Harry Schafer \n",
"2 Al Pratt 1.0 sutte101 Ezra Sutton \n",
"3 Al Pratt 1.0 sutte101 Ezra Sutton \n",
"4 Charlie Gould 3.0 schah101 Harry Schafer \n",
"\n",
" v_player_7_def_pos v_player_8_id v_player_8_name v_player_8_def_pos \\\n",
"0 5.0 carlj102 Jim Carleton 3.0 \n",
"1 5.0 conef101 Fred Cone 7.0 \n",
"2 5.0 carlj102 Jim Carleton 3.0 \n",
"3 5.0 carlj102 Jim Carleton 3.0 \n",
"4 5.0 conef101 Fred Cone 7.0 \n",
"\n",
" v_player_9_id v_player_9_name v_player_9_def_pos h_player_1_id \\\n",
"0 bassj101 John Bass 6.0 selmf101 \n",
"1 spala101 Al Spalding 1.0 watef102 \n",
"2 bassj101 John Bass 6.0 mackd101 \n",
"3 bassj101 John Bass 6.0 mcatb101 \n",
"4 spala101 Al Spalding 1.0 flync101 \n",
"\n",
" h_player_1_name h_player_1_def_pos h_player_2_id h_player_2_name \\\n",
"0 Frank Sellman 5.0 mathb101 Bobby Mathews \n",
"1 Fred Waterman 5.0 forcd101 Davy Force \n",
"2 Denny Mack 3.0 addyb101 Bob Addy \n",
"3 Bub McAtee 3.0 kingm101 Marshall King \n",
"4 Clipper Flynn 9.0 mcgem101 Mike McGeary \n",
"\n",
" h_player_2_def_pos h_player_3_id h_player_3_name h_player_3_def_pos \\\n",
"0 1.0 foraj101 Jim Foran 3.0 \n",
"1 6.0 mille105 Everett Mills 3.0 \n",
"2 4.0 fishc102 Cherokee Fisher 1.0 \n",
"3 8.0 hodec101 Charlie Hodes 2.0 \n",
"4 2.0 yorkt101 Tom York 8.0 \n",
"\n",
" h_player_4_id h_player_4_name h_player_4_def_pos h_player_5_id \\\n",
"0 goldw101 Wally Goldsmith 6.0 lennb101 \n",
"1 allid101 Doug Allison 2.0 hallg101 \n",
"2 hasts101 Scott Hastings 8.0 ham-r101 \n",
"3 woodj106 Jimmy Wood 4.0 simmj101 \n",
"4 mcmuj101 John McMullin 1.0 kings101 \n",
"\n",
" h_player_5_name h_player_5_def_pos h_player_6_id h_player_6_name \\\n",
"0 Bill Lennon 2.0 caret101 Tom Carey \n",
"1 George Hall 7.0 leona101 Andy Leonard \n",
"2 Ralph Ham 5.0 ansoc101 Cap Anson \n",
"3 Joe Simmons 9.0 folet101 Tom Foley \n",
"4 Steve King 7.0 beave101 Edward Beavens \n",
"\n",
" h_player_6_def_pos h_player_7_id h_player_7_name h_player_7_def_pos \\\n",
"0 4.0 mince101 Ed Mincher 7.0 \n",
"1 4.0 braia102 Asa Brainard 1.0 \n",
"2 2.0 sagep101 Pony Sager 6.0 \n",
"3 7.0 duffe101 Ed Duffy 6.0 \n",
"4 4.0 bells101 Steve Bellan 5.0 \n",
"\n",
" h_player_8_id h_player_8_name h_player_8_def_pos h_player_9_id \\\n",
"0 mcdej101 James McDermott 8.0 kellb105 \n",
"1 burrh101 Henry Burroughs 9.0 berth101 \n",
"2 birdg101 George Bird 7.0 stirg101 \n",
"3 pinke101 Ed Pinkham 5.0 zettg101 \n",
"4 pikel101 Lip Pike 3.0 cravb101 \n",
"\n",
" h_player_9_name h_player_9_def_pos additional_info acquisition_info \n",
"0 Bill Kelly 9.0 NaN Y \n",
"1 Henry Berthrong 8.0 HTBF Y \n",
"2 Gat Stires 9.0 NaN Y \n",
"3 George Zettlein 1.0 NaN Y \n",
"4 Bill Craver 6.0 HTBF Y "
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"print(games.shape)\n",
"games.head()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" date | \n",
" number_of_game | \n",
" day_of_week | \n",
" v_name | \n",
" v_league | \n",
" v_game_number | \n",
" h_name | \n",
" h_league | \n",
" h_game_number | \n",
" v_score | \n",
" h_score | \n",
" length_outs | \n",
" day_night | \n",
" completion | \n",
" forefeit | \n",
" protest | \n",
" park_id | \n",
" attendance | \n",
" length_minutes | \n",
" v_line_score | \n",
" h_line_score | \n",
" v_at_bats | \n",
" v_hits | \n",
" v_doubles | \n",
" v_triples | \n",
" v_homeruns | \n",
" v_rbi | \n",
" v_sacrifice_hits | \n",
" v_sacrifice_flies | \n",
" v_hit_by_pitch | \n",
" v_walks | \n",
" v_intentional_walks | \n",
" v_strikeouts | \n",
" v_stolen_bases | \n",
" v_caught_stealing | \n",
" v_grounded_into_double | \n",
" v_first_catcher_interference | \n",
" v_left_on_base | \n",
" v_pitchers_used | \n",
" v_individual_earned_runs | \n",
" v_team_earned_runs | \n",
" v_wild_pitches | \n",
" v_balks | \n",
" v_putouts | \n",
" v_assists | \n",
" v_errors | \n",
" v_passed_balls | \n",
" v_double_plays | \n",
" v_triple_plays | \n",
" h_at_bats | \n",
" h_hits | \n",
" h_doubles | \n",
" h_triples | \n",
" h_homeruns | \n",
" h_rbi | \n",
" h_sacrifice_hits | \n",
" h_sacrifice_flies | \n",
" h_hit_by_pitch | \n",
" h_walks | \n",
" h_intentional_walks | \n",
" h_strikeouts | \n",
" h_stolen_bases | \n",
" h_caught_stealing | \n",
" h_grounded_into_double | \n",
" h_first_catcher_interference | \n",
" h_left_on_base | \n",
" h_pitchers_used | \n",
" h_individual_earned_runs | \n",
" h_team_earned_runs | \n",
" h_wild_pitches | \n",
" h_balks | \n",
" h_putouts | \n",
" h_assists | \n",
" h_errors | \n",
" h_passed_balls | \n",
" h_double_plays | \n",
" h_triple_plays | \n",
" hp_umpire_id | \n",
" hp_umpire_name | \n",
" 1b_umpire_id | \n",
" 1b_umpire_name | \n",
" 2b_umpire_id | \n",
" 2b_umpire_name | \n",
" 3b_umpire_id | \n",
" 3b_umpire_name | \n",
" lf_umpire_id | \n",
" lf_umpire_name | \n",
" rf_umpire_id | \n",
" rf_umpire_name | \n",
" v_manager_id | \n",
" v_manager_name | \n",
" h_manager_id | \n",
" h_manager_name | \n",
" winning_pitcher_id | \n",
" winning_pitcher_name | \n",
" losing_pitcher_id | \n",
" losing_pitcher_name | \n",
" saving_pitcher_id | \n",
" saving_pitcher_name | \n",
" winning_rbi_batter_id | \n",
" winning_rbi_batter_id_name | \n",
" v_starting_pitcher_id | \n",
" v_starting_pitcher_name | \n",
" h_starting_pitcher_id | \n",
" h_starting_pitcher_name | \n",
" v_player_1_id | \n",
" v_player_1_name | \n",
" v_player_1_def_pos | \n",
" v_player_2_id | \n",
" v_player_2_name | \n",
" v_player_2_def_pos | \n",
" v_player_3_id | \n",
" v_player_3_name | \n",
" v_player_3_def_pos | \n",
" v_player_4_id | \n",
" v_player_4_name | \n",
" v_player_4_def_pos | \n",
" v_player_5_id | \n",
" v_player_5_name | \n",
" v_player_5_def_pos | \n",
" v_player_6_id | \n",
" v_player_6_name | \n",
" v_player_6_def_pos | \n",
" v_player_7_id | \n",
" v_player_7_name | \n",
" v_player_7_def_pos | \n",
" v_player_8_id | \n",
" v_player_8_name | \n",
" v_player_8_def_pos | \n",
" v_player_9_id | \n",
" v_player_9_name | \n",
" v_player_9_def_pos | \n",
" h_player_1_id | \n",
" h_player_1_name | \n",
" h_player_1_def_pos | \n",
" h_player_2_id | \n",
" h_player_2_name | \n",
" h_player_2_def_pos | \n",
" h_player_3_id | \n",
" h_player_3_name | \n",
" h_player_3_def_pos | \n",
" h_player_4_id | \n",
" h_player_4_name | \n",
" h_player_4_def_pos | \n",
" h_player_5_id | \n",
" h_player_5_name | \n",
" h_player_5_def_pos | \n",
" h_player_6_id | \n",
" h_player_6_name | \n",
" h_player_6_def_pos | \n",
" h_player_7_id | \n",
" h_player_7_name | \n",
" h_player_7_def_pos | \n",
" h_player_8_id | \n",
" h_player_8_name | \n",
" h_player_8_def_pos | \n",
" h_player_9_id | \n",
" h_player_9_name | \n",
" h_player_9_def_pos | \n",
" additional_info | \n",
" acquisition_info | \n",
"
\n",
" \n",
" \n",
" \n",
" 171902 | \n",
" 20161002 | \n",
" 0 | \n",
" Sun | \n",
" MIL | \n",
" NL | \n",
" 162 | \n",
" COL | \n",
" NL | \n",
" 162 | \n",
" 6 | \n",
" 4 | \n",
" 60.0 | \n",
" D | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" DEN02 | \n",
" 27762.0 | \n",
" 203.0 | \n",
" 0200000202 | \n",
" 1100100010 | \n",
" 39.0 | \n",
" 10.0 | \n",
" 4.0 | \n",
" 1.0 | \n",
" 2.0 | \n",
" 6.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 1.0 | \n",
" 4.0 | \n",
" 0.0 | \n",
" 12.0 | \n",
" 2.0 | \n",
" 1.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 8.0 | \n",
" 7.0 | \n",
" 4.0 | \n",
" 4.0 | \n",
" 1.0 | \n",
" 0.0 | \n",
" 30.0 | \n",
" 12.0 | \n",
" 1.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 41.0 | \n",
" 13.0 | \n",
" 4.0 | \n",
" 0.0 | \n",
" 1.0 | \n",
" 4.0 | \n",
" 1.0 | \n",
" 0.0 | \n",
" 1.0 | \n",
" 3.0 | \n",
" 0.0 | \n",
" 11.0 | \n",
" 0.0 | \n",
" 1.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 12.0 | \n",
" 5.0 | \n",
" 6.0 | \n",
" 6.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 30.0 | \n",
" 13.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" barrs901 | \n",
" Scott Barry | \n",
" woodt901 | \n",
" Tom Woodring | \n",
" randt901 | \n",
" Tony Randazzo | \n",
" ortir901 | \n",
" Roberto Ortiz | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" counc001 | \n",
" Craig Counsell | \n",
" weisw001 | \n",
" Walt Weiss | \n",
" thort001 | \n",
" Tyler Thornburg | \n",
" rusic001 | \n",
" Chris Rusin | \n",
" knebc001 | \n",
" Corey Knebel | \n",
" susaa001 | \n",
" Andrew Susac | \n",
" cravt001 | \n",
" Tyler Cravy | \n",
" marqg001 | \n",
" German Marquez | \n",
" villj001 | \n",
" Jonathan Villar | \n",
" 5.0 | \n",
" genns001 | \n",
" Scooter Gennett | \n",
" 4.0 | \n",
" cartc002 | \n",
" Chris Carter | \n",
" 3.0 | \n",
" santd002 | \n",
" Domingo Santana | \n",
" 9.0 | \n",
" pereh001 | \n",
" Hernan Perez | \n",
" 8.0 | \n",
" arcio002 | \n",
" Orlando Arcia | \n",
" 6.0 | \n",
" susaa001 | \n",
" Andrew Susac | \n",
" 2.0 | \n",
" elmoj001 | \n",
" Jake Elmore | \n",
" 7.0 | \n",
" cravt001 | \n",
" Tyler Cravy | \n",
" 1.0 | \n",
" blacc001 | \n",
" Charlie Blackmon | \n",
" 8.0 | \n",
" dahld001 | \n",
" David Dahl | \n",
" 7.0 | \n",
" arenn001 | \n",
" Nolan Arenado | \n",
" 5.0 | \n",
" gonzc001 | \n",
" Carlos Gonzalez | \n",
" 9.0 | \n",
" murpt002 | \n",
" Tom Murphy | \n",
" 2.0 | \n",
" pattj005 | \n",
" Jordan Patterson | \n",
" 3.0 | \n",
" valap001 | \n",
" Pat Valaika | \n",
" 4.0 | \n",
" adamc001 | \n",
" Cristhian Adames | \n",
" 6.0 | \n",
" marqg001 | \n",
" German Marquez | \n",
" 1.0 | \n",
" NaN | \n",
" Y | \n",
"
\n",
" \n",
" 171903 | \n",
" 20161002 | \n",
" 0 | \n",
" Sun | \n",
" NYN | \n",
" NL | \n",
" 162 | \n",
" PHI | \n",
" NL | \n",
" 162 | \n",
" 2 | \n",
" 5 | \n",
" 51.0 | \n",
" D | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" PHI13 | \n",
" 36935.0 | \n",
" 159.0 | \n",
" 000001100 | \n",
" 00100031x | \n",
" 33.0 | \n",
" 8.0 | \n",
" 3.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 2.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 2.0 | \n",
" 0.0 | \n",
" 9.0 | \n",
" 1.0 | \n",
" 1.0 | \n",
" 1.0 | \n",
" 0.0 | \n",
" 6.0 | \n",
" 6.0 | \n",
" 3.0 | \n",
" 3.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 24.0 | \n",
" 12.0 | \n",
" 3.0 | \n",
" 1.0 | \n",
" 2.0 | \n",
" 0.0 | \n",
" 33.0 | \n",
" 10.0 | \n",
" 1.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 3.0 | \n",
" 0.0 | \n",
" 1.0 | \n",
" 0.0 | \n",
" 2.0 | \n",
" 0.0 | \n",
" 3.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 2.0 | \n",
" 0.0 | \n",
" 7.0 | \n",
" 5.0 | \n",
" 2.0 | \n",
" 2.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 27.0 | \n",
" 7.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 1.0 | \n",
" 0.0 | \n",
" barkl901 | \n",
" Lance Barksdale | \n",
" herna901 | \n",
" Angel Hernandez | \n",
" barrt901 | \n",
" Ted Barrett | \n",
" littw901 | \n",
" Will Little | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" collt801 | \n",
" Terry Collins | \n",
" mackp101 | \n",
" Pete Mackanin | \n",
" murrc002 | \n",
" Colton Murray | \n",
" goede001 | \n",
" Erik Goeddel | \n",
" nerih001 | \n",
" Hector Neris | \n",
" hernc005 | \n",
" Cesar Hernandez | \n",
" ynoag001 | \n",
" Gabriel Ynoa | \n",
" eickj001 | \n",
" Jerad Eickhoff | \n",
" granc001 | \n",
" Curtis Granderson | \n",
" 8.0 | \n",
" cabra002 | \n",
" Asdrubal Cabrera | \n",
" 6.0 | \n",
" brucj001 | \n",
" Jay Bruce | \n",
" 9.0 | \n",
" dudal001 | \n",
" Lucas Duda | \n",
" 3.0 | \n",
" johnk003 | \n",
" Kelly Johnson | \n",
" 4.0 | \n",
" confm001 | \n",
" Michael Conforto | \n",
" 7.0 | \n",
" campe001 | \n",
" Eric Campbell | \n",
" 5.0 | \n",
" plawk001 | \n",
" Kevin Plawecki | \n",
" 2.0 | \n",
" ynoag001 | \n",
" Gabriel Ynoa | \n",
" 1.0 | \n",
" hernc005 | \n",
" Cesar Hernandez | \n",
" 4.0 | \n",
" parej002 | \n",
" Jimmy Paredes | \n",
" 7.0 | \n",
" herro001 | \n",
" Odubel Herrera | \n",
" 8.0 | \n",
" franm004 | \n",
" Maikel Franco | \n",
" 5.0 | \n",
" howar001 | \n",
" Ryan Howard | \n",
" 3.0 | \n",
" ruppc001 | \n",
" Cameron Rupp | \n",
" 2.0 | \n",
" blana001 | \n",
" Andres Blanco | \n",
" 6.0 | \n",
" altha001 | \n",
" Aaron Altherr | \n",
" 9.0 | \n",
" eickj001 | \n",
" Jerad Eickhoff | \n",
" 1.0 | \n",
" NaN | \n",
" Y | \n",
"
\n",
" \n",
" 171904 | \n",
" 20161002 | \n",
" 0 | \n",
" Sun | \n",
" LAN | \n",
" NL | \n",
" 162 | \n",
" SFN | \n",
" NL | \n",
" 162 | \n",
" 1 | \n",
" 7 | \n",
" 51.0 | \n",
" D | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" SFO03 | \n",
" 41445.0 | \n",
" 184.0 | \n",
" 000100000 | \n",
" 23000002x | \n",
" 30.0 | \n",
" 4.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 1.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 2.0 | \n",
" 0.0 | \n",
" 7.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 1.0 | \n",
" 0.0 | \n",
" 4.0 | \n",
" 7.0 | \n",
" 7.0 | \n",
" 7.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 24.0 | \n",
" 5.0 | \n",
" 1.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 39.0 | \n",
" 16.0 | \n",
" 3.0 | \n",
" 1.0 | \n",
" 0.0 | \n",
" 7.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 4.0 | \n",
" 1.0 | \n",
" 11.0 | \n",
" 2.0 | \n",
" 1.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 12.0 | \n",
" 2.0 | \n",
" 1.0 | \n",
" 1.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 27.0 | \n",
" 7.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 1.0 | \n",
" 0.0 | \n",
" knigb901 | \n",
" Brian Knight | \n",
" westj901 | \n",
" Joe West | \n",
" fleta901 | \n",
" Andy Fletcher | \n",
" danlk901 | \n",
" Kerwin Danley | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" robed001 | \n",
" Dave Roberts | \n",
" bochb002 | \n",
" Bruce Bochy | \n",
" moorm003 | \n",
" Matt Moore | \n",
" maedk001 | \n",
" Kenta Maeda | \n",
" NaN | \n",
" NaN | \n",
" poseb001 | \n",
" Buster Posey | \n",
" maedk001 | \n",
" Kenta Maeda | \n",
" moorm003 | \n",
" Matt Moore | \n",
" kendh001 | \n",
" Howie Kendrick | \n",
" 7.0 | \n",
" turnj001 | \n",
" Justin Turner | \n",
" 5.0 | \n",
" seagc001 | \n",
" Corey Seager | \n",
" 6.0 | \n",
" puigy001 | \n",
" Yasiel Puig | \n",
" 9.0 | \n",
" gonza003 | \n",
" Adrian Gonzalez | \n",
" 3.0 | \n",
" grany001 | \n",
" Yasmani Grandal | \n",
" 2.0 | \n",
" pedej001 | \n",
" Joc Pederson | \n",
" 8.0 | \n",
" utlec001 | \n",
" Chase Utley | \n",
" 4.0 | \n",
" maedk001 | \n",
" Kenta Maeda | \n",
" 1.0 | \n",
" spand001 | \n",
" Denard Span | \n",
" 8.0 | \n",
" beltb001 | \n",
" Brandon Belt | \n",
" 3.0 | \n",
" poseb001 | \n",
" Buster Posey | \n",
" 2.0 | \n",
" pench001 | \n",
" Hunter Pence | \n",
" 9.0 | \n",
" crawb001 | \n",
" Brandon Crawford | \n",
" 6.0 | \n",
" pagaa001 | \n",
" Angel Pagan | \n",
" 7.0 | \n",
" panij002 | \n",
" Joe Panik | \n",
" 4.0 | \n",
" gillc001 | \n",
" Conor Gillaspie | \n",
" 5.0 | \n",
" moorm003 | \n",
" Matt Moore | \n",
" 1.0 | \n",
" NaN | \n",
" Y | \n",
"
\n",
" \n",
" 171905 | \n",
" 20161002 | \n",
" 0 | \n",
" Sun | \n",
" PIT | \n",
" NL | \n",
" 162 | \n",
" SLN | \n",
" NL | \n",
" 162 | \n",
" 4 | \n",
" 10 | \n",
" 51.0 | \n",
" D | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" STL10 | \n",
" 44615.0 | \n",
" 192.0 | \n",
" 000020200 | \n",
" 00100360x | \n",
" 35.0 | \n",
" 9.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 1.0 | \n",
" 4.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 4.0 | \n",
" 0.0 | \n",
" 11.0 | \n",
" 0.0 | \n",
" 1.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 8.0 | \n",
" 6.0 | \n",
" 8.0 | \n",
" 8.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 24.0 | \n",
" 2.0 | \n",
" 2.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 36.0 | \n",
" 12.0 | \n",
" 2.0 | \n",
" 0.0 | \n",
" 1.0 | \n",
" 10.0 | \n",
" 0.0 | \n",
" 2.0 | \n",
" 0.0 | \n",
" 4.0 | \n",
" 0.0 | \n",
" 5.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 8.0 | \n",
" 3.0 | \n",
" 4.0 | \n",
" 4.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 27.0 | \n",
" 7.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 1.0 | \n",
" 0.0 | \n",
" cuzzp901 | \n",
" Phil Cuzzi | \n",
" ticht901 | \n",
" Todd Tichenor | \n",
" vanol901 | \n",
" Larry Vanover | \n",
" marqa901 | \n",
" Alfonso Marquez | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" hurdc001 | \n",
" Clint Hurdle | \n",
" mathm001 | \n",
" Mike Matheny | \n",
" broxj001 | \n",
" Jonathan Broxton | \n",
" nicaj001 | \n",
" Juan Nicasio | \n",
" NaN | \n",
" NaN | \n",
" piscs001 | \n",
" Stephen Piscotty | \n",
" voger001 | \n",
" Ryan Vogelsong | \n",
" waina001 | \n",
" Adam Wainwright | \n",
" jasoj001 | \n",
" John Jaso | \n",
" 3.0 | \n",
" polag001 | \n",
" Gregory Polanco | \n",
" 9.0 | \n",
" mccua001 | \n",
" Andrew McCutchen | \n",
" 8.0 | \n",
" kangj001 | \n",
" Jung Ho Kang | \n",
" 5.0 | \n",
" joycm001 | \n",
" Matt Joyce | \n",
" 7.0 | \n",
" hansa001 | \n",
" Alen Hanson | \n",
" 4.0 | \n",
" fryee001 | \n",
" Eric Fryer | \n",
" 2.0 | \n",
" florp001 | \n",
" Pedro Florimon | \n",
" 6.0 | \n",
" voger001 | \n",
" Ryan Vogelsong | \n",
" 1.0 | \n",
" carpm002 | \n",
" Matt Carpenter | \n",
" 3.0 | \n",
" diaza003 | \n",
" Aledmys Diaz | \n",
" 6.0 | \n",
" moliy001 | \n",
" Yadier Molina | \n",
" 2.0 | \n",
" piscs001 | \n",
" Stephen Piscotty | \n",
" 9.0 | \n",
" peraj001 | \n",
" Jhonny Peralta | \n",
" 5.0 | \n",
" mossb001 | \n",
" Brandon Moss | \n",
" 7.0 | \n",
" gyorj001 | \n",
" Jedd Gyorko | \n",
" 4.0 | \n",
" gricr001 | \n",
" Randal Grichuk | \n",
" 8.0 | \n",
" waina001 | \n",
" Adam Wainwright | \n",
" 1.0 | \n",
" NaN | \n",
" Y | \n",
"
\n",
" \n",
" 171906 | \n",
" 20161002 | \n",
" 0 | \n",
" Sun | \n",
" MIA | \n",
" NL | \n",
" 161 | \n",
" WAS | \n",
" NL | \n",
" 162 | \n",
" 7 | \n",
" 10 | \n",
" 51.0 | \n",
" D | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" WAS11 | \n",
" 28730.0 | \n",
" 216.0 | \n",
" 000230020 | \n",
" 03023002x | \n",
" 38.0 | \n",
" 14.0 | \n",
" 1.0 | \n",
" 1.0 | \n",
" 2.0 | \n",
" 7.0 | \n",
" 1.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 3.0 | \n",
" 2.0 | \n",
" 10.0 | \n",
" 1.0 | \n",
" 1.0 | \n",
" 1.0 | \n",
" 0.0 | \n",
" 8.0 | \n",
" 7.0 | \n",
" 10.0 | \n",
" 10.0 | \n",
" 1.0 | \n",
" 0.0 | \n",
" 24.0 | \n",
" 11.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 1.0 | \n",
" 0.0 | \n",
" 30.0 | \n",
" 10.0 | \n",
" 2.0 | \n",
" 0.0 | \n",
" 1.0 | \n",
" 10.0 | \n",
" 1.0 | \n",
" 1.0 | \n",
" 1.0 | \n",
" 8.0 | \n",
" 0.0 | \n",
" 3.0 | \n",
" 2.0 | \n",
" 0.0 | \n",
" 1.0 | \n",
" 0.0 | \n",
" 7.0 | \n",
" 6.0 | \n",
" 7.0 | \n",
" 7.0 | \n",
" 1.0 | \n",
" 0.0 | \n",
" 27.0 | \n",
" 11.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 1.0 | \n",
" 0.0 | \n",
" tumpj901 | \n",
" John Tumpane | \n",
" porta901 | \n",
" Alan Porter | \n",
" onorb901 | \n",
" Brian O'Nora | \n",
" kellj901 | \n",
" Jeff Kellogg | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" mattd001 | \n",
" Don Mattingly | \n",
" baked002 | \n",
" Dusty Baker | \n",
" schem001 | \n",
" Max Scherzer | \n",
" brica001 | \n",
" Austin Brice | \n",
" melam001 | \n",
" Mark Melancon | \n",
" difow001 | \n",
" Wilmer Difo | \n",
" koeht001 | \n",
" Tom Koehler | \n",
" schem001 | \n",
" Max Scherzer | \n",
" gordd002 | \n",
" Dee Gordon | \n",
" 4.0 | \n",
" telit001 | \n",
" Tomas Telis | \n",
" 2.0 | \n",
" pradm001 | \n",
" Martin Prado | \n",
" 5.0 | \n",
" yelic001 | \n",
" Christian Yelich | \n",
" 8.0 | \n",
" bourj002 | \n",
" Justin Bour | \n",
" 3.0 | \n",
" scrux001 | \n",
" Xavier Scruggs | \n",
" 7.0 | \n",
" hoodd001 | \n",
" Destin Hood | \n",
" 9.0 | \n",
" hecha001 | \n",
" Adeiny Hechavarria | \n",
" 6.0 | \n",
" koeht001 | \n",
" Tom Koehler | \n",
" 1.0 | \n",
" turnt001 | \n",
" Trea Turner | \n",
" 8.0 | \n",
" reveb001 | \n",
" Ben Revere | \n",
" 7.0 | \n",
" harpb003 | \n",
" Bryce Harper | \n",
" 9.0 | \n",
" zimmr001 | \n",
" Ryan Zimmerman | \n",
" 3.0 | \n",
" drews001 | \n",
" Stephen Drew | \n",
" 5.0 | \n",
" difow001 | \n",
" Wilmer Difo | \n",
" 4.0 | \n",
" espid001 | \n",
" Danny Espinosa | \n",
" 6.0 | \n",
" lobaj001 | \n",
" Jose Lobaton | \n",
" 2.0 | \n",
" schem001 | \n",
" Max Scherzer | \n",
" 1.0 | \n",
" NaN | \n",
" Y | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" date number_of_game day_of_week v_name v_league v_game_number \\\n",
"171902 20161002 0 Sun MIL NL 162 \n",
"171903 20161002 0 Sun NYN NL 162 \n",
"171904 20161002 0 Sun LAN NL 162 \n",
"171905 20161002 0 Sun PIT NL 162 \n",
"171906 20161002 0 Sun MIA NL 161 \n",
"\n",
" h_name h_league h_game_number v_score h_score length_outs \\\n",
"171902 COL NL 162 6 4 60.0 \n",
"171903 PHI NL 162 2 5 51.0 \n",
"171904 SFN NL 162 1 7 51.0 \n",
"171905 SLN NL 162 4 10 51.0 \n",
"171906 WAS NL 162 7 10 51.0 \n",
"\n",
" day_night completion forefeit protest park_id attendance \\\n",
"171902 D NaN NaN NaN DEN02 27762.0 \n",
"171903 D NaN NaN NaN PHI13 36935.0 \n",
"171904 D NaN NaN NaN SFO03 41445.0 \n",
"171905 D NaN NaN NaN STL10 44615.0 \n",
"171906 D NaN NaN NaN WAS11 28730.0 \n",
"\n",
" length_minutes v_line_score h_line_score v_at_bats v_hits \\\n",
"171902 203.0 0200000202 1100100010 39.0 10.0 \n",
"171903 159.0 000001100 00100031x 33.0 8.0 \n",
"171904 184.0 000100000 23000002x 30.0 4.0 \n",
"171905 192.0 000020200 00100360x 35.0 9.0 \n",
"171906 216.0 000230020 03023002x 38.0 14.0 \n",
"\n",
" v_doubles v_triples v_homeruns v_rbi v_sacrifice_hits \\\n",
"171902 4.0 1.0 2.0 6.0 0.0 \n",
"171903 3.0 0.0 0.0 2.0 0.0 \n",
"171904 0.0 0.0 0.0 1.0 0.0 \n",
"171905 0.0 0.0 1.0 4.0 0.0 \n",
"171906 1.0 1.0 2.0 7.0 1.0 \n",
"\n",
" v_sacrifice_flies v_hit_by_pitch v_walks v_intentional_walks \\\n",
"171902 0.0 1.0 4.0 0.0 \n",
"171903 0.0 0.0 2.0 0.0 \n",
"171904 0.0 0.0 2.0 0.0 \n",
"171905 0.0 0.0 4.0 0.0 \n",
"171906 0.0 0.0 3.0 2.0 \n",
"\n",
" v_strikeouts v_stolen_bases v_caught_stealing \\\n",
"171902 12.0 2.0 1.0 \n",
"171903 9.0 1.0 1.0 \n",
"171904 7.0 0.0 0.0 \n",
"171905 11.0 0.0 1.0 \n",
"171906 10.0 1.0 1.0 \n",
"\n",
" v_grounded_into_double v_first_catcher_interference v_left_on_base \\\n",
"171902 0.0 0.0 8.0 \n",
"171903 1.0 0.0 6.0 \n",
"171904 1.0 0.0 4.0 \n",
"171905 0.0 0.0 8.0 \n",
"171906 1.0 0.0 8.0 \n",
"\n",
" v_pitchers_used v_individual_earned_runs v_team_earned_runs \\\n",
"171902 7.0 4.0 4.0 \n",
"171903 6.0 3.0 3.0 \n",
"171904 7.0 7.0 7.0 \n",
"171905 6.0 8.0 8.0 \n",
"171906 7.0 10.0 10.0 \n",
"\n",
" v_wild_pitches v_balks v_putouts v_assists v_errors \\\n",
"171902 1.0 0.0 30.0 12.0 1.0 \n",
"171903 0.0 0.0 24.0 12.0 3.0 \n",
"171904 0.0 0.0 24.0 5.0 1.0 \n",
"171905 0.0 0.0 24.0 2.0 2.0 \n",
"171906 1.0 0.0 24.0 11.0 0.0 \n",
"\n",
" v_passed_balls v_double_plays v_triple_plays h_at_bats h_hits \\\n",
"171902 0.0 0.0 0.0 41.0 13.0 \n",
"171903 1.0 2.0 0.0 33.0 10.0 \n",
"171904 0.0 0.0 0.0 39.0 16.0 \n",
"171905 0.0 0.0 0.0 36.0 12.0 \n",
"171906 0.0 1.0 0.0 30.0 10.0 \n",
"\n",
" h_doubles h_triples h_homeruns h_rbi h_sacrifice_hits \\\n",
"171902 4.0 0.0 1.0 4.0 1.0 \n",
"171903 1.0 0.0 0.0 3.0 0.0 \n",
"171904 3.0 1.0 0.0 7.0 0.0 \n",
"171905 2.0 0.0 1.0 10.0 0.0 \n",
"171906 2.0 0.0 1.0 10.0 1.0 \n",
"\n",
" h_sacrifice_flies h_hit_by_pitch h_walks h_intentional_walks \\\n",
"171902 0.0 1.0 3.0 0.0 \n",
"171903 1.0 0.0 2.0 0.0 \n",
"171904 0.0 0.0 4.0 1.0 \n",
"171905 2.0 0.0 4.0 0.0 \n",
"171906 1.0 1.0 8.0 0.0 \n",
"\n",
" h_strikeouts h_stolen_bases h_caught_stealing \\\n",
"171902 11.0 0.0 1.0 \n",
"171903 3.0 0.0 0.0 \n",
"171904 11.0 2.0 1.0 \n",
"171905 5.0 0.0 0.0 \n",
"171906 3.0 2.0 0.0 \n",
"\n",
" h_grounded_into_double h_first_catcher_interference h_left_on_base \\\n",
"171902 0.0 0.0 12.0 \n",
"171903 2.0 0.0 7.0 \n",
"171904 0.0 0.0 12.0 \n",
"171905 0.0 0.0 8.0 \n",
"171906 1.0 0.0 7.0 \n",
"\n",
" h_pitchers_used h_individual_earned_runs h_team_earned_runs \\\n",
"171902 5.0 6.0 6.0 \n",
"171903 5.0 2.0 2.0 \n",
"171904 2.0 1.0 1.0 \n",
"171905 3.0 4.0 4.0 \n",
"171906 6.0 7.0 7.0 \n",
"\n",
" h_wild_pitches h_balks h_putouts h_assists h_errors \\\n",
"171902 0.0 0.0 30.0 13.0 0.0 \n",
"171903 0.0 0.0 27.0 7.0 0.0 \n",
"171904 0.0 0.0 27.0 7.0 0.0 \n",
"171905 0.0 0.0 27.0 7.0 0.0 \n",
"171906 1.0 0.0 27.0 11.0 0.0 \n",
"\n",
" h_passed_balls h_double_plays h_triple_plays hp_umpire_id \\\n",
"171902 0.0 0.0 0.0 barrs901 \n",
"171903 0.0 1.0 0.0 barkl901 \n",
"171904 0.0 1.0 0.0 knigb901 \n",
"171905 0.0 1.0 0.0 cuzzp901 \n",
"171906 0.0 1.0 0.0 tumpj901 \n",
"\n",
" hp_umpire_name 1b_umpire_id 1b_umpire_name 2b_umpire_id \\\n",
"171902 Scott Barry woodt901 Tom Woodring randt901 \n",
"171903 Lance Barksdale herna901 Angel Hernandez barrt901 \n",
"171904 Brian Knight westj901 Joe West fleta901 \n",
"171905 Phil Cuzzi ticht901 Todd Tichenor vanol901 \n",
"171906 John Tumpane porta901 Alan Porter onorb901 \n",
"\n",
" 2b_umpire_name 3b_umpire_id 3b_umpire_name lf_umpire_id \\\n",
"171902 Tony Randazzo ortir901 Roberto Ortiz NaN \n",
"171903 Ted Barrett littw901 Will Little NaN \n",
"171904 Andy Fletcher danlk901 Kerwin Danley NaN \n",
"171905 Larry Vanover marqa901 Alfonso Marquez NaN \n",
"171906 Brian O'Nora kellj901 Jeff Kellogg NaN \n",
"\n",
" lf_umpire_name rf_umpire_id rf_umpire_name v_manager_id \\\n",
"171902 NaN NaN NaN counc001 \n",
"171903 NaN NaN NaN collt801 \n",
"171904 NaN NaN NaN robed001 \n",
"171905 NaN NaN NaN hurdc001 \n",
"171906 NaN NaN NaN mattd001 \n",
"\n",
" v_manager_name h_manager_id h_manager_name winning_pitcher_id \\\n",
"171902 Craig Counsell weisw001 Walt Weiss thort001 \n",
"171903 Terry Collins mackp101 Pete Mackanin murrc002 \n",
"171904 Dave Roberts bochb002 Bruce Bochy moorm003 \n",
"171905 Clint Hurdle mathm001 Mike Matheny broxj001 \n",
"171906 Don Mattingly baked002 Dusty Baker schem001 \n",
"\n",
" winning_pitcher_name losing_pitcher_id losing_pitcher_name \\\n",
"171902 Tyler Thornburg rusic001 Chris Rusin \n",
"171903 Colton Murray goede001 Erik Goeddel \n",
"171904 Matt Moore maedk001 Kenta Maeda \n",
"171905 Jonathan Broxton nicaj001 Juan Nicasio \n",
"171906 Max Scherzer brica001 Austin Brice \n",
"\n",
" saving_pitcher_id saving_pitcher_name winning_rbi_batter_id \\\n",
"171902 knebc001 Corey Knebel susaa001 \n",
"171903 nerih001 Hector Neris hernc005 \n",
"171904 NaN NaN poseb001 \n",
"171905 NaN NaN piscs001 \n",
"171906 melam001 Mark Melancon difow001 \n",
"\n",
" winning_rbi_batter_id_name v_starting_pitcher_id \\\n",
"171902 Andrew Susac cravt001 \n",
"171903 Cesar Hernandez ynoag001 \n",
"171904 Buster Posey maedk001 \n",
"171905 Stephen Piscotty voger001 \n",
"171906 Wilmer Difo koeht001 \n",
"\n",
" v_starting_pitcher_name h_starting_pitcher_id h_starting_pitcher_name \\\n",
"171902 Tyler Cravy marqg001 German Marquez \n",
"171903 Gabriel Ynoa eickj001 Jerad Eickhoff \n",
"171904 Kenta Maeda moorm003 Matt Moore \n",
"171905 Ryan Vogelsong waina001 Adam Wainwright \n",
"171906 Tom Koehler schem001 Max Scherzer \n",
"\n",
" v_player_1_id v_player_1_name v_player_1_def_pos v_player_2_id \\\n",
"171902 villj001 Jonathan Villar 5.0 genns001 \n",
"171903 granc001 Curtis Granderson 8.0 cabra002 \n",
"171904 kendh001 Howie Kendrick 7.0 turnj001 \n",
"171905 jasoj001 John Jaso 3.0 polag001 \n",
"171906 gordd002 Dee Gordon 4.0 telit001 \n",
"\n",
" v_player_2_name v_player_2_def_pos v_player_3_id v_player_3_name \\\n",
"171902 Scooter Gennett 4.0 cartc002 Chris Carter \n",
"171903 Asdrubal Cabrera 6.0 brucj001 Jay Bruce \n",
"171904 Justin Turner 5.0 seagc001 Corey Seager \n",
"171905 Gregory Polanco 9.0 mccua001 Andrew McCutchen \n",
"171906 Tomas Telis 2.0 pradm001 Martin Prado \n",
"\n",
" v_player_3_def_pos v_player_4_id v_player_4_name \\\n",
"171902 3.0 santd002 Domingo Santana \n",
"171903 9.0 dudal001 Lucas Duda \n",
"171904 6.0 puigy001 Yasiel Puig \n",
"171905 8.0 kangj001 Jung Ho Kang \n",
"171906 5.0 yelic001 Christian Yelich \n",
"\n",
" v_player_4_def_pos v_player_5_id v_player_5_name v_player_5_def_pos \\\n",
"171902 9.0 pereh001 Hernan Perez 8.0 \n",
"171903 3.0 johnk003 Kelly Johnson 4.0 \n",
"171904 9.0 gonza003 Adrian Gonzalez 3.0 \n",
"171905 5.0 joycm001 Matt Joyce 7.0 \n",
"171906 8.0 bourj002 Justin Bour 3.0 \n",
"\n",
" v_player_6_id v_player_6_name v_player_6_def_pos v_player_7_id \\\n",
"171902 arcio002 Orlando Arcia 6.0 susaa001 \n",
"171903 confm001 Michael Conforto 7.0 campe001 \n",
"171904 grany001 Yasmani Grandal 2.0 pedej001 \n",
"171905 hansa001 Alen Hanson 4.0 fryee001 \n",
"171906 scrux001 Xavier Scruggs 7.0 hoodd001 \n",
"\n",
" v_player_7_name v_player_7_def_pos v_player_8_id v_player_8_name \\\n",
"171902 Andrew Susac 2.0 elmoj001 Jake Elmore \n",
"171903 Eric Campbell 5.0 plawk001 Kevin Plawecki \n",
"171904 Joc Pederson 8.0 utlec001 Chase Utley \n",
"171905 Eric Fryer 2.0 florp001 Pedro Florimon \n",
"171906 Destin Hood 9.0 hecha001 Adeiny Hechavarria \n",
"\n",
" v_player_8_def_pos v_player_9_id v_player_9_name v_player_9_def_pos \\\n",
"171902 7.0 cravt001 Tyler Cravy 1.0 \n",
"171903 2.0 ynoag001 Gabriel Ynoa 1.0 \n",
"171904 4.0 maedk001 Kenta Maeda 1.0 \n",
"171905 6.0 voger001 Ryan Vogelsong 1.0 \n",
"171906 6.0 koeht001 Tom Koehler 1.0 \n",
"\n",
" h_player_1_id h_player_1_name h_player_1_def_pos h_player_2_id \\\n",
"171902 blacc001 Charlie Blackmon 8.0 dahld001 \n",
"171903 hernc005 Cesar Hernandez 4.0 parej002 \n",
"171904 spand001 Denard Span 8.0 beltb001 \n",
"171905 carpm002 Matt Carpenter 3.0 diaza003 \n",
"171906 turnt001 Trea Turner 8.0 reveb001 \n",
"\n",
" h_player_2_name h_player_2_def_pos h_player_3_id h_player_3_name \\\n",
"171902 David Dahl 7.0 arenn001 Nolan Arenado \n",
"171903 Jimmy Paredes 7.0 herro001 Odubel Herrera \n",
"171904 Brandon Belt 3.0 poseb001 Buster Posey \n",
"171905 Aledmys Diaz 6.0 moliy001 Yadier Molina \n",
"171906 Ben Revere 7.0 harpb003 Bryce Harper \n",
"\n",
" h_player_3_def_pos h_player_4_id h_player_4_name \\\n",
"171902 5.0 gonzc001 Carlos Gonzalez \n",
"171903 8.0 franm004 Maikel Franco \n",
"171904 2.0 pench001 Hunter Pence \n",
"171905 2.0 piscs001 Stephen Piscotty \n",
"171906 9.0 zimmr001 Ryan Zimmerman \n",
"\n",
" h_player_4_def_pos h_player_5_id h_player_5_name \\\n",
"171902 9.0 murpt002 Tom Murphy \n",
"171903 5.0 howar001 Ryan Howard \n",
"171904 9.0 crawb001 Brandon Crawford \n",
"171905 9.0 peraj001 Jhonny Peralta \n",
"171906 3.0 drews001 Stephen Drew \n",
"\n",
" h_player_5_def_pos h_player_6_id h_player_6_name \\\n",
"171902 2.0 pattj005 Jordan Patterson \n",
"171903 3.0 ruppc001 Cameron Rupp \n",
"171904 6.0 pagaa001 Angel Pagan \n",
"171905 5.0 mossb001 Brandon Moss \n",
"171906 5.0 difow001 Wilmer Difo \n",
"\n",
" h_player_6_def_pos h_player_7_id h_player_7_name h_player_7_def_pos \\\n",
"171902 3.0 valap001 Pat Valaika 4.0 \n",
"171903 2.0 blana001 Andres Blanco 6.0 \n",
"171904 7.0 panij002 Joe Panik 4.0 \n",
"171905 7.0 gyorj001 Jedd Gyorko 4.0 \n",
"171906 4.0 espid001 Danny Espinosa 6.0 \n",
"\n",
" h_player_8_id h_player_8_name h_player_8_def_pos h_player_9_id \\\n",
"171902 adamc001 Cristhian Adames 6.0 marqg001 \n",
"171903 altha001 Aaron Altherr 9.0 eickj001 \n",
"171904 gillc001 Conor Gillaspie 5.0 moorm003 \n",
"171905 gricr001 Randal Grichuk 8.0 waina001 \n",
"171906 lobaj001 Jose Lobaton 2.0 schem001 \n",
"\n",
" h_player_9_name h_player_9_def_pos additional_info acquisition_info \n",
"171902 German Marquez 1.0 NaN Y \n",
"171903 Jerad Eickhoff 1.0 NaN Y \n",
"171904 Matt Moore 1.0 NaN Y \n",
"171905 Adam Wainwright 1.0 NaN Y \n",
"171906 Max Scherzer 1.0 NaN Y "
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"games.tail()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It looks like the game log has a record of over 170,000 games. It looks like these games are chronologically ordered and occur between 1871 and 2016.\n",
"\n",
"For each game we have:\n",
"\n",
"- general information on the game\n",
"- team level stats for each team\n",
"- a list of players from each team, numbered, with their defensive positions\n",
"- the umpires that officiated the game\n",
"- some 'awards', like winning and losing pitcher\n",
"\n",
"We have a \"game_log_fields.txt\" file that tell us that the player number corresponds with the order in which they batted.\n",
"\n",
"It's worth noting that there is no natural primary key column for this table."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`parks DataFrame`"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(252, 9)\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" park_id | \n",
" name | \n",
" aka | \n",
" city | \n",
" state | \n",
" start | \n",
" end | \n",
" league | \n",
" notes | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" ALB01 | \n",
" Riverside Park | \n",
" NaN | \n",
" Albany | \n",
" NY | \n",
" 09/11/1880 | \n",
" 05/30/1882 | \n",
" NL | \n",
" TRN:9/11/80;6/15&9/10/1881;5/16-5/18&5/30/1882 | \n",
"
\n",
" \n",
" 1 | \n",
" ALT01 | \n",
" Columbia Park | \n",
" NaN | \n",
" Altoona | \n",
" PA | \n",
" 04/30/1884 | \n",
" 05/31/1884 | \n",
" UA | \n",
" NaN | \n",
"
\n",
" \n",
" 2 | \n",
" ANA01 | \n",
" Angel Stadium of Anaheim | \n",
" Edison Field; Anaheim Stadium | \n",
" Anaheim | \n",
" CA | \n",
" 04/19/1966 | \n",
" NaN | \n",
" AL | \n",
" NaN | \n",
"
\n",
" \n",
" 3 | \n",
" ARL01 | \n",
" Arlington Stadium | \n",
" NaN | \n",
" Arlington | \n",
" TX | \n",
" 04/21/1972 | \n",
" 10/03/1993 | \n",
" AL | \n",
" NaN | \n",
"
\n",
" \n",
" 4 | \n",
" ARL02 | \n",
" Rangers Ballpark in Arlington | \n",
" The Ballpark in Arlington; Ameriquest Fl | \n",
" Arlington | \n",
" TX | \n",
" 04/11/1994 | \n",
" NaN | \n",
" AL | \n",
" NaN | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" park_id name \\\n",
"0 ALB01 Riverside Park \n",
"1 ALT01 Columbia Park \n",
"2 ANA01 Angel Stadium of Anaheim \n",
"3 ARL01 Arlington Stadium \n",
"4 ARL02 Rangers Ballpark in Arlington \n",
"\n",
" aka city state start \\\n",
"0 NaN Albany NY 09/11/1880 \n",
"1 NaN Altoona PA 04/30/1884 \n",
"2 Edison Field; Anaheim Stadium Anaheim CA 04/19/1966 \n",
"3 NaN Arlington TX 04/21/1972 \n",
"4 The Ballpark in Arlington; Ameriquest Fl Arlington TX 04/11/1994 \n",
"\n",
" end league notes \n",
"0 05/30/1882 NL TRN:9/11/80;6/15&9/10/1881;5/16-5/18&5/30/1882 \n",
"1 05/31/1884 UA NaN \n",
"2 NaN AL NaN \n",
"3 10/03/1993 AL NaN \n",
"4 NaN AL NaN "
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"print(parks.shape)\n",
"parks.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This seems to be a list of all baseball parks. There are IDs which seem to match with the game log, as well as names, nicknames, city and league."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`persons DataFrame`"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(20494, 7)\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" id | \n",
" last | \n",
" first | \n",
" player_debut | \n",
" mgr_debut | \n",
" coach_debut | \n",
" ump_debut | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" aardd001 | \n",
" Aardsma | \n",
" David | \n",
" 04/06/2004 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 1 | \n",
" aaroh101 | \n",
" Aaron | \n",
" Hank | \n",
" 04/13/1954 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 2 | \n",
" aarot101 | \n",
" Aaron | \n",
" Tommie | \n",
" 04/10/1962 | \n",
" NaN | \n",
" 04/06/1979 | \n",
" NaN | \n",
"
\n",
" \n",
" 3 | \n",
" aased001 | \n",
" Aase | \n",
" Don | \n",
" 07/26/1977 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 4 | \n",
" abada001 | \n",
" Abad | \n",
" Andy | \n",
" 09/10/2001 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" id last first player_debut mgr_debut coach_debut ump_debut\n",
"0 aardd001 Aardsma David 04/06/2004 NaN NaN NaN\n",
"1 aaroh101 Aaron Hank 04/13/1954 NaN NaN NaN\n",
"2 aarot101 Aaron Tommie 04/10/1962 NaN 04/06/1979 NaN\n",
"3 aased001 Aase Don 07/26/1977 NaN NaN NaN\n",
"4 abada001 Abad Andy 09/10/2001 NaN NaN NaN"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"print(persons.shape)\n",
"persons.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This seems to be a list of people with IDs. The IDs look like they match up with those used in the game log. There are debut dates, for players, managers, coaches and umpires. We can see that some people might have been one or more of these roles.\n",
"\n",
"It also looks like coaches and managers are two different things in baseball. After some research, managers are what would be called a 'coach' or 'head coach' in other sports, and coaches are more specialized, like base coaches. It also seems like coaches aren't recorded in the game log."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`teams DataFrame`"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(150, 8)\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" team_id | \n",
" league | \n",
" start | \n",
" end | \n",
" city | \n",
" nickname | \n",
" franch_id | \n",
" seq | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" ALT | \n",
" UA | \n",
" 1884 | \n",
" 1884 | \n",
" Altoona | \n",
" Mountain Cities | \n",
" ALT | \n",
" 1 | \n",
"
\n",
" \n",
" 1 | \n",
" ARI | \n",
" NL | \n",
" 1998 | \n",
" 0 | \n",
" Arizona | \n",
" Diamondbacks | \n",
" ARI | \n",
" 1 | \n",
"
\n",
" \n",
" 2 | \n",
" BFN | \n",
" NL | \n",
" 1879 | \n",
" 1885 | \n",
" Buffalo | \n",
" Bisons | \n",
" BFN | \n",
" 1 | \n",
"
\n",
" \n",
" 3 | \n",
" BFP | \n",
" PL | \n",
" 1890 | \n",
" 1890 | \n",
" Buffalo | \n",
" Bisons | \n",
" BFP | \n",
" 1 | \n",
"
\n",
" \n",
" 4 | \n",
" BL1 | \n",
" NaN | \n",
" 1872 | \n",
" 1874 | \n",
" Baltimore | \n",
" Canaries | \n",
" BL1 | \n",
" 1 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" team_id league start end city nickname franch_id seq\n",
"0 ALT UA 1884 1884 Altoona Mountain Cities ALT 1\n",
"1 ARI NL 1998 0 Arizona Diamondbacks ARI 1\n",
"2 BFN NL 1879 1885 Buffalo Bisons BFN 1\n",
"3 BFP PL 1890 1890 Buffalo Bisons BFP 1\n",
"4 BL1 NaN 1872 1874 Baltimore Canaries BL1 1"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"print(teams.shape)\n",
"teams.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"This seems to be a list of all teams, with team_ids which seem to match the game log. Interestingly, there is a franch_id, let's take a look at this:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"BS1 4\n",
"MLA 3\n",
"PHA 3\n",
"SE1 3\n",
"TRN 3\n",
"LAA 3\n",
"BR3 3\n",
"WS2 2\n",
"CN2 2\n",
"SL2 2\n",
"PT1 2\n",
"BLA 2\n",
"IND 2\n",
"NY2 2\n",
"PH1 2\n",
"BL2 2\n",
"WS9 2\n",
"CH2 2\n",
"SLU 2\n",
"CL3 2\n",
"HR1 2\n",
"FLO 2\n",
"LS2 2\n",
"SL4 2\n",
"MON 2\n",
"BSP 2\n",
"WS1 2\n",
"PRO 1\n",
"HOU 1\n",
"CL2 1\n",
"CH1 1\n",
"WOR 1\n",
"NY4 1\n",
"KCF 1\n",
"CHP 1\n",
"PH4 1\n",
"CN3 1\n",
"CHA 1\n",
"PH2 1\n",
"NYP 1\n",
"CNU 1\n",
"ARI 1\n",
"BLF 1\n",
"RC1 1\n",
"CN1 1\n",
"KEO 1\n",
"IN3 1\n",
"KCN 1\n",
"TOR 1\n",
"TL1 1\n",
"PTF 1\n",
"BR4 1\n",
"BFP 1\n",
"WSU 1\n",
"BR1 1\n",
"PH3 1\n",
"CLP 1\n",
"BSU 1\n",
"BRP 1\n",
"NH1 1\n",
"CL1 1\n",
"KCU 1\n",
"CL6 1\n",
"BLU 1\n",
"SLF 1\n",
"WIL 1\n",
"TRO 1\n",
"FW1 1\n",
"RC2 1\n",
"WS8 1\n",
"PHP 1\n",
"WS4 1\n",
"CHU 1\n",
"IN1 1\n",
"BUF 1\n",
"WS3 1\n",
"ELI 1\n",
"SR2 1\n",
"ALT 1\n",
"RIC 1\n",
"PTP 1\n",
"PTU 1\n",
"TBA 1\n",
"BL4 1\n",
"ML3 1\n",
"WS5 1\n",
"SPU 1\n",
"TL2 1\n",
"MLU 1\n",
"SL1 1\n",
"DTN 1\n",
"CHF 1\n",
"DET 1\n",
"BOS 1\n",
"CLE 1\n",
"LS1 1\n",
"PHI 1\n",
"BFN 1\n",
"BRF 1\n",
"IN2 1\n",
"MID 1\n",
"SEA 1\n",
"NYN 1\n",
"BR2 1\n",
"SR1 1\n",
"CL5 1\n",
"SDN 1\n",
"COL 1\n",
"WS6 1\n",
"BL1 1\n",
"KC2 1\n",
"WS7 1\n",
"PHU 1\n",
"ML1 1\n",
"KCA 1\n",
"Name: franch_id, dtype: int64"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"teams['franch_id'].value_counts()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We might have franch_id occurring a few times for some teams, let's look at the first one in more detail."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" team_id | \n",
" league | \n",
" start | \n",
" end | \n",
" city | \n",
" nickname | \n",
" franch_id | \n",
" seq | \n",
"
\n",
" \n",
" \n",
" \n",
" 21 | \n",
" BS1 | \n",
" NaN | \n",
" 1871 | \n",
" 1875 | \n",
" Boston | \n",
" Braves | \n",
" BS1 | \n",
" 1 | \n",
"
\n",
" \n",
" 22 | \n",
" BSN | \n",
" NL | \n",
" 1876 | \n",
" 1952 | \n",
" Boston | \n",
" Braves | \n",
" BS1 | \n",
" 2 | \n",
"
\n",
" \n",
" 23 | \n",
" MLN | \n",
" NL | \n",
" 1953 | \n",
" 1965 | \n",
" Milwaukee | \n",
" Braves | \n",
" BS1 | \n",
" 3 | \n",
"
\n",
" \n",
" 24 | \n",
" ATL | \n",
" NL | \n",
" 1966 | \n",
" 0 | \n",
" Atlanta | \n",
" Braves | \n",
" BS1 | \n",
" 4 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" team_id league start end city nickname franch_id seq\n",
"21 BS1 NaN 1871 1875 Boston Braves BS1 1\n",
"22 BSN NL 1876 1952 Boston Braves BS1 2\n",
"23 MLN NL 1953 1965 Milwaukee Braves BS1 3\n",
"24 ATL NL 1966 0 Atlanta Braves BS1 4"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"teams[teams['franch_id'] == 'BS1']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It appears that teams move between leagues and cities. The team_id changes when this happens, franch_id (which is probably 'Franchise') helps us tie all of this together.\n",
"\n",
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Defensive Positions\n",
"\n",
"In the game log, each player has a defensive position listed, which seems to be a number between 1-10. Doing some research around this, I found [this article](http://probaseballinsider.com/baseball-instruction/baseball-basics/baseball-basics-positions/) which gives us a list of names for each numbered position:\n",
"\n",
"> 1. Pitcher\n",
"2. Catcher\n",
"3. 1st Base\n",
"4. 2nd Base\n",
"5. 3rd Base\n",
"6. Shortstop\n",
"7. Left Field\n",
"8. Center Field\n",
"9. Right Field\n",
"\n",
"The 10th position isn't included, it may be a way of describing a designated hitter that does not field. I can find a retrosheet page that indicates that position 0 is used for this, but we don't have any position 0 in our data. I have chosen to make this an 'Unknown Position' so I'm not including data based on a hunch."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Leagues\n",
"\n",
"Wikipedia tells us there are currently two leagues - the American (AL) and National (NL). Let's start by finding out what leagues are listed in the main game log:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"NL 88867\n",
"AL 74712\n",
"AA 5039\n",
"FL 1243\n",
"NaN 1086\n",
"PL 532\n",
"UA 428\n",
"Name: h_league, dtype: int64"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"games['h_league'].value_counts(dropna=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It looks like most of our games fall into the two current leagues, but that there are four other leagues. Let's write a quick function to get some info on the years of these leagues:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"nan appered from nan to nan\n",
"NL appered from 1876 to 2016\n",
"AA appered from 1882 to 1891\n",
"UA appered from 1884 to 1884\n",
"PL appered from 1890 to 1890\n",
"AL appered from 1901 to 2016\n",
"FL appered from 1914 to 1915\n"
]
}
],
"source": [
"import datetime as dt\n",
"import numpy as np\n",
"\n",
"def league_info(l):\n",
" league_games = games[games['h_league'] == l]\n",
" first_appr = league_games['date'].min()\n",
" last_appr = league_games['date'].max()\n",
" try:\n",
" first_appr = dt.datetime.strptime(str(first_appr), \"%Y%m%d\").strftime(\"%Y\")\n",
" last_appr = dt.datetime.strptime(str(last_appr), \"%Y%m%d\").strftime(\"%Y\")\n",
" except ValueError:\n",
" pass\n",
" print(\"{} appered from {} to {}\".format(l, first_appr, last_appr))\n",
" \n",
"for i in games['h_league'].unique():\n",
" league_info(i)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"Now we have some years which will help us do some research. After some googling we come up with:\n",
"\n",
"- NL: [National League](https://en.wikipedia.org/wiki/National_League)\n",
"- AL: [American League](https://en.wikipedia.org/wiki/American_League)\n",
"- AA: [American Association](https://en.wikipedia.org/wiki/American_Association_%2819th_century%29)\n",
"- FL: [Federal League](https://en.wikipedia.org/wiki/Federal_League)\n",
"- PL: [Players League](https://en.wikipedia.org/wiki/Players%27_League)\n",
"- UA: [Union Association](https://en.wikipedia.org/wiki/Union_Association)\n",
"\n",
"It also looks like we have about 1000 games where the home team doesn't have a value for league."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Importing Data into SQLite\n",
"\n",
"- We will create two functions to interact with database more easily:\n",
" - *run_command()*\n",
" - *run_query()*"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"import sqlite3\n",
"db = 'mlb.db'\n",
"\n",
"def run_query(q):\n",
" with sqlite3.connect(db) as conn:\n",
" return pd.read_sql(q, conn)\n",
" \n",
"def run_command(c):\n",
" with sqlite3.connect(db) as conn:\n",
" conn.isolation_level = None\n",
" conn.execute(c)\n",
" \n",
"def show_tables():\n",
" q = '''\n",
" SELECT \n",
" name,\n",
" type \n",
" FROM sqlite_master\n",
" WHERE type IN ('table', 'view') \n",
" '''\n",
" return run_query(q) "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- We'll use `DataFrame.to_sql()` to create tables for each of our dataframes in a new SQLite database, `mlb.db`"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"table_list = {\n",
" \"game_log\" : games,\n",
" \"park_codes\" : parks,\n",
" \"person_codes\" : persons,\n",
" \"team_codes\" : teams\n",
"}\n",
"\n",
"with sqlite3.connect(db) as conn:\n",
" for key, val in table_list.items():\n",
" conn.execute('DROP TABLE IF EXISTS {};'.format(key))\n",
" val.to_sql(key, conn, index=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- Since `game_log` DataFrame doesn't have a unique column. We will create a new column `game_id` in it's corresponding table, which will act as a primary key, "
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" game_id | \n",
" h_name | \n",
" date | \n",
" number_of_game | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" FW1187105040 | \n",
" FW1 | \n",
" 18710504 | \n",
" 0 | \n",
"
\n",
" \n",
" 1 | \n",
" WS3187105050 | \n",
" WS3 | \n",
" 18710505 | \n",
" 0 | \n",
"
\n",
" \n",
" 2 | \n",
" RC1187105060 | \n",
" RC1 | \n",
" 18710506 | \n",
" 0 | \n",
"
\n",
" \n",
" 3 | \n",
" CH1187105080 | \n",
" CH1 | \n",
" 18710508 | \n",
" 0 | \n",
"
\n",
" \n",
" 4 | \n",
" TRO187105090 | \n",
" TRO | \n",
" 18710509 | \n",
" 0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" game_id h_name date number_of_game\n",
"0 FW1187105040 FW1 18710504 0\n",
"1 WS3187105050 WS3 18710505 0\n",
"2 RC1187105060 RC1 18710506 0\n",
"3 CH1187105080 CH1 18710508 0\n",
"4 TRO187105090 TRO 18710509 0"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"c1 = '''\n",
"ALTER TABLE game_log\n",
"add game_id TEXT;\n",
"'''\n",
"\n",
"try:\n",
" run_command(c1)\n",
"except:\n",
" pass\n",
"\n",
"c2 = '''\n",
"UPDATE game_log\n",
"SET game_id = h_name || date || number_of_game\n",
"WHERE game_id IS NULL;\n",
"'''\n",
"\n",
"run_command(c2)\n",
"\n",
"q1 = '''\n",
"SELECT \n",
" game_id,\n",
" h_name,\n",
" date,\n",
" number_of_game\n",
"FROM game_log\n",
"LIMIT 5;\n",
"'''\n",
"\n",
"run_query(q1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Viewing Tables in the Database"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`game_log table`"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" date | \n",
" number_of_game | \n",
" day_of_week | \n",
" v_name | \n",
" v_league | \n",
" v_game_number | \n",
" h_name | \n",
" h_league | \n",
" h_game_number | \n",
" v_score | \n",
" h_score | \n",
" length_outs | \n",
" day_night | \n",
" completion | \n",
" forefeit | \n",
" protest | \n",
" park_id | \n",
" attendance | \n",
" length_minutes | \n",
" v_line_score | \n",
" h_line_score | \n",
" v_at_bats | \n",
" v_hits | \n",
" v_doubles | \n",
" v_triples | \n",
" v_homeruns | \n",
" v_rbi | \n",
" v_sacrifice_hits | \n",
" v_sacrifice_flies | \n",
" v_hit_by_pitch | \n",
" v_walks | \n",
" v_intentional_walks | \n",
" v_strikeouts | \n",
" v_stolen_bases | \n",
" v_caught_stealing | \n",
" v_grounded_into_double | \n",
" v_first_catcher_interference | \n",
" v_left_on_base | \n",
" v_pitchers_used | \n",
" v_individual_earned_runs | \n",
" v_team_earned_runs | \n",
" v_wild_pitches | \n",
" v_balks | \n",
" v_putouts | \n",
" v_assists | \n",
" v_errors | \n",
" v_passed_balls | \n",
" v_double_plays | \n",
" v_triple_plays | \n",
" h_at_bats | \n",
" h_hits | \n",
" h_doubles | \n",
" h_triples | \n",
" h_homeruns | \n",
" h_rbi | \n",
" h_sacrifice_hits | \n",
" h_sacrifice_flies | \n",
" h_hit_by_pitch | \n",
" h_walks | \n",
" h_intentional_walks | \n",
" h_strikeouts | \n",
" h_stolen_bases | \n",
" h_caught_stealing | \n",
" h_grounded_into_double | \n",
" h_first_catcher_interference | \n",
" h_left_on_base | \n",
" h_pitchers_used | \n",
" h_individual_earned_runs | \n",
" h_team_earned_runs | \n",
" h_wild_pitches | \n",
" h_balks | \n",
" h_putouts | \n",
" h_assists | \n",
" h_errors | \n",
" h_passed_balls | \n",
" h_double_plays | \n",
" h_triple_plays | \n",
" hp_umpire_id | \n",
" hp_umpire_name | \n",
" 1b_umpire_id | \n",
" 1b_umpire_name | \n",
" 2b_umpire_id | \n",
" 2b_umpire_name | \n",
" 3b_umpire_id | \n",
" 3b_umpire_name | \n",
" lf_umpire_id | \n",
" lf_umpire_name | \n",
" rf_umpire_id | \n",
" rf_umpire_name | \n",
" v_manager_id | \n",
" v_manager_name | \n",
" h_manager_id | \n",
" h_manager_name | \n",
" winning_pitcher_id | \n",
" winning_pitcher_name | \n",
" losing_pitcher_id | \n",
" losing_pitcher_name | \n",
" saving_pitcher_id | \n",
" saving_pitcher_name | \n",
" winning_rbi_batter_id | \n",
" winning_rbi_batter_id_name | \n",
" v_starting_pitcher_id | \n",
" v_starting_pitcher_name | \n",
" h_starting_pitcher_id | \n",
" h_starting_pitcher_name | \n",
" v_player_1_id | \n",
" v_player_1_name | \n",
" v_player_1_def_pos | \n",
" v_player_2_id | \n",
" v_player_2_name | \n",
" v_player_2_def_pos | \n",
" v_player_3_id | \n",
" v_player_3_name | \n",
" v_player_3_def_pos | \n",
" v_player_4_id | \n",
" v_player_4_name | \n",
" v_player_4_def_pos | \n",
" v_player_5_id | \n",
" v_player_5_name | \n",
" v_player_5_def_pos | \n",
" v_player_6_id | \n",
" v_player_6_name | \n",
" v_player_6_def_pos | \n",
" v_player_7_id | \n",
" v_player_7_name | \n",
" v_player_7_def_pos | \n",
" v_player_8_id | \n",
" v_player_8_name | \n",
" v_player_8_def_pos | \n",
" v_player_9_id | \n",
" v_player_9_name | \n",
" v_player_9_def_pos | \n",
" h_player_1_id | \n",
" h_player_1_name | \n",
" h_player_1_def_pos | \n",
" h_player_2_id | \n",
" h_player_2_name | \n",
" h_player_2_def_pos | \n",
" h_player_3_id | \n",
" h_player_3_name | \n",
" h_player_3_def_pos | \n",
" h_player_4_id | \n",
" h_player_4_name | \n",
" h_player_4_def_pos | \n",
" h_player_5_id | \n",
" h_player_5_name | \n",
" h_player_5_def_pos | \n",
" h_player_6_id | \n",
" h_player_6_name | \n",
" h_player_6_def_pos | \n",
" h_player_7_id | \n",
" h_player_7_name | \n",
" h_player_7_def_pos | \n",
" h_player_8_id | \n",
" h_player_8_name | \n",
" h_player_8_def_pos | \n",
" h_player_9_id | \n",
" h_player_9_name | \n",
" h_player_9_def_pos | \n",
" additional_info | \n",
" acquisition_info | \n",
" game_id | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 18710504 | \n",
" 0 | \n",
" Thu | \n",
" CL1 | \n",
" None | \n",
" 1 | \n",
" FW1 | \n",
" None | \n",
" 1 | \n",
" 0 | \n",
" 2 | \n",
" 54.0 | \n",
" D | \n",
" None | \n",
" None | \n",
" None | \n",
" FOR01 | \n",
" 200.0 | \n",
" 120.0 | \n",
" 000000000 | \n",
" 010010000 | \n",
" 30.0 | \n",
" 4.0 | \n",
" 1.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 1.0 | \n",
" None | \n",
" 6.0 | \n",
" 1.0 | \n",
" None | \n",
" -1.0 | \n",
" None | \n",
" 4.0 | \n",
" 1.0 | \n",
" 1.0 | \n",
" 1.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 27.0 | \n",
" 9.0 | \n",
" 0.0 | \n",
" 3.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 31.0 | \n",
" 4.0 | \n",
" 1.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 2.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 1.0 | \n",
" None | \n",
" 0.0 | \n",
" 0.0 | \n",
" None | \n",
" -1.0 | \n",
" None | \n",
" 3.0 | \n",
" 1.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 27.0 | \n",
" 3.0 | \n",
" 3.0 | \n",
" 1.0 | \n",
" 1.0 | \n",
" 0.0 | \n",
" boakj901 | \n",
" John Boake | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" paboc101 | \n",
" Charlie Pabor | \n",
" lennb101 | \n",
" Bill Lennon | \n",
" mathb101 | \n",
" Bobby Mathews | \n",
" prata101 | \n",
" Al Pratt | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" prata101 | \n",
" Al Pratt | \n",
" mathb101 | \n",
" Bobby Mathews | \n",
" whitd102 | \n",
" Deacon White | \n",
" 2.0 | \n",
" kimbg101 | \n",
" Gene Kimball | \n",
" 4.0 | \n",
" paboc101 | \n",
" Charlie Pabor | \n",
" 7.0 | \n",
" allia101 | \n",
" Art Allison | \n",
" 8.0 | \n",
" white104 | \n",
" Elmer White | \n",
" 9.0 | \n",
" prata101 | \n",
" Al Pratt | \n",
" 1.0 | \n",
" sutte101 | \n",
" Ezra Sutton | \n",
" 5.0 | \n",
" carlj102 | \n",
" Jim Carleton | \n",
" 3.0 | \n",
" bassj101 | \n",
" John Bass | \n",
" 6.0 | \n",
" selmf101 | \n",
" Frank Sellman | \n",
" 5.0 | \n",
" mathb101 | \n",
" Bobby Mathews | \n",
" 1.0 | \n",
" foraj101 | \n",
" Jim Foran | \n",
" 3.0 | \n",
" goldw101 | \n",
" Wally Goldsmith | \n",
" 6.0 | \n",
" lennb101 | \n",
" Bill Lennon | \n",
" 2.0 | \n",
" caret101 | \n",
" Tom Carey | \n",
" 4.0 | \n",
" mince101 | \n",
" Ed Mincher | \n",
" 7.0 | \n",
" mcdej101 | \n",
" James McDermott | \n",
" 8.0 | \n",
" kellb105 | \n",
" Bill Kelly | \n",
" 9.0 | \n",
" None | \n",
" Y | \n",
" FW1187105040 | \n",
"
\n",
" \n",
" 1 | \n",
" 18710505 | \n",
" 0 | \n",
" Fri | \n",
" BS1 | \n",
" None | \n",
" 1 | \n",
" WS3 | \n",
" None | \n",
" 1 | \n",
" 20 | \n",
" 18 | \n",
" 54.0 | \n",
" D | \n",
" None | \n",
" None | \n",
" None | \n",
" WAS01 | \n",
" 5000.0 | \n",
" 145.0 | \n",
" 107000435 | \n",
" 640113030 | \n",
" 41.0 | \n",
" 13.0 | \n",
" 1.0 | \n",
" 2.0 | \n",
" 0.0 | \n",
" 13.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 18.0 | \n",
" None | \n",
" 5.0 | \n",
" 3.0 | \n",
" None | \n",
" -1.0 | \n",
" None | \n",
" 12.0 | \n",
" 1.0 | \n",
" 6.0 | \n",
" 6.0 | \n",
" 1.0 | \n",
" 0.0 | \n",
" 27.0 | \n",
" 13.0 | \n",
" 10.0 | \n",
" 1.0 | \n",
" 2.0 | \n",
" 0.0 | \n",
" 49.0 | \n",
" 14.0 | \n",
" 2.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 11.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 10.0 | \n",
" None | \n",
" 2.0 | \n",
" 1.0 | \n",
" None | \n",
" -1.0 | \n",
" None | \n",
" 14.0 | \n",
" 1.0 | \n",
" 7.0 | \n",
" 7.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 27.0 | \n",
" 20.0 | \n",
" 10.0 | \n",
" 2.0 | \n",
" 3.0 | \n",
" 0.0 | \n",
" dobsh901 | \n",
" Henry Dobson | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" wrigh101 | \n",
" Harry Wright | \n",
" younn801 | \n",
" Nick Young | \n",
" spala101 | \n",
" Al Spalding | \n",
" braia102 | \n",
" Asa Brainard | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" spala101 | \n",
" Al Spalding | \n",
" braia102 | \n",
" Asa Brainard | \n",
" wrigg101 | \n",
" George Wright | \n",
" 6.0 | \n",
" barnr102 | \n",
" Ross Barnes | \n",
" 4.0 | \n",
" birdd102 | \n",
" Dave Birdsall | \n",
" 9.0 | \n",
" mcvec101 | \n",
" Cal McVey | \n",
" 2.0 | \n",
" wrigh101 | \n",
" Harry Wright | \n",
" 8.0 | \n",
" goulc101 | \n",
" Charlie Gould | \n",
" 3.0 | \n",
" schah101 | \n",
" Harry Schafer | \n",
" 5.0 | \n",
" conef101 | \n",
" Fred Cone | \n",
" 7.0 | \n",
" spala101 | \n",
" Al Spalding | \n",
" 1.0 | \n",
" watef102 | \n",
" Fred Waterman | \n",
" 5.0 | \n",
" forcd101 | \n",
" Davy Force | \n",
" 6.0 | \n",
" mille105 | \n",
" Everett Mills | \n",
" 3.0 | \n",
" allid101 | \n",
" Doug Allison | \n",
" 2.0 | \n",
" hallg101 | \n",
" George Hall | \n",
" 7.0 | \n",
" leona101 | \n",
" Andy Leonard | \n",
" 4.0 | \n",
" braia102 | \n",
" Asa Brainard | \n",
" 1.0 | \n",
" burrh101 | \n",
" Henry Burroughs | \n",
" 9.0 | \n",
" berth101 | \n",
" Henry Berthrong | \n",
" 8.0 | \n",
" HTBF | \n",
" Y | \n",
" WS3187105050 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" date number_of_game day_of_week v_name v_league v_game_number h_name \\\n",
"0 18710504 0 Thu CL1 None 1 FW1 \n",
"1 18710505 0 Fri BS1 None 1 WS3 \n",
"\n",
" h_league h_game_number v_score h_score length_outs day_night completion \\\n",
"0 None 1 0 2 54.0 D None \n",
"1 None 1 20 18 54.0 D None \n",
"\n",
" forefeit protest park_id attendance length_minutes v_line_score \\\n",
"0 None None FOR01 200.0 120.0 000000000 \n",
"1 None None WAS01 5000.0 145.0 107000435 \n",
"\n",
" h_line_score v_at_bats v_hits v_doubles v_triples v_homeruns v_rbi \\\n",
"0 010010000 30.0 4.0 1.0 0.0 0.0 0.0 \n",
"1 640113030 41.0 13.0 1.0 2.0 0.0 13.0 \n",
"\n",
" v_sacrifice_hits v_sacrifice_flies v_hit_by_pitch v_walks \\\n",
"0 0.0 0.0 0.0 1.0 \n",
"1 0.0 0.0 0.0 18.0 \n",
"\n",
" v_intentional_walks v_strikeouts v_stolen_bases v_caught_stealing \\\n",
"0 None 6.0 1.0 None \n",
"1 None 5.0 3.0 None \n",
"\n",
" v_grounded_into_double v_first_catcher_interference v_left_on_base \\\n",
"0 -1.0 None 4.0 \n",
"1 -1.0 None 12.0 \n",
"\n",
" v_pitchers_used v_individual_earned_runs v_team_earned_runs \\\n",
"0 1.0 1.0 1.0 \n",
"1 1.0 6.0 6.0 \n",
"\n",
" v_wild_pitches v_balks v_putouts v_assists v_errors v_passed_balls \\\n",
"0 0.0 0.0 27.0 9.0 0.0 3.0 \n",
"1 1.0 0.0 27.0 13.0 10.0 1.0 \n",
"\n",
" v_double_plays v_triple_plays h_at_bats h_hits h_doubles h_triples \\\n",
"0 0.0 0.0 31.0 4.0 1.0 0.0 \n",
"1 2.0 0.0 49.0 14.0 2.0 0.0 \n",
"\n",
" h_homeruns h_rbi h_sacrifice_hits h_sacrifice_flies h_hit_by_pitch \\\n",
"0 0.0 2.0 0.0 0.0 0.0 \n",
"1 0.0 11.0 0.0 0.0 0.0 \n",
"\n",
" h_walks h_intentional_walks h_strikeouts h_stolen_bases \\\n",
"0 1.0 None 0.0 0.0 \n",
"1 10.0 None 2.0 1.0 \n",
"\n",
" h_caught_stealing h_grounded_into_double h_first_catcher_interference \\\n",
"0 None -1.0 None \n",
"1 None -1.0 None \n",
"\n",
" h_left_on_base h_pitchers_used h_individual_earned_runs \\\n",
"0 3.0 1.0 0.0 \n",
"1 14.0 1.0 7.0 \n",
"\n",
" h_team_earned_runs h_wild_pitches h_balks h_putouts h_assists \\\n",
"0 0.0 0.0 0.0 27.0 3.0 \n",
"1 7.0 0.0 0.0 27.0 20.0 \n",
"\n",
" h_errors h_passed_balls h_double_plays h_triple_plays hp_umpire_id \\\n",
"0 3.0 1.0 1.0 0.0 boakj901 \n",
"1 10.0 2.0 3.0 0.0 dobsh901 \n",
"\n",
" hp_umpire_name 1b_umpire_id 1b_umpire_name 2b_umpire_id 2b_umpire_name \\\n",
"0 John Boake None None None None \n",
"1 Henry Dobson None None None None \n",
"\n",
" 3b_umpire_id 3b_umpire_name lf_umpire_id lf_umpire_name rf_umpire_id \\\n",
"0 None None None None None \n",
"1 None None None None None \n",
"\n",
" rf_umpire_name v_manager_id v_manager_name h_manager_id h_manager_name \\\n",
"0 None paboc101 Charlie Pabor lennb101 Bill Lennon \n",
"1 None wrigh101 Harry Wright younn801 Nick Young \n",
"\n",
" winning_pitcher_id winning_pitcher_name losing_pitcher_id \\\n",
"0 mathb101 Bobby Mathews prata101 \n",
"1 spala101 Al Spalding braia102 \n",
"\n",
" losing_pitcher_name saving_pitcher_id saving_pitcher_name \\\n",
"0 Al Pratt None None \n",
"1 Asa Brainard None None \n",
"\n",
" winning_rbi_batter_id winning_rbi_batter_id_name v_starting_pitcher_id \\\n",
"0 None None prata101 \n",
"1 None None spala101 \n",
"\n",
" v_starting_pitcher_name h_starting_pitcher_id h_starting_pitcher_name \\\n",
"0 Al Pratt mathb101 Bobby Mathews \n",
"1 Al Spalding braia102 Asa Brainard \n",
"\n",
" v_player_1_id v_player_1_name v_player_1_def_pos v_player_2_id \\\n",
"0 whitd102 Deacon White 2.0 kimbg101 \n",
"1 wrigg101 George Wright 6.0 barnr102 \n",
"\n",
" v_player_2_name v_player_2_def_pos v_player_3_id v_player_3_name \\\n",
"0 Gene Kimball 4.0 paboc101 Charlie Pabor \n",
"1 Ross Barnes 4.0 birdd102 Dave Birdsall \n",
"\n",
" v_player_3_def_pos v_player_4_id v_player_4_name v_player_4_def_pos \\\n",
"0 7.0 allia101 Art Allison 8.0 \n",
"1 9.0 mcvec101 Cal McVey 2.0 \n",
"\n",
" v_player_5_id v_player_5_name v_player_5_def_pos v_player_6_id \\\n",
"0 white104 Elmer White 9.0 prata101 \n",
"1 wrigh101 Harry Wright 8.0 goulc101 \n",
"\n",
" v_player_6_name v_player_6_def_pos v_player_7_id v_player_7_name \\\n",
"0 Al Pratt 1.0 sutte101 Ezra Sutton \n",
"1 Charlie Gould 3.0 schah101 Harry Schafer \n",
"\n",
" v_player_7_def_pos v_player_8_id v_player_8_name v_player_8_def_pos \\\n",
"0 5.0 carlj102 Jim Carleton 3.0 \n",
"1 5.0 conef101 Fred Cone 7.0 \n",
"\n",
" v_player_9_id v_player_9_name v_player_9_def_pos h_player_1_id \\\n",
"0 bassj101 John Bass 6.0 selmf101 \n",
"1 spala101 Al Spalding 1.0 watef102 \n",
"\n",
" h_player_1_name h_player_1_def_pos h_player_2_id h_player_2_name \\\n",
"0 Frank Sellman 5.0 mathb101 Bobby Mathews \n",
"1 Fred Waterman 5.0 forcd101 Davy Force \n",
"\n",
" h_player_2_def_pos h_player_3_id h_player_3_name h_player_3_def_pos \\\n",
"0 1.0 foraj101 Jim Foran 3.0 \n",
"1 6.0 mille105 Everett Mills 3.0 \n",
"\n",
" h_player_4_id h_player_4_name h_player_4_def_pos h_player_5_id \\\n",
"0 goldw101 Wally Goldsmith 6.0 lennb101 \n",
"1 allid101 Doug Allison 2.0 hallg101 \n",
"\n",
" h_player_5_name h_player_5_def_pos h_player_6_id h_player_6_name \\\n",
"0 Bill Lennon 2.0 caret101 Tom Carey \n",
"1 George Hall 7.0 leona101 Andy Leonard \n",
"\n",
" h_player_6_def_pos h_player_7_id h_player_7_name h_player_7_def_pos \\\n",
"0 4.0 mince101 Ed Mincher 7.0 \n",
"1 4.0 braia102 Asa Brainard 1.0 \n",
"\n",
" h_player_8_id h_player_8_name h_player_8_def_pos h_player_9_id \\\n",
"0 mcdej101 James McDermott 8.0 kellb105 \n",
"1 burrh101 Henry Burroughs 9.0 berth101 \n",
"\n",
" h_player_9_name h_player_9_def_pos additional_info acquisition_info \\\n",
"0 Bill Kelly 9.0 None Y \n",
"1 Henry Berthrong 8.0 HTBF Y \n",
"\n",
" game_id \n",
"0 FW1187105040 \n",
"1 WS3187105050 "
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"run_query(\"select * from game_log limit 2\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`park_codes table`"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" park_id | \n",
" name | \n",
" aka | \n",
" city | \n",
" state | \n",
" start | \n",
" end | \n",
" league | \n",
" notes | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" ALB01 | \n",
" Riverside Park | \n",
" None | \n",
" Albany | \n",
" NY | \n",
" 09/11/1880 | \n",
" 05/30/1882 | \n",
" NL | \n",
" TRN:9/11/80;6/15&9/10/1881;5/16-5/18&5/30/1882 | \n",
"
\n",
" \n",
" 1 | \n",
" ALT01 | \n",
" Columbia Park | \n",
" None | \n",
" Altoona | \n",
" PA | \n",
" 04/30/1884 | \n",
" 05/31/1884 | \n",
" UA | \n",
" None | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" park_id name aka city state start end league \\\n",
"0 ALB01 Riverside Park None Albany NY 09/11/1880 05/30/1882 NL \n",
"1 ALT01 Columbia Park None Altoona PA 04/30/1884 05/31/1884 UA \n",
"\n",
" notes \n",
"0 TRN:9/11/80;6/15&9/10/1881;5/16-5/18&5/30/1882 \n",
"1 None "
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"run_query(\"select * from park_codes limit 2\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`person_codes table`"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" id | \n",
" last | \n",
" first | \n",
" player_debut | \n",
" mgr_debut | \n",
" coach_debut | \n",
" ump_debut | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" aardd001 | \n",
" Aardsma | \n",
" David | \n",
" 04/06/2004 | \n",
" None | \n",
" None | \n",
" None | \n",
"
\n",
" \n",
" 1 | \n",
" aaroh101 | \n",
" Aaron | \n",
" Hank | \n",
" 04/13/1954 | \n",
" None | \n",
" None | \n",
" None | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" id last first player_debut mgr_debut coach_debut ump_debut\n",
"0 aardd001 Aardsma David 04/06/2004 None None None\n",
"1 aaroh101 Aaron Hank 04/13/1954 None None None"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"run_query(\"select * from person_codes limit 2\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`team_codes table`"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" team_id | \n",
" league | \n",
" start | \n",
" end | \n",
" city | \n",
" nickname | \n",
" franch_id | \n",
" seq | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" ALT | \n",
" UA | \n",
" 1884 | \n",
" 1884 | \n",
" Altoona | \n",
" Mountain Cities | \n",
" ALT | \n",
" 1 | \n",
"
\n",
" \n",
" 1 | \n",
" ARI | \n",
" NL | \n",
" 1998 | \n",
" 0 | \n",
" Arizona | \n",
" Diamondbacks | \n",
" ARI | \n",
" 1 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" team_id league start end city nickname franch_id seq\n",
"0 ALT UA 1884 1884 Altoona Mountain Cities ALT 1\n",
"1 ARI NL 1998 0 Arizona Diamondbacks ARI 1"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"run_query(\"select * from team_codes limit 2\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Getting information about Columns in game_log table"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" cid | \n",
" name | \n",
" type | \n",
" notnull | \n",
" dflt_value | \n",
" pk | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 0 | \n",
" date | \n",
" INTEGER | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 1 | \n",
" 1 | \n",
" number_of_game | \n",
" INTEGER | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 2 | \n",
" 2 | \n",
" day_of_week | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 3 | \n",
" 3 | \n",
" v_name | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 4 | \n",
" 4 | \n",
" v_league | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 5 | \n",
" 5 | \n",
" v_game_number | \n",
" INTEGER | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 6 | \n",
" 6 | \n",
" h_name | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 7 | \n",
" 7 | \n",
" h_league | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 8 | \n",
" 8 | \n",
" h_game_number | \n",
" INTEGER | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 9 | \n",
" 9 | \n",
" v_score | \n",
" INTEGER | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 10 | \n",
" 10 | \n",
" h_score | \n",
" INTEGER | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 11 | \n",
" 11 | \n",
" length_outs | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 12 | \n",
" 12 | \n",
" day_night | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 13 | \n",
" 13 | \n",
" completion | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 14 | \n",
" 14 | \n",
" forefeit | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 15 | \n",
" 15 | \n",
" protest | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 16 | \n",
" 16 | \n",
" park_id | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 17 | \n",
" 17 | \n",
" attendance | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 18 | \n",
" 18 | \n",
" length_minutes | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 19 | \n",
" 19 | \n",
" v_line_score | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 20 | \n",
" 20 | \n",
" h_line_score | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 21 | \n",
" 21 | \n",
" v_at_bats | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 22 | \n",
" 22 | \n",
" v_hits | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 23 | \n",
" 23 | \n",
" v_doubles | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 24 | \n",
" 24 | \n",
" v_triples | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 25 | \n",
" 25 | \n",
" v_homeruns | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 26 | \n",
" 26 | \n",
" v_rbi | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 27 | \n",
" 27 | \n",
" v_sacrifice_hits | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 28 | \n",
" 28 | \n",
" v_sacrifice_flies | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 29 | \n",
" 29 | \n",
" v_hit_by_pitch | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 30 | \n",
" 30 | \n",
" v_walks | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 31 | \n",
" 31 | \n",
" v_intentional_walks | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 32 | \n",
" 32 | \n",
" v_strikeouts | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 33 | \n",
" 33 | \n",
" v_stolen_bases | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 34 | \n",
" 34 | \n",
" v_caught_stealing | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 35 | \n",
" 35 | \n",
" v_grounded_into_double | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 36 | \n",
" 36 | \n",
" v_first_catcher_interference | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 37 | \n",
" 37 | \n",
" v_left_on_base | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 38 | \n",
" 38 | \n",
" v_pitchers_used | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 39 | \n",
" 39 | \n",
" v_individual_earned_runs | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 40 | \n",
" 40 | \n",
" v_team_earned_runs | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 41 | \n",
" 41 | \n",
" v_wild_pitches | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 42 | \n",
" 42 | \n",
" v_balks | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 43 | \n",
" 43 | \n",
" v_putouts | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 44 | \n",
" 44 | \n",
" v_assists | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 45 | \n",
" 45 | \n",
" v_errors | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 46 | \n",
" 46 | \n",
" v_passed_balls | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 47 | \n",
" 47 | \n",
" v_double_plays | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 48 | \n",
" 48 | \n",
" v_triple_plays | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 49 | \n",
" 49 | \n",
" h_at_bats | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 50 | \n",
" 50 | \n",
" h_hits | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 51 | \n",
" 51 | \n",
" h_doubles | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 52 | \n",
" 52 | \n",
" h_triples | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 53 | \n",
" 53 | \n",
" h_homeruns | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 54 | \n",
" 54 | \n",
" h_rbi | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 55 | \n",
" 55 | \n",
" h_sacrifice_hits | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 56 | \n",
" 56 | \n",
" h_sacrifice_flies | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 57 | \n",
" 57 | \n",
" h_hit_by_pitch | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 58 | \n",
" 58 | \n",
" h_walks | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 59 | \n",
" 59 | \n",
" h_intentional_walks | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 60 | \n",
" 60 | \n",
" h_strikeouts | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 61 | \n",
" 61 | \n",
" h_stolen_bases | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 62 | \n",
" 62 | \n",
" h_caught_stealing | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 63 | \n",
" 63 | \n",
" h_grounded_into_double | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 64 | \n",
" 64 | \n",
" h_first_catcher_interference | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 65 | \n",
" 65 | \n",
" h_left_on_base | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 66 | \n",
" 66 | \n",
" h_pitchers_used | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 67 | \n",
" 67 | \n",
" h_individual_earned_runs | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 68 | \n",
" 68 | \n",
" h_team_earned_runs | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 69 | \n",
" 69 | \n",
" h_wild_pitches | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 70 | \n",
" 70 | \n",
" h_balks | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 71 | \n",
" 71 | \n",
" h_putouts | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 72 | \n",
" 72 | \n",
" h_assists | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 73 | \n",
" 73 | \n",
" h_errors | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 74 | \n",
" 74 | \n",
" h_passed_balls | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 75 | \n",
" 75 | \n",
" h_double_plays | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 76 | \n",
" 76 | \n",
" h_triple_plays | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 77 | \n",
" 77 | \n",
" hp_umpire_id | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 78 | \n",
" 78 | \n",
" hp_umpire_name | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 79 | \n",
" 79 | \n",
" 1b_umpire_id | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 80 | \n",
" 80 | \n",
" 1b_umpire_name | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 81 | \n",
" 81 | \n",
" 2b_umpire_id | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 82 | \n",
" 82 | \n",
" 2b_umpire_name | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 83 | \n",
" 83 | \n",
" 3b_umpire_id | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 84 | \n",
" 84 | \n",
" 3b_umpire_name | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 85 | \n",
" 85 | \n",
" lf_umpire_id | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 86 | \n",
" 86 | \n",
" lf_umpire_name | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 87 | \n",
" 87 | \n",
" rf_umpire_id | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 88 | \n",
" 88 | \n",
" rf_umpire_name | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 89 | \n",
" 89 | \n",
" v_manager_id | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 90 | \n",
" 90 | \n",
" v_manager_name | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 91 | \n",
" 91 | \n",
" h_manager_id | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 92 | \n",
" 92 | \n",
" h_manager_name | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 93 | \n",
" 93 | \n",
" winning_pitcher_id | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 94 | \n",
" 94 | \n",
" winning_pitcher_name | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 95 | \n",
" 95 | \n",
" losing_pitcher_id | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 96 | \n",
" 96 | \n",
" losing_pitcher_name | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 97 | \n",
" 97 | \n",
" saving_pitcher_id | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 98 | \n",
" 98 | \n",
" saving_pitcher_name | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 99 | \n",
" 99 | \n",
" winning_rbi_batter_id | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 100 | \n",
" 100 | \n",
" winning_rbi_batter_id_name | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 101 | \n",
" 101 | \n",
" v_starting_pitcher_id | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 102 | \n",
" 102 | \n",
" v_starting_pitcher_name | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 103 | \n",
" 103 | \n",
" h_starting_pitcher_id | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 104 | \n",
" 104 | \n",
" h_starting_pitcher_name | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 105 | \n",
" 105 | \n",
" v_player_1_id | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 106 | \n",
" 106 | \n",
" v_player_1_name | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 107 | \n",
" 107 | \n",
" v_player_1_def_pos | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 108 | \n",
" 108 | \n",
" v_player_2_id | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 109 | \n",
" 109 | \n",
" v_player_2_name | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 110 | \n",
" 110 | \n",
" v_player_2_def_pos | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 111 | \n",
" 111 | \n",
" v_player_3_id | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 112 | \n",
" 112 | \n",
" v_player_3_name | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 113 | \n",
" 113 | \n",
" v_player_3_def_pos | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 114 | \n",
" 114 | \n",
" v_player_4_id | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 115 | \n",
" 115 | \n",
" v_player_4_name | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 116 | \n",
" 116 | \n",
" v_player_4_def_pos | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 117 | \n",
" 117 | \n",
" v_player_5_id | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 118 | \n",
" 118 | \n",
" v_player_5_name | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 119 | \n",
" 119 | \n",
" v_player_5_def_pos | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 120 | \n",
" 120 | \n",
" v_player_6_id | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 121 | \n",
" 121 | \n",
" v_player_6_name | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 122 | \n",
" 122 | \n",
" v_player_6_def_pos | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 123 | \n",
" 123 | \n",
" v_player_7_id | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 124 | \n",
" 124 | \n",
" v_player_7_name | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 125 | \n",
" 125 | \n",
" v_player_7_def_pos | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 126 | \n",
" 126 | \n",
" v_player_8_id | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 127 | \n",
" 127 | \n",
" v_player_8_name | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 128 | \n",
" 128 | \n",
" v_player_8_def_pos | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 129 | \n",
" 129 | \n",
" v_player_9_id | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 130 | \n",
" 130 | \n",
" v_player_9_name | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 131 | \n",
" 131 | \n",
" v_player_9_def_pos | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 132 | \n",
" 132 | \n",
" h_player_1_id | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 133 | \n",
" 133 | \n",
" h_player_1_name | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 134 | \n",
" 134 | \n",
" h_player_1_def_pos | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 135 | \n",
" 135 | \n",
" h_player_2_id | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 136 | \n",
" 136 | \n",
" h_player_2_name | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 137 | \n",
" 137 | \n",
" h_player_2_def_pos | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 138 | \n",
" 138 | \n",
" h_player_3_id | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 139 | \n",
" 139 | \n",
" h_player_3_name | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 140 | \n",
" 140 | \n",
" h_player_3_def_pos | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 141 | \n",
" 141 | \n",
" h_player_4_id | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 142 | \n",
" 142 | \n",
" h_player_4_name | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 143 | \n",
" 143 | \n",
" h_player_4_def_pos | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 144 | \n",
" 144 | \n",
" h_player_5_id | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 145 | \n",
" 145 | \n",
" h_player_5_name | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 146 | \n",
" 146 | \n",
" h_player_5_def_pos | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 147 | \n",
" 147 | \n",
" h_player_6_id | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 148 | \n",
" 148 | \n",
" h_player_6_name | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 149 | \n",
" 149 | \n",
" h_player_6_def_pos | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 150 | \n",
" 150 | \n",
" h_player_7_id | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 151 | \n",
" 151 | \n",
" h_player_7_name | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 152 | \n",
" 152 | \n",
" h_player_7_def_pos | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 153 | \n",
" 153 | \n",
" h_player_8_id | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 154 | \n",
" 154 | \n",
" h_player_8_name | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 155 | \n",
" 155 | \n",
" h_player_8_def_pos | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 156 | \n",
" 156 | \n",
" h_player_9_id | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 157 | \n",
" 157 | \n",
" h_player_9_name | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 158 | \n",
" 158 | \n",
" h_player_9_def_pos | \n",
" REAL | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 159 | \n",
" 159 | \n",
" additional_info | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 160 | \n",
" 160 | \n",
" acquisition_info | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
" 161 | \n",
" 161 | \n",
" game_id | \n",
" TEXT | \n",
" 0 | \n",
" None | \n",
" 0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" cid name type notnull dflt_value pk\n",
"0 0 date INTEGER 0 None 0\n",
"1 1 number_of_game INTEGER 0 None 0\n",
"2 2 day_of_week TEXT 0 None 0\n",
"3 3 v_name TEXT 0 None 0\n",
"4 4 v_league TEXT 0 None 0\n",
"5 5 v_game_number INTEGER 0 None 0\n",
"6 6 h_name TEXT 0 None 0\n",
"7 7 h_league TEXT 0 None 0\n",
"8 8 h_game_number INTEGER 0 None 0\n",
"9 9 v_score INTEGER 0 None 0\n",
"10 10 h_score INTEGER 0 None 0\n",
"11 11 length_outs REAL 0 None 0\n",
"12 12 day_night TEXT 0 None 0\n",
"13 13 completion TEXT 0 None 0\n",
"14 14 forefeit TEXT 0 None 0\n",
"15 15 protest TEXT 0 None 0\n",
"16 16 park_id TEXT 0 None 0\n",
"17 17 attendance REAL 0 None 0\n",
"18 18 length_minutes REAL 0 None 0\n",
"19 19 v_line_score TEXT 0 None 0\n",
"20 20 h_line_score TEXT 0 None 0\n",
"21 21 v_at_bats REAL 0 None 0\n",
"22 22 v_hits REAL 0 None 0\n",
"23 23 v_doubles REAL 0 None 0\n",
"24 24 v_triples REAL 0 None 0\n",
"25 25 v_homeruns REAL 0 None 0\n",
"26 26 v_rbi REAL 0 None 0\n",
"27 27 v_sacrifice_hits REAL 0 None 0\n",
"28 28 v_sacrifice_flies REAL 0 None 0\n",
"29 29 v_hit_by_pitch REAL 0 None 0\n",
"30 30 v_walks REAL 0 None 0\n",
"31 31 v_intentional_walks REAL 0 None 0\n",
"32 32 v_strikeouts REAL 0 None 0\n",
"33 33 v_stolen_bases REAL 0 None 0\n",
"34 34 v_caught_stealing REAL 0 None 0\n",
"35 35 v_grounded_into_double REAL 0 None 0\n",
"36 36 v_first_catcher_interference REAL 0 None 0\n",
"37 37 v_left_on_base REAL 0 None 0\n",
"38 38 v_pitchers_used REAL 0 None 0\n",
"39 39 v_individual_earned_runs REAL 0 None 0\n",
"40 40 v_team_earned_runs REAL 0 None 0\n",
"41 41 v_wild_pitches REAL 0 None 0\n",
"42 42 v_balks REAL 0 None 0\n",
"43 43 v_putouts REAL 0 None 0\n",
"44 44 v_assists REAL 0 None 0\n",
"45 45 v_errors REAL 0 None 0\n",
"46 46 v_passed_balls REAL 0 None 0\n",
"47 47 v_double_plays REAL 0 None 0\n",
"48 48 v_triple_plays REAL 0 None 0\n",
"49 49 h_at_bats REAL 0 None 0\n",
"50 50 h_hits REAL 0 None 0\n",
"51 51 h_doubles REAL 0 None 0\n",
"52 52 h_triples REAL 0 None 0\n",
"53 53 h_homeruns REAL 0 None 0\n",
"54 54 h_rbi REAL 0 None 0\n",
"55 55 h_sacrifice_hits REAL 0 None 0\n",
"56 56 h_sacrifice_flies REAL 0 None 0\n",
"57 57 h_hit_by_pitch REAL 0 None 0\n",
"58 58 h_walks REAL 0 None 0\n",
"59 59 h_intentional_walks REAL 0 None 0\n",
"60 60 h_strikeouts REAL 0 None 0\n",
"61 61 h_stolen_bases REAL 0 None 0\n",
"62 62 h_caught_stealing REAL 0 None 0\n",
"63 63 h_grounded_into_double REAL 0 None 0\n",
"64 64 h_first_catcher_interference REAL 0 None 0\n",
"65 65 h_left_on_base REAL 0 None 0\n",
"66 66 h_pitchers_used REAL 0 None 0\n",
"67 67 h_individual_earned_runs REAL 0 None 0\n",
"68 68 h_team_earned_runs REAL 0 None 0\n",
"69 69 h_wild_pitches REAL 0 None 0\n",
"70 70 h_balks REAL 0 None 0\n",
"71 71 h_putouts REAL 0 None 0\n",
"72 72 h_assists REAL 0 None 0\n",
"73 73 h_errors REAL 0 None 0\n",
"74 74 h_passed_balls REAL 0 None 0\n",
"75 75 h_double_plays REAL 0 None 0\n",
"76 76 h_triple_plays REAL 0 None 0\n",
"77 77 hp_umpire_id TEXT 0 None 0\n",
"78 78 hp_umpire_name TEXT 0 None 0\n",
"79 79 1b_umpire_id TEXT 0 None 0\n",
"80 80 1b_umpire_name TEXT 0 None 0\n",
"81 81 2b_umpire_id TEXT 0 None 0\n",
"82 82 2b_umpire_name TEXT 0 None 0\n",
"83 83 3b_umpire_id TEXT 0 None 0\n",
"84 84 3b_umpire_name TEXT 0 None 0\n",
"85 85 lf_umpire_id TEXT 0 None 0\n",
"86 86 lf_umpire_name TEXT 0 None 0\n",
"87 87 rf_umpire_id TEXT 0 None 0\n",
"88 88 rf_umpire_name TEXT 0 None 0\n",
"89 89 v_manager_id TEXT 0 None 0\n",
"90 90 v_manager_name TEXT 0 None 0\n",
"91 91 h_manager_id TEXT 0 None 0\n",
"92 92 h_manager_name TEXT 0 None 0\n",
"93 93 winning_pitcher_id TEXT 0 None 0\n",
"94 94 winning_pitcher_name TEXT 0 None 0\n",
"95 95 losing_pitcher_id TEXT 0 None 0\n",
"96 96 losing_pitcher_name TEXT 0 None 0\n",
"97 97 saving_pitcher_id TEXT 0 None 0\n",
"98 98 saving_pitcher_name TEXT 0 None 0\n",
"99 99 winning_rbi_batter_id TEXT 0 None 0\n",
"100 100 winning_rbi_batter_id_name TEXT 0 None 0\n",
"101 101 v_starting_pitcher_id TEXT 0 None 0\n",
"102 102 v_starting_pitcher_name TEXT 0 None 0\n",
"103 103 h_starting_pitcher_id TEXT 0 None 0\n",
"104 104 h_starting_pitcher_name TEXT 0 None 0\n",
"105 105 v_player_1_id TEXT 0 None 0\n",
"106 106 v_player_1_name TEXT 0 None 0\n",
"107 107 v_player_1_def_pos REAL 0 None 0\n",
"108 108 v_player_2_id TEXT 0 None 0\n",
"109 109 v_player_2_name TEXT 0 None 0\n",
"110 110 v_player_2_def_pos REAL 0 None 0\n",
"111 111 v_player_3_id TEXT 0 None 0\n",
"112 112 v_player_3_name TEXT 0 None 0\n",
"113 113 v_player_3_def_pos REAL 0 None 0\n",
"114 114 v_player_4_id TEXT 0 None 0\n",
"115 115 v_player_4_name TEXT 0 None 0\n",
"116 116 v_player_4_def_pos REAL 0 None 0\n",
"117 117 v_player_5_id TEXT 0 None 0\n",
"118 118 v_player_5_name TEXT 0 None 0\n",
"119 119 v_player_5_def_pos REAL 0 None 0\n",
"120 120 v_player_6_id TEXT 0 None 0\n",
"121 121 v_player_6_name TEXT 0 None 0\n",
"122 122 v_player_6_def_pos REAL 0 None 0\n",
"123 123 v_player_7_id TEXT 0 None 0\n",
"124 124 v_player_7_name TEXT 0 None 0\n",
"125 125 v_player_7_def_pos REAL 0 None 0\n",
"126 126 v_player_8_id TEXT 0 None 0\n",
"127 127 v_player_8_name TEXT 0 None 0\n",
"128 128 v_player_8_def_pos REAL 0 None 0\n",
"129 129 v_player_9_id TEXT 0 None 0\n",
"130 130 v_player_9_name TEXT 0 None 0\n",
"131 131 v_player_9_def_pos REAL 0 None 0\n",
"132 132 h_player_1_id TEXT 0 None 0\n",
"133 133 h_player_1_name TEXT 0 None 0\n",
"134 134 h_player_1_def_pos REAL 0 None 0\n",
"135 135 h_player_2_id TEXT 0 None 0\n",
"136 136 h_player_2_name TEXT 0 None 0\n",
"137 137 h_player_2_def_pos REAL 0 None 0\n",
"138 138 h_player_3_id TEXT 0 None 0\n",
"139 139 h_player_3_name TEXT 0 None 0\n",
"140 140 h_player_3_def_pos REAL 0 None 0\n",
"141 141 h_player_4_id TEXT 0 None 0\n",
"142 142 h_player_4_name TEXT 0 None 0\n",
"143 143 h_player_4_def_pos REAL 0 None 0\n",
"144 144 h_player_5_id TEXT 0 None 0\n",
"145 145 h_player_5_name TEXT 0 None 0\n",
"146 146 h_player_5_def_pos REAL 0 None 0\n",
"147 147 h_player_6_id TEXT 0 None 0\n",
"148 148 h_player_6_name TEXT 0 None 0\n",
"149 149 h_player_6_def_pos REAL 0 None 0\n",
"150 150 h_player_7_id TEXT 0 None 0\n",
"151 151 h_player_7_name TEXT 0 None 0\n",
"152 152 h_player_7_def_pos REAL 0 None 0\n",
"153 153 h_player_8_id TEXT 0 None 0\n",
"154 154 h_player_8_name TEXT 0 None 0\n",
"155 155 h_player_8_def_pos REAL 0 None 0\n",
"156 156 h_player_9_id TEXT 0 None 0\n",
"157 157 h_player_9_name TEXT 0 None 0\n",
"158 158 h_player_9_def_pos REAL 0 None 0\n",
"159 159 additional_info TEXT 0 None 0\n",
"160 160 acquisition_info TEXT 0 None 0\n",
"161 161 game_id TEXT 0 None 0"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"q = '''\n",
"SELECT * from pragma_table_info('game_log')\n",
"'''\n",
"\n",
"run_query(q)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Looking for Normalization Opportunities¶\n",
"\n",
"The following are opportunities for normalization of our data:\n",
"\n",
"- In person_codes, all the debut dates will be able to be reproduced using game log data.\n",
"- In team_codes, the start, end and sequence columns will be able to be reproduced using game log data.\n",
"- In park_codes, the start and end years will be able to be reproduced using game log data. While technically the state is an attribute of the city, we might not want to have a an incomplete city/state table so we will leave this in.\n",
"- There are lots of places in game log where we have a player ID followed by the players name. We will be able to remove this and use the name data in person_codes\n",
"- In game_log, all offensive and defensive stats are repeated for the home team and the visiting team. We could break these out and have a table that lists each game twice, one for each team, and cut out this column repetition.\n",
"- Similarly, in game_log, we have a listing for 9 players on each team with their positions - we can remove these and have one table that tracks player appearances and their positions.\n",
"- We can do a similar thing with the umpires from game_log, instead of listing all four positions as columns, we can put the umpires either in their own table or make one table for players, umpires and managers.\n",
"- We have several awards in game_log like winning pitcher and losing pitcher. We can either break these out into their own table, have a table for awards, or combine the awards in with general appearances like the players and umpires."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Planning a Normalized Schema"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![schema](schema-screenshot.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Creating Tables without Foreign Key Relations\n",
"\n",
"We'll start by creating new tables which don't contain any foreign key relations. It's important to start with these tables, as other tables will have relations to these tables, and so these tables will need to exist first.\n",
"\n",
"We will create following tables:\n",
"\n",
"`person`\n",
"- Each of the 'debut' columns have been omitted, as the data will be able to be found from other tables. Since the game log file has no data on coaches, we made the decision to not include this data.
\n",
"\n",
"`park`\n",
"- The start, end, and league columns contain data that is found in the main game log and can be removed.\n",
"\n",
"`league`\n",
"- Because some of the older leagues are not well known, we will create a table to store league names.\n",
"\n",
"`appearance_type`\n",
"- Our appearance table will include data on players with positions, umpires, managers, and awards (like winning pitcher). This table will store information on what different types of appearances are available."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Creating new tables: \n",
"`person`"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" person_id | \n",
" first_name | \n",
" last_name | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" aardd001 | \n",
" David | \n",
" Aardsma | \n",
"
\n",
" \n",
" 1 | \n",
" aaroh101 | \n",
" Hank | \n",
" Aaron | \n",
"
\n",
" \n",
" 2 | \n",
" aarot101 | \n",
" Tommie | \n",
" Aaron | \n",
"
\n",
" \n",
" 3 | \n",
" aased001 | \n",
" Don | \n",
" Aase | \n",
"
\n",
" \n",
" 4 | \n",
" abada001 | \n",
" Andy | \n",
" Abad | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" person_id first_name last_name\n",
"0 aardd001 David Aardsma\n",
"1 aaroh101 Hank Aaron\n",
"2 aarot101 Tommie Aaron\n",
"3 aased001 Don Aase\n",
"4 abada001 Andy Abad"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"c1 = '''\n",
"CREATE TABLE IF NOT EXISTS person (\n",
" person_id TEXT PRIMARY KEY,\n",
" first_name TEXT, \n",
" last_name TEXT\n",
"); \n",
"'''\n",
"c2 = '''\n",
"INSERT OR IGNORE INTO person\n",
"SELECT \n",
" id, \n",
" first,\n",
" last\n",
"FROM person_codes;\n",
"'''\n",
"\n",
"q = '''\n",
"SELECT \n",
" *\n",
"FROM person\n",
"LIMIT 5;\n",
"'''\n",
"\n",
"run_command(c1)\n",
"run_command(c2)\n",
"run_query(q)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`park`"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" park_id | \n",
" name | \n",
" nickname | \n",
" city | \n",
" state | \n",
" notes | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" ALB01 | \n",
" Riverside Park | \n",
" None | \n",
" Albany | \n",
" NY | \n",
" TRN:9/11/80;6/15&9/10/1881;5/16-5/18&5/30/1882 | \n",
"
\n",
" \n",
" 1 | \n",
" ALT01 | \n",
" Columbia Park | \n",
" None | \n",
" Altoona | \n",
" PA | \n",
" None | \n",
"
\n",
" \n",
" 2 | \n",
" ANA01 | \n",
" Angel Stadium of Anaheim | \n",
" Edison Field; Anaheim Stadium | \n",
" Anaheim | \n",
" CA | \n",
" None | \n",
"
\n",
" \n",
" 3 | \n",
" ARL01 | \n",
" Arlington Stadium | \n",
" None | \n",
" Arlington | \n",
" TX | \n",
" None | \n",
"
\n",
" \n",
" 4 | \n",
" ARL02 | \n",
" Rangers Ballpark in Arlington | \n",
" The Ballpark in Arlington; Ameriquest Fl | \n",
" Arlington | \n",
" TX | \n",
" None | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" park_id name \\\n",
"0 ALB01 Riverside Park \n",
"1 ALT01 Columbia Park \n",
"2 ANA01 Angel Stadium of Anaheim \n",
"3 ARL01 Arlington Stadium \n",
"4 ARL02 Rangers Ballpark in Arlington \n",
"\n",
" nickname city state \\\n",
"0 None Albany NY \n",
"1 None Altoona PA \n",
"2 Edison Field; Anaheim Stadium Anaheim CA \n",
"3 None Arlington TX \n",
"4 The Ballpark in Arlington; Ameriquest Fl Arlington TX \n",
"\n",
" notes \n",
"0 TRN:9/11/80;6/15&9/10/1881;5/16-5/18&5/30/1882 \n",
"1 None \n",
"2 None \n",
"3 None \n",
"4 None "
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"c1 = '''\n",
"CREATE TABLE IF NOT EXISTS park (\n",
" park_id TEXT PRIMARY KEY,\n",
" name TEXT,\n",
" nickname TEXT,\n",
" city TEXT,\n",
" state TEXT,\n",
" notes TEXT\n",
"); \n",
"'''\n",
"\n",
"c2 = '''\n",
"INSERT OR IGNORE INTO park\n",
"SELECT \n",
" park_id, \n",
" name,\n",
" aka,\n",
" city, \n",
" state,\n",
" notes\n",
"FROM park_codes;\n",
"'''\n",
"\n",
"q = '''\n",
"SELECT \n",
" *\n",
"FROM park\n",
"LIMIT 5;\n",
"'''\n",
"\n",
"run_command(c1)\n",
"run_command(c2)\n",
"run_query(q)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`league`"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" league_id | \n",
" league_name | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" NL | \n",
" National League | \n",
"
\n",
" \n",
" 1 | \n",
" AL | \n",
" American League | \n",
"
\n",
" \n",
" 2 | \n",
" AA | \n",
" American Association | \n",
"
\n",
" \n",
" 3 | \n",
" FL | \n",
" Federal League | \n",
"
\n",
" \n",
" 4 | \n",
" PL | \n",
" Players League | \n",
"
\n",
" \n",
" 5 | \n",
" UA | \n",
" Union Association | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" league_id league_name\n",
"0 NL National League\n",
"1 AL American League\n",
"2 AA American Association\n",
"3 FL Federal League\n",
"4 PL Players League\n",
"5 UA Union Association"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"c1 = '''\n",
"CREATE TABLE IF NOT EXISTS league (\n",
" league_id TEXT PRIMARY KEY,\n",
" league_name TEXT\n",
"); \n",
"'''\n",
"\n",
"c2 = '''\n",
"INSERT OR IGNORE INTO league\n",
"VALUES\n",
" (\"NL\", \"National League\"),\n",
" (\"AL\", \"American League\"),\n",
" (\"AA\", \"American Association\"),\n",
" (\"FL\", \"Federal League\"),\n",
" (\"PL\", \"Players League\"),\n",
" (\"UA\", \"Union Association\")\n",
";\n",
"'''\n",
"\n",
"q = '''\n",
"SELECT \n",
" *\n",
"FROM league;\n",
"'''\n",
"\n",
"run_command(c1)\n",
"run_command(c2)\n",
"run_query(q)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`appearance_type`
\n",
"We have a appearance_type.csv file, which contains all values need for this table."
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" appearance_type_id | \n",
" name | \n",
" category | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" O1 | \n",
" Batter 1 | \n",
" offense | \n",
"
\n",
" \n",
" 1 | \n",
" O2 | \n",
" Batter 2 | \n",
" offense | \n",
"
\n",
" \n",
" 2 | \n",
" O3 | \n",
" Batter 3 | \n",
" offense | \n",
"
\n",
" \n",
" 3 | \n",
" O4 | \n",
" Batter 4 | \n",
" offense | \n",
"
\n",
" \n",
" 4 | \n",
" O5 | \n",
" Batter 5 | \n",
" offense | \n",
"
\n",
" \n",
" 5 | \n",
" O6 | \n",
" Batter 6 | \n",
" offense | \n",
"
\n",
" \n",
" 6 | \n",
" O7 | \n",
" Batter 7 | \n",
" offense | \n",
"
\n",
" \n",
" 7 | \n",
" O8 | \n",
" Batter 8 | \n",
" offense | \n",
"
\n",
" \n",
" 8 | \n",
" O9 | \n",
" Batter 9 | \n",
" offense | \n",
"
\n",
" \n",
" 9 | \n",
" D1 | \n",
" Pitcher | \n",
" defense | \n",
"
\n",
" \n",
" 10 | \n",
" D2 | \n",
" Catcher | \n",
" defense | \n",
"
\n",
" \n",
" 11 | \n",
" D3 | \n",
" 1st Base | \n",
" defense | \n",
"
\n",
" \n",
" 12 | \n",
" D4 | \n",
" 2nd Base | \n",
" defense | \n",
"
\n",
" \n",
" 13 | \n",
" D5 | \n",
" 3rd Base | \n",
" defense | \n",
"
\n",
" \n",
" 14 | \n",
" D6 | \n",
" Shortstop | \n",
" defense | \n",
"
\n",
" \n",
" 15 | \n",
" D7 | \n",
" Left Field | \n",
" defense | \n",
"
\n",
" \n",
" 16 | \n",
" D8 | \n",
" Center Field | \n",
" defense | \n",
"
\n",
" \n",
" 17 | \n",
" D9 | \n",
" Right Field | \n",
" defense | \n",
"
\n",
" \n",
" 18 | \n",
" D10 | \n",
" Unknown Position | \n",
" defense | \n",
"
\n",
" \n",
" 19 | \n",
" UHP | \n",
" Home Plate | \n",
" umpire | \n",
"
\n",
" \n",
" 20 | \n",
" U1B | \n",
" First Base | \n",
" umpire | \n",
"
\n",
" \n",
" 21 | \n",
" U2B | \n",
" Second Base | \n",
" umpire | \n",
"
\n",
" \n",
" 22 | \n",
" U3B | \n",
" Third Base | \n",
" umpire | \n",
"
\n",
" \n",
" 23 | \n",
" ULF | \n",
" Left Field | \n",
" umpire | \n",
"
\n",
" \n",
" 24 | \n",
" URF | \n",
" Right Field | \n",
" umpire | \n",
"
\n",
" \n",
" 25 | \n",
" MM | \n",
" Manager | \n",
" manager | \n",
"
\n",
" \n",
" 26 | \n",
" AWP | \n",
" Winning Pitcher | \n",
" award | \n",
"
\n",
" \n",
" 27 | \n",
" ALP | \n",
" Losing Pitcher | \n",
" award | \n",
"
\n",
" \n",
" 28 | \n",
" ASP | \n",
" Saving Pitcher | \n",
" award | \n",
"
\n",
" \n",
" 29 | \n",
" AWB | \n",
" Winning RBI Batter | \n",
" award | \n",
"
\n",
" \n",
" 30 | \n",
" PSP | \n",
" Starting Pitcher | \n",
" pitcher | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" appearance_type_id name category\n",
"0 O1 Batter 1 offense\n",
"1 O2 Batter 2 offense\n",
"2 O3 Batter 3 offense\n",
"3 O4 Batter 4 offense\n",
"4 O5 Batter 5 offense\n",
"5 O6 Batter 6 offense\n",
"6 O7 Batter 7 offense\n",
"7 O8 Batter 8 offense\n",
"8 O9 Batter 9 offense\n",
"9 D1 Pitcher defense\n",
"10 D2 Catcher defense\n",
"11 D3 1st Base defense\n",
"12 D4 2nd Base defense\n",
"13 D5 3rd Base defense\n",
"14 D6 Shortstop defense\n",
"15 D7 Left Field defense\n",
"16 D8 Center Field defense\n",
"17 D9 Right Field defense\n",
"18 D10 Unknown Position defense\n",
"19 UHP Home Plate umpire\n",
"20 U1B First Base umpire\n",
"21 U2B Second Base umpire\n",
"22 U3B Third Base umpire\n",
"23 ULF Left Field umpire\n",
"24 URF Right Field umpire\n",
"25 MM Manager manager\n",
"26 AWP Winning Pitcher award\n",
"27 ALP Losing Pitcher award\n",
"28 ASP Saving Pitcher award\n",
"29 AWB Winning RBI Batter award\n",
"30 PSP Starting Pitcher pitcher"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"c1 = '''DROP TABLE IF EXISTS appearance_type'''\n",
"\n",
"c2 = '''\n",
"CREATE TABLE IF NOT EXISTS appearance_type (\n",
" appearance_type_id TEXT PRIMARY KEY,\n",
" name TEXT,\n",
" category TEXT\n",
");\n",
"'''\n",
"\n",
"run_command(c1)\n",
"run_command(c2)\n",
"\n",
"appearance_pd = pd.read_csv('appearance_type.csv')\n",
"\n",
"with sqlite3.connect('mlb.db') as conn:\n",
" appearance_pd.to_sql('appearance_type',\n",
" conn,\n",
" index=False,\n",
" if_exists='append')\n",
"\n",
"q = '''\n",
"SELECT \n",
" *\n",
"FROM appearance_type;\n",
"'''\n",
"\n",
"run_query(q)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Re-using the run_command() function defined earlier, we can add a single line to enable enforcement of foreign key restraints:\n",
"\n",
"`def run_command(c):\n",
" with sqlite3.connect(DB) as conn:\n",
" conn.execute('PRAGMA foreign_keys = ON;')\n",
" conn.isolation_level = None\n",
" conn.execute(c)`\n",
" \n",
"### Adding Tables with foreign keys\n",
"\n",
"`team`\n",
"- The start, end, and sequence columns for this table can be derived from the game level data."
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" team_id | \n",
" league_id | \n",
" city | \n",
" nickname | \n",
" franch_id | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" ALT | \n",
" UA | \n",
" Altoona | \n",
" Mountain Cities | \n",
" ALT | \n",
"
\n",
" \n",
" 1 | \n",
" ARI | \n",
" NL | \n",
" Arizona | \n",
" Diamondbacks | \n",
" ARI | \n",
"
\n",
" \n",
" 2 | \n",
" BFN | \n",
" NL | \n",
" Buffalo | \n",
" Bisons | \n",
" BFN | \n",
"
\n",
" \n",
" 3 | \n",
" BFP | \n",
" PL | \n",
" Buffalo | \n",
" Bisons | \n",
" BFP | \n",
"
\n",
" \n",
" 4 | \n",
" BL1 | \n",
" None | \n",
" Baltimore | \n",
" Canaries | \n",
" BL1 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" team_id league_id city nickname franch_id\n",
"0 ALT UA Altoona Mountain Cities ALT\n",
"1 ARI NL Arizona Diamondbacks ARI\n",
"2 BFN NL Buffalo Bisons BFN\n",
"3 BFP PL Buffalo Bisons BFP\n",
"4 BL1 None Baltimore Canaries BL1"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def run_command(c):\n",
" with sqlite3.connect(db) as conn:\n",
" conn.execute('PRAGMA foreign_keys = ON;')\n",
" conn.isolation_level = None\n",
" conn.execute(c)\n",
"\n",
"c1 = \"\"\"\n",
"CREATE TABLE IF NOT EXISTS team (\n",
" team_id TEXT PRIMARY KEY,\n",
" league_id TEXT,\n",
" city TEXT,\n",
" nickname TEXT,\n",
" franch_id TEXT,\n",
" FOREIGN KEY (league_id) REFERENCES league(league_id)\n",
");\n",
"\"\"\"\n",
"\n",
"c2 = \"\"\"\n",
"INSERT OR IGNORE INTO team\n",
"SELECT\n",
" team_id,\n",
" league,\n",
" city,\n",
" nickname,\n",
" franch_id\n",
"FROM team_codes;\n",
"\"\"\"\n",
"\n",
"q = \"\"\"\n",
"SELECT * FROM team\n",
"LIMIT 5;\n",
"\"\"\"\n",
"\n",
"run_command(c1)\n",
"run_command(c2)\n",
"run_query(q)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`game`\n",
"- We will include all columns for the game log that don't refer to one specific team or player, instead putting those in two appearance tables.\n",
"- We will remove the column with the day of the week, as this can be derived from the date.\n",
"- We will change the day_night column to day, with the intention of making this a boolean column. "
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" game_id | \n",
" date | \n",
" number_of_game | \n",
" park_id | \n",
" length_outs | \n",
" day | \n",
" completion | \n",
" forefeit | \n",
" protest | \n",
" attendance | \n",
" legnth_minutes | \n",
" additional_info | \n",
" acquisition_info | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" FW1187105040 | \n",
" 18710504 | \n",
" 0 | \n",
" FOR01 | \n",
" 54 | \n",
" 1 | \n",
" None | \n",
" None | \n",
" None | \n",
" 200 | \n",
" 120 | \n",
" None | \n",
" Y | \n",
"
\n",
" \n",
" 1 | \n",
" WS3187105050 | \n",
" 18710505 | \n",
" 0 | \n",
" WAS01 | \n",
" 54 | \n",
" 1 | \n",
" None | \n",
" None | \n",
" None | \n",
" 5000 | \n",
" 145 | \n",
" HTBF | \n",
" Y | \n",
"
\n",
" \n",
" 2 | \n",
" RC1187105060 | \n",
" 18710506 | \n",
" 0 | \n",
" RCK01 | \n",
" 54 | \n",
" 1 | \n",
" None | \n",
" None | \n",
" None | \n",
" 1000 | \n",
" 140 | \n",
" None | \n",
" Y | \n",
"
\n",
" \n",
" 3 | \n",
" CH1187105080 | \n",
" 18710508 | \n",
" 0 | \n",
" CHI01 | \n",
" 54 | \n",
" 1 | \n",
" None | \n",
" None | \n",
" None | \n",
" 5000 | \n",
" 150 | \n",
" None | \n",
" Y | \n",
"
\n",
" \n",
" 4 | \n",
" TRO187105090 | \n",
" 18710509 | \n",
" 0 | \n",
" TRO01 | \n",
" 54 | \n",
" 1 | \n",
" None | \n",
" None | \n",
" None | \n",
" 3250 | \n",
" 145 | \n",
" HTBF | \n",
" Y | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" game_id date number_of_game park_id length_outs day \\\n",
"0 FW1187105040 18710504 0 FOR01 54 1 \n",
"1 WS3187105050 18710505 0 WAS01 54 1 \n",
"2 RC1187105060 18710506 0 RCK01 54 1 \n",
"3 CH1187105080 18710508 0 CHI01 54 1 \n",
"4 TRO187105090 18710509 0 TRO01 54 1 \n",
"\n",
" completion forefeit protest attendance legnth_minutes additional_info \\\n",
"0 None None None 200 120 None \n",
"1 None None None 5000 145 HTBF \n",
"2 None None None 1000 140 None \n",
"3 None None None 5000 150 None \n",
"4 None None None 3250 145 HTBF \n",
"\n",
" acquisition_info \n",
"0 Y \n",
"1 Y \n",
"2 Y \n",
"3 Y \n",
"4 Y "
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"c1 = \"\"\"\n",
"CREATE TABLE IF NOT EXISTS game (\n",
" game_id TEXT PRIMARY KEY,\n",
" date TEXT,\n",
" number_of_game INTEGER,\n",
" park_id TEXT,\n",
" length_outs INTEGER,\n",
" day BOOLEAN,\n",
" completion TEXT,\n",
" forefeit TEXT,\n",
" protest TEXT,\n",
" attendance INTEGER,\n",
" legnth_minutes INTEGER,\n",
" additional_info TEXT,\n",
" acquisition_info TEXT,\n",
" FOREIGN KEY (park_id) REFERENCES park(park_id)\n",
");\n",
"\"\"\"\n",
"\n",
"c2 = \"\"\"\n",
"INSERT OR IGNORE INTO game\n",
"SELECT\n",
" game_id,\n",
" date,\n",
" number_of_game,\n",
" park_id,\n",
" length_outs,\n",
" CASE\n",
" WHEN day_night = \"D\" THEN 1\n",
" WHEN day_night = \"N\" THEN 0\n",
" ELSE NULL\n",
" END\n",
" AS day,\n",
" completion,\n",
" forefeit,\n",
" protest,\n",
" attendance,\n",
" length_minutes,\n",
" additional_info,\n",
" acquisition_info\n",
"FROM game_log;\n",
"\"\"\"\n",
"\n",
"q = \"\"\"\n",
"SELECT * FROM game\n",
"LIMIT 5;\n",
"\"\"\"\n",
"\n",
"run_command(c1)\n",
"run_command(c2)\n",
"run_query(q)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`Team Appearance`\n",
"\n",
"The team_appearance table has a compound primary key composed of the team name and the game ID. In addition, a boolean column home is used to differentiate between the home and the away team. The rest of the columns are scores or statistics that in our original game log are repeated for each of the home and away teams.\n",
"\n",
"In order to insert this data cleanly, we'll need to use a UNION clause. This will combine data from home team and versus team."
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" team_id | \n",
" game_id | \n",
" home | \n",
" league_id | \n",
" score | \n",
" line_score | \n",
" at_bats | \n",
" hits | \n",
" doubles | \n",
" triples | \n",
" homeruns | \n",
" rbi | \n",
" sacrifice_hits | \n",
" sacrifice_flies | \n",
" hit_by_pitch | \n",
" walks | \n",
" intentional_walks | \n",
" strikeouts | \n",
" stolen_bases | \n",
" caught_stealing | \n",
" grounded_into_double | \n",
" first_catcher_interference | \n",
" left_on_base | \n",
" pitchers_used | \n",
" individual_earned_runs | \n",
" team_earned_runs | \n",
" wild_pitches | \n",
" balks | \n",
" putouts | \n",
" assists | \n",
" errors | \n",
" passed_balls | \n",
" double_plays | \n",
" triple_plays | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" SLU | \n",
" ALT188404300 | \n",
" 0 | \n",
" UA | \n",
" 15 | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
"
\n",
" \n",
" 1 | \n",
" ALT | \n",
" ALT188404300 | \n",
" 1 | \n",
" UA | \n",
" 2 | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
"
\n",
" \n",
" 2 | \n",
" BSU | \n",
" WSU188409250 | \n",
" 0 | \n",
" UA | \n",
" 2 | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
"
\n",
" \n",
" 3 | \n",
" WSU | \n",
" WSU188409250 | \n",
" 1 | \n",
" UA | \n",
" 10 | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" team_id game_id home league_id score line_score at_bats hits \\\n",
"0 SLU ALT188404300 0 UA 15 None None None \n",
"1 ALT ALT188404300 1 UA 2 None None None \n",
"2 BSU WSU188409250 0 UA 2 None None None \n",
"3 WSU WSU188409250 1 UA 10 None None None \n",
"\n",
" doubles triples homeruns rbi sacrifice_hits sacrifice_flies hit_by_pitch \\\n",
"0 None None None None None None None \n",
"1 None None None None None None None \n",
"2 None None None None None None None \n",
"3 None None None None None None None \n",
"\n",
" walks intentional_walks strikeouts stolen_bases caught_stealing \\\n",
"0 None None None None None \n",
"1 None None None None None \n",
"2 None None None None None \n",
"3 None None None None None \n",
"\n",
" grounded_into_double first_catcher_interference left_on_base pitchers_used \\\n",
"0 None None None None \n",
"1 None None None None \n",
"2 None None None None \n",
"3 None None None None \n",
"\n",
" individual_earned_runs team_earned_runs wild_pitches balks putouts assists \\\n",
"0 None None None None None None \n",
"1 None None None None None None \n",
"2 None None None None None None \n",
"3 None None None None None None \n",
"\n",
" errors passed_balls double_plays triple_plays \n",
"0 None None None None \n",
"1 None None None None \n",
"2 None None None None \n",
"3 None None None None "
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"c1 = \"\"\"\n",
"CREATE TABLE IF NOT EXISTS team_appearance (\n",
" team_id TEXT,\n",
" game_id TEXT,\n",
" home BOOLEAN,\n",
" league_id TEXT,\n",
" score INTEGER,\n",
" line_score TEXT,\n",
" at_bats INTEGER,\n",
" hits INTEGER,\n",
" doubles INTEGER,\n",
" triples INTEGER,\n",
" homeruns INTEGER,\n",
" rbi INTEGER,\n",
" sacrifice_hits INTEGER,\n",
" sacrifice_flies INTEGER,\n",
" hit_by_pitch INTEGER,\n",
" walks INTEGER,\n",
" intentional_walks INTEGER,\n",
" strikeouts INTEGER,\n",
" stolen_bases INTEGER,\n",
" caught_stealing INTEGER,\n",
" grounded_into_double INTEGER,\n",
" first_catcher_interference INTEGER,\n",
" left_on_base INTEGER,\n",
" pitchers_used INTEGER,\n",
" individual_earned_runs INTEGER,\n",
" team_earned_runs INTEGER,\n",
" wild_pitches INTEGER,\n",
" balks INTEGER,\n",
" putouts INTEGER,\n",
" assists INTEGER,\n",
" errors INTEGER,\n",
" passed_balls INTEGER,\n",
" double_plays INTEGER,\n",
" triple_plays INTEGER,\n",
" PRIMARY KEY (team_id, game_id),\n",
" FOREIGN KEY (team_id) REFERENCES team(team_id),\n",
" FOREIGN KEY (game_id) REFERENCES game(game_id),\n",
" FOREIGN KEY (league_id) REFERENCES league(league_id)\n",
");\n",
"\"\"\"\n",
"\n",
"run_command(c1)\n",
"\n",
"c2 = \"\"\"\n",
"INSERT OR IGNORE INTO team_appearance\n",
" SELECT\n",
" h_name,\n",
" game_id,\n",
" 1 AS home,\n",
" h_league,\n",
" h_score,\n",
" h_line_score,\n",
" h_at_bats,\n",
" h_hits,\n",
" h_doubles,\n",
" h_triples,\n",
" h_homeruns,\n",
" h_rbi,\n",
" h_sacrifice_hits,\n",
" h_sacrifice_flies,\n",
" h_hit_by_pitch,\n",
" h_walks,\n",
" h_intentional_walks,\n",
" h_strikeouts,\n",
" h_stolen_bases,\n",
" h_caught_stealing,\n",
" h_grounded_into_double,\n",
" h_first_catcher_interference,\n",
" h_left_on_base,\n",
" h_pitchers_used,\n",
" h_individual_earned_runs,\n",
" h_team_earned_runs,\n",
" h_wild_pitches,\n",
" h_balks,\n",
" h_putouts,\n",
" h_assists,\n",
" h_errors,\n",
" h_passed_balls,\n",
" h_double_plays,\n",
" h_triple_plays\n",
" FROM game_log\n",
"\n",
"UNION\n",
"\n",
" SELECT \n",
" v_name,\n",
" game_id,\n",
" 0 AS home,\n",
" v_league,\n",
" v_score,\n",
" v_line_score,\n",
" v_at_bats,\n",
" v_hits,\n",
" v_doubles,\n",
" v_triples,\n",
" v_homeruns,\n",
" v_rbi,\n",
" v_sacrifice_hits,\n",
" v_sacrifice_flies,\n",
" v_hit_by_pitch,\n",
" v_walks,\n",
" v_intentional_walks,\n",
" v_strikeouts,\n",
" v_stolen_bases,\n",
" v_caught_stealing,\n",
" v_grounded_into_double,\n",
" v_first_catcher_interference,\n",
" v_left_on_base,\n",
" v_pitchers_used,\n",
" v_individual_earned_runs,\n",
" v_team_earned_runs,\n",
" v_wild_pitches,\n",
" v_balks,\n",
" v_putouts,\n",
" v_assists,\n",
" v_errors,\n",
" v_passed_balls,\n",
" v_double_plays,\n",
" v_triple_plays\n",
" from game_log;\n",
"\"\"\"\n",
"\n",
"run_command(c2)\n",
"\n",
"q = \"\"\"\n",
"SELECT * FROM team_appearance\n",
"WHERE game_id = (SELECT MIN(game_id) FROM team_appearance)\n",
" OR game_id = (SELECT MAX(game_id) FROM team_appearance)\n",
"ORDER By game_id, home;\n",
"\"\"\"\n",
"\n",
"run_query(q)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`Person Appearance`\n",
"\n",
"The final table we need to create is person_appearance. It has foreign key relations to four tables:\n",
" - team\n",
" - person\n",
" - game\n",
" - appearance_type\n",
" \n",
"The person_appearance table will be used to store information on appearances in games by managers, players, and umpires as detailed in the appearance_type table.\n",
"\n",
"We'll need to use a similar technique to insert data as we used with the team_appearance table, however we will have to write much larger queries - one for each column instead of one for each team as before. We will need to work out for each column what the appearance_type_id will be by cross-referencing the columns with the appearance_type table.\n",
"\n",
"We have decided to create an integer primary key for this table, because having every column be a compound primary quickly becomes cumbersome when writing queries.\n",
"\n",
"When we get to the offensive and defensive positions for both teams, we essentially are performing 36 permutations: 2 (home, away) * 2 (offense + defense) * 9 (9 positions).\n",
"\n",
"To save us from manually copying this out, we can instead use a loop and python string formatting to generate the queries:\n",
"\n",
"```\n",
"template = \"\"\"\n",
"INSERT INTO person_appearance (\n",
" game_id,\n",
" team_id,\n",
" person_id,\n",
" appearance_type_id\n",
")\n",
" SELECT\n",
" game_id,\n",
" {hv}_name,\n",
" {hv}_player_{num}_id,\n",
" \"O{num}\"\n",
" FROM game_log\n",
" WHERE {hv}_player_{num}_id IS NOT NULL\n",
"UNION\n",
" SELECT\n",
" game_id,\n",
" {hv}_name,\n",
" {hv}_player_{num}_id,\n",
" \"D\" || CAST({hv}_player_{num}_def_pos AS INT)\n",
" FROM game_log\n",
" WHERE {hv}_player_{num}_id IS NOT NULL;\n",
"\"\"\"\n",
"\n",
"run_command(c1)\n",
"run_command(c2)\n",
"\n",
"for hv in [\"h\",\"v\"]:\n",
" for num in range(1,10):\n",
" query_vars = {\n",
" \"hv\": hv,\n",
" \"num\": num\n",
" }\n",
" # run commmand is a helper function which runs\n",
" # a query against our database.\n",
" run_command(template.format(**query_vars))\n",
"```\n",
"\n",
"However, we will still need to manually write down queries for inserting data related to umpires, managers, pitchers and awards."
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [],
"source": [
"c0 = \"DROP TABLE IF EXISTS person_appearance\"\n",
"\n",
"run_command(c0)\n",
"\n",
"c1 = \"\"\"\n",
"CREATE TABLE person_appearance (\n",
" appearance_id INTEGER PRIMARY KEY,\n",
" person_id TEXT,\n",
" team_id TEXT,\n",
" game_id TEXT,\n",
" appearance_type_id,\n",
" FOREIGN KEY (person_id) REFERENCES person(person_id),\n",
" FOREIGN KEY (team_id) REFERENCES team(team_id),\n",
" FOREIGN KEY (game_id) REFERENCES game(game_id),\n",
" FOREIGN KEY (appearance_type_id) REFERENCES appearance_type(appearance_type_id)\n",
");\n",
"\"\"\"\n",
"\n",
"c2 = \"\"\"\n",
"INSERT OR IGNORE INTO person_appearance (\n",
" game_id,\n",
" team_id,\n",
" person_id,\n",
" appearance_type_id\n",
") \n",
" SELECT\n",
" game_id,\n",
" NULL,\n",
" hp_umpire_id,\n",
" \"UHP\"\n",
" FROM game_log\n",
" WHERE hp_umpire_id IS NOT NULL \n",
"\n",
"UNION\n",
"\n",
" SELECT\n",
" game_id,\n",
" NULL,\n",
" [1b_umpire_id],\n",
" \"U1B\"\n",
" FROM game_log\n",
" WHERE \"1b_umpire_id\" IS NOT NULL\n",
"\n",
"UNION\n",
"\n",
" SELECT\n",
" game_id,\n",
" NULL,\n",
" [2b_umpire_id],\n",
" \"U2B\"\n",
" FROM game_log\n",
" WHERE [2b_umpire_id] IS NOT NULL\n",
"\n",
"UNION\n",
"\n",
" SELECT\n",
" game_id,\n",
" NULL,\n",
" [3b_umpire_id],\n",
" \"U3B\"\n",
" FROM game_log\n",
" WHERE [3b_umpire_id] IS NOT NULL\n",
"\n",
"UNION\n",
"\n",
" SELECT\n",
" game_id,\n",
" NULL,\n",
" lf_umpire_id,\n",
" \"ULF\"\n",
" FROM game_log\n",
" WHERE lf_umpire_id IS NOT NULL\n",
"\n",
"UNION\n",
"\n",
" SELECT\n",
" game_id,\n",
" NULL,\n",
" rf_umpire_id,\n",
" \"URF\"\n",
" FROM game_log\n",
" WHERE rf_umpire_id IS NOT NULL\n",
"\n",
"UNION\n",
"\n",
" SELECT\n",
" game_id,\n",
" v_name,\n",
" v_manager_id,\n",
" \"MM\"\n",
" FROM game_log\n",
" WHERE v_manager_id IS NOT NULL\n",
"\n",
"UNION\n",
"\n",
" SELECT\n",
" game_id,\n",
" h_name,\n",
" h_manager_id,\n",
" \"MM\"\n",
" FROM game_log\n",
" WHERE h_manager_id IS NOT NULL\n",
"\n",
"UNION\n",
"\n",
" SELECT\n",
" game_id,\n",
" CASE\n",
" WHEN h_score > v_score THEN h_name\n",
" ELSE v_name\n",
" END,\n",
" winning_pitcher_id,\n",
" \"AWP\"\n",
" FROM game_log\n",
" WHERE winning_pitcher_id IS NOT NULL\n",
"\n",
"UNION\n",
"\n",
" SELECT\n",
" game_id,\n",
" CASE\n",
" WHEN h_score < v_score THEN h_name\n",
" ELSE v_name\n",
" END,\n",
" losing_pitcher_id,\n",
" \"ALP\"\n",
" FROM game_log\n",
" WHERE losing_pitcher_id IS NOT NULL\n",
"\n",
"UNION\n",
"\n",
" SELECT\n",
" game_id,\n",
" CASE\n",
" WHEN h_score > v_score THEN h_name\n",
" ELSE v_name\n",
" END,\n",
" saving_pitcher_id,\n",
" \"ASP\"\n",
" FROM game_log\n",
" WHERE saving_pitcher_id IS NOT NULL\n",
"\n",
"UNION\n",
"\n",
" SELECT\n",
" game_id,\n",
" CASE\n",
" WHEN h_score > v_score THEN h_name\n",
" ELSE v_name\n",
" END,\n",
" winning_rbi_batter_id,\n",
" \"AWB\"\n",
" FROM game_log\n",
" WHERE winning_rbi_batter_id IS NOT NULL\n",
"\n",
"UNION\n",
"\n",
" SELECT\n",
" game_id,\n",
" v_name,\n",
" v_starting_pitcher_id,\n",
" \"PSP\"\n",
" FROM game_log\n",
" WHERE v_starting_pitcher_id IS NOT NULL\n",
"\n",
"UNION\n",
"\n",
" SELECT\n",
" game_id,\n",
" h_name,\n",
" h_starting_pitcher_id,\n",
" \"PSP\"\n",
" FROM game_log\n",
" WHERE h_starting_pitcher_id IS NOT NULL;\n",
"\"\"\"\n",
"\n",
"template = \"\"\"\n",
"INSERT INTO person_appearance (\n",
" game_id,\n",
" team_id,\n",
" person_id,\n",
" appearance_type_id\n",
") \n",
" SELECT\n",
" game_id,\n",
" {hv}_name,\n",
" {hv}_player_{num}_id,\n",
" \"O{num}\"\n",
" FROM game_log\n",
" WHERE {hv}_player_{num}_id IS NOT NULL\n",
"\n",
"UNION\n",
"\n",
" SELECT\n",
" game_id,\n",
" {hv}_name,\n",
" {hv}_player_{num}_id,\n",
" \"D\" || CAST({hv}_player_{num}_def_pos AS INT)\n",
" FROM game_log\n",
" WHERE {hv}_player_{num}_id IS NOT NULL;\n",
"\"\"\"\n",
"\n",
"run_command(c1)\n",
"run_command(c2)\n",
"\n",
"for hv in [\"h\",\"v\"]:\n",
" for num in range(1,10):\n",
" query_vars = {\n",
" \"hv\": hv,\n",
" \"num\": num\n",
" }\n",
" run_command(template.format(**query_vars))"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" games_game\n",
"0 171907\n",
" games_person_appearance\n",
"0 171907\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" appearance_id | \n",
" person_id | \n",
" team_id | \n",
" game_id | \n",
" appearance_type_id | \n",
" name | \n",
" category | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1646114 | \n",
" steab101 | \n",
" None | \n",
" WSU188409250 | \n",
" UHP | \n",
" Home Plate | \n",
" umpire | \n",
"
\n",
" \n",
" 1 | \n",
" 1646116 | \n",
" murnt101 | \n",
" BSU | \n",
" WSU188409250 | \n",
" MM | \n",
" Manager | \n",
" manager | \n",
"
\n",
" \n",
" 2 | \n",
" 1646115 | \n",
" crane101 | \n",
" BSU | \n",
" WSU188409250 | \n",
" PSP | \n",
" Starting Pitcher | \n",
" pitcher | \n",
"
\n",
" \n",
" 3 | \n",
" 1646118 | \n",
" scanm801 | \n",
" WSU | \n",
" WSU188409250 | \n",
" MM | \n",
" Manager | \n",
" manager | \n",
"
\n",
" \n",
" 4 | \n",
" 1646117 | \n",
" dailh101 | \n",
" WSU | \n",
" WSU188409250 | \n",
" PSP | \n",
" Starting Pitcher | \n",
" pitcher | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" appearance_id person_id team_id game_id appearance_type_id \\\n",
"0 1646114 steab101 None WSU188409250 UHP \n",
"1 1646116 murnt101 BSU WSU188409250 MM \n",
"2 1646115 crane101 BSU WSU188409250 PSP \n",
"3 1646118 scanm801 WSU WSU188409250 MM \n",
"4 1646117 dailh101 WSU WSU188409250 PSP \n",
"\n",
" name category \n",
"0 Home Plate umpire \n",
"1 Manager manager \n",
"2 Starting Pitcher pitcher \n",
"3 Manager manager \n",
"4 Starting Pitcher pitcher "
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"print(run_query(\"SELECT COUNT(DISTINCT game_id) games_game FROM game\"))\n",
"print(run_query(\"SELECT COUNT(DISTINCT game_id) games_person_appearance FROM person_appearance\"))\n",
"\n",
"q = \"\"\"\n",
"SELECT\n",
" pa.*,\n",
" at.name,\n",
" at.category\n",
"FROM person_appearance pa\n",
"INNER JOIN appearance_type at on at.appearance_type_id = pa.appearance_type_id\n",
"WHERE PA.game_id = (\n",
" SELECT max(game_id)\n",
" FROM person_appearance\n",
" )\n",
"ORDER BY team_id, appearance_type_id\n",
"\"\"\"\n",
"\n",
"run_query(q)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Remove the Original Tables\n",
"\n",
"Drop the tables we created to hold our unnormalized data:\n",
"- game_log\n",
"- park_codes\n",
"- team_codes\n",
"- person_codes"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" name | \n",
" type | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" person | \n",
" table | \n",
"
\n",
" \n",
" 1 | \n",
" park | \n",
" table | \n",
"
\n",
" \n",
" 2 | \n",
" league | \n",
" table | \n",
"
\n",
" \n",
" 3 | \n",
" team | \n",
" table | \n",
"
\n",
" \n",
" 4 | \n",
" game | \n",
" table | \n",
"
\n",
" \n",
" 5 | \n",
" team_appearance | \n",
" table | \n",
"
\n",
" \n",
" 6 | \n",
" game_log | \n",
" table | \n",
"
\n",
" \n",
" 7 | \n",
" park_codes | \n",
" table | \n",
"
\n",
" \n",
" 8 | \n",
" person_codes | \n",
" table | \n",
"
\n",
" \n",
" 9 | \n",
" team_codes | \n",
" table | \n",
"
\n",
" \n",
" 10 | \n",
" appearance_type | \n",
" table | \n",
"
\n",
" \n",
" 11 | \n",
" person_appearance | \n",
" table | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" name type\n",
"0 person table\n",
"1 park table\n",
"2 league table\n",
"3 team table\n",
"4 game table\n",
"5 team_appearance table\n",
"6 game_log table\n",
"7 park_codes table\n",
"8 person_codes table\n",
"9 team_codes table\n",
"10 appearance_type table\n",
"11 person_appearance table"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"show_tables()"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [],
"source": [
"tables = [\n",
" \"game_log\",\n",
" \"park_codes\",\n",
" \"team_codes\",\n",
" \"person_codes\"\n",
"]\n",
"\n",
"for t in tables:\n",
" c = '''\n",
" DROP TABLE {}\n",
" '''.format(t)\n",
" \n",
" run_command(c)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"***Our new database consists of the following mentioned tables***"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" name | \n",
" type | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" person | \n",
" table | \n",
"
\n",
" \n",
" 1 | \n",
" park | \n",
" table | \n",
"
\n",
" \n",
" 2 | \n",
" league | \n",
" table | \n",
"
\n",
" \n",
" 3 | \n",
" team | \n",
" table | \n",
"
\n",
" \n",
" 4 | \n",
" game | \n",
" table | \n",
"
\n",
" \n",
" 5 | \n",
" team_appearance | \n",
" table | \n",
"
\n",
" \n",
" 6 | \n",
" appearance_type | \n",
" table | \n",
"
\n",
" \n",
" 7 | \n",
" person_appearance | \n",
" table | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" name type\n",
"0 person table\n",
"1 park table\n",
"2 league table\n",
"3 team table\n",
"4 game table\n",
"5 team_appearance table\n",
"6 appearance_type table\n",
"7 person_appearance table"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"show_tables()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### *Conclusion: We have a new Databse `ml.db` with normalised tables*"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
}
},
"nbformat": 4,
"nbformat_minor": 2
}