{ "metadata": { "name": "", "signature": "sha256:d66bda4ad596e7d1038472946b691f727760aa36c6a5bcb7a9e63f8067624882" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\"Blaze\n", "\n", "# Getting Started with Blaze\n", "\n", "* Full tutorial available at http://github.com/ContinuumIO/blaze-tutorial\n", "* Install software with `conda install -c blaze blaze`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Using Blaze with Foreign Systems\n", "\n", "Blaze gives us an interactive experience over data living in foreign systems, like a SQL database." ] }, { "cell_type": "code", "collapsed": false, "input": [ "from blaze import Data, join, by, transform\n", "\n", "db = Data('sqlite:///data/lahman2013.sqlite')\n", "db.Teams" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 2, "text": [ " yearID lgID teamID franchID divID Rank G Ghome W L ... \\\n", "0 1871 NA BS1 BNA None 3 31 NaN 20 10 ... \n", "1 1871 NA CH1 CNA None 2 28 NaN 19 9 ... \n", "2 1871 NA CL1 CFC None 8 29 NaN 10 19 ... \n", "3 1871 NA FW1 KEK None 7 19 NaN 7 12 ... \n", "4 1871 NA NY2 NNA None 5 33 NaN 16 17 ... \n", "5 1871 NA PH1 PNA None 1 28 NaN 21 7 ... \n", "6 1871 NA RC1 ROK None 9 25 NaN 4 21 ... \n", "7 1871 NA TRO TRO None 6 29 NaN 13 15 ... \n", "8 1871 NA WS3 OLY None 4 32 NaN 15 15 ... \n", "9 1872 NA BL1 BLC None 2 58 NaN 35 19 ... \n", "10 1872 NA BR1 ECK None 9 29 NaN 3 26 ... \n", "\n", " DP FP name park \\\n", "0 NaN 0.83 Boston Red Stockings South End Grounds I \n", "1 NaN 0.82 Chicago White Stockings Union Base-Ball Grounds \n", "2 NaN 0.81 Cleveland Forest Citys National Association Grounds \n", "3 NaN 0.80 Fort Wayne Kekiongas Hamilton Field \n", "4 NaN 0.83 New York Mutuals Union Grounds (Brooklyn) \n", "5 NaN 0.84 Philadelphia Athletics Jefferson Street Grounds \n", "6 NaN 0.82 Rockford Forest Citys Agricultural Society Fair Grounds \n", "7 NaN 0.84 Troy Haymakers Haymakers' Grounds \n", "8 NaN 0.85 Washington Olympics Olympics Grounds \n", "9 NaN 0.82 Baltimore Canaries Newington Park \n", "10 NaN 0.79 Brooklyn Eckfords Union Grounds \n", "\n", " attendance BPF PPF teamIDBR teamIDlahman45 teamIDretro \n", "0 NaN 103 98 BOS BS1 BS1 \n", "1 NaN 104 102 CHI CH1 CH1 \n", "2 NaN 96 100 CLE CL1 CL1 \n", "3 NaN 101 107 KEK FW1 FW1 \n", "4 NaN 90 88 NYU NY2 NY2 \n", "5 NaN 102 98 ATH PH1 PH1 \n", "6 NaN 97 99 ROK RC1 RC1 \n", "7 NaN 101 100 TRO TRO TRO \n", "8 NaN 94 98 OLY WS3 WS3 \n", "9 NaN 106 102 BAL BL1 BL1 \n", "10 NaN 87 96 ECK BR1 BR1 \n", "\n", "..." ] } ], "prompt_number": 2 }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can use all of our standard queries" ] }, { "cell_type": "code", "collapsed": false, "input": [ "db.Teams[db.Teams.name == 'Chicago White Stockings'][['name', 'yearID', 'park']]" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 5, "text": [ " name yearID park\n", "0 Chicago White Stockings 1871 Union Base-Ball Grounds\n", "1 Chicago White Stockings 1874 23rd Street Grounds\n", "2 Chicago White Stockings 1875 23rd Street Grounds\n", "3 Chicago White Stockings 1876 23rd Street Grounds\n", "4 Chicago White Stockings 1877 23rd Street Grounds\n", "5 Chicago White Stockings 1878 Lake Front Park I\n", "6 Chicago White Stockings 1879 Lake Front Park I\n", "7 Chicago White Stockings 1880 Lake Front Park I\n", "8 Chicago White Stockings 1881 Lake Front Park I\n", "9 Chicago White Stockings 1882 Lake Front Park I/Lake Front Park II\n", "..." ] } ], "prompt_number": 5 }, { "cell_type": "code", "collapsed": false, "input": [ "by(db.Teams.name, start_year=db.Teams.yearID.min(), \n", " end_year=db.Teams.yearID.max()).sort('end_year', ascending=False)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 6, "text": [ " name end_year start_year\n", "0 Arizona Diamondbacks 2013 1998\n", "1 Atlanta Braves 2013 1966\n", "2 Baltimore Orioles 2013 1882\n", "3 Boston Red Sox 2013 1908\n", "4 Chicago Cubs 2013 1903\n", "5 Chicago White Sox 2013 1901\n", "6 Cincinnati Reds 2013 1876\n", "7 Cleveland Indians 2013 1915\n", "8 Colorado Rockies 2013 1993\n", "9 Detroit Tigers 2013 1901\n", "..." ] } ], "prompt_number": 6 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### These queries happen in the database\n", "\n", "Note that the Blaze software doesn't perform computations, the database does. Blaze is just a friendly intermediary between you and your database.\n", "\n", "Here we see what the generated SQL looks like. You generally don't need to do this at home." ] }, { "cell_type": "code", "collapsed": false, "input": [ "from blaze import compute # compute is internal API, not intended for users\n", "\n", "expr = db.Teams[db.Teams.name == 'Chicago White Stockings'][['name', 'yearID', 'park']]\n", "print compute(expr, post_compute=False)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "SELECT \"Teams\".name, \"Teams\".\"yearID\", \"Teams\".park \n", "FROM \"Teams\" \n", "WHERE \"Teams\".name = ?\n" ] } ], "prompt_number": 7 }, { "cell_type": "code", "collapsed": false, "input": [ "expr = by(db.Teams.name, start_year=db.Teams.yearID.min(), \n", " end_year=db.Teams.yearID.max()).sort('end_year', ascending=False)\n", "print compute(expr, post_compute=False)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "SELECT \"Teams\".name, max(\"Teams\".\"yearID\") AS end_year, min(\"Teams\".\"yearID\") AS start_year \n", "FROM \"Teams\" GROUP BY \"Teams\".name ORDER BY end_year DESC\n" ] } ], "prompt_number": 8 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercises\n", "\n", "First, look at the `db.dshape` to get an idea of what we have in the database. This will tell you what tables are in the database as well as what columns we have in each table." ] }, { "cell_type": "code", "collapsed": false, "input": [ "db.dshape" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Play for a while with these tables inspecting data and writing a few queries that might interest you.\n", "\n", "If you're having trouble finding interesting queries you might try the following:\n", "\n", "* How many unique pitchers were there in each year?\n", "* In what years did Hank Aaron play? When was his last year? (Hank has the playerID `aaronha01`)\n", "* What was the average salary per year. Has it increased or decreased over time?" ] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] } ], "metadata": {} } ] }