{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\"Continuum,\n", "\n", "Introduction to Blaze\n", "=====================\n", "\n", "In this tutorial we'll learn how to use Blaze to discover, migrate, and query data living in other databases. This tutorial generally follows this format:\n", "\n", "1. `odo` - Move data to database\n", "2. `blaze` - Query data in database\n", "\n", "\n", "Install\n", "-------\n", "\n", "This tutorial uses many different libraries that are all available with the [Anaconda Distribution](http://continuum.io/downloads). Once you have Anaconda installed, run these commands from a terminal:\n", "\n", "```\n", "$ conda install -y blaze\n", "$ conda install -y bokeh\n", "$ conda install -y odo\n", "```\n", "\n", "nbviewer: http://nbviewer.ipython.org/github/ContinuumIO/pydata-apps/blob/master/Section-1_blaze.ipynb\n", "\n", "github: https://github.com/ContinuumIO/pydata-apps\n", "\n", "
\n", "\n", "\n", "Goal: Accessible, Interactive, Analytic Queries\n", "-----------------------------------------------\n", "\n", "NumPy and Pandas provide accessible, interactive, analytic queries; this is valuable." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Unnamed: 0SepalLengthSepalWidthPetalLengthPetalWidthSpecies
015.13.51.40.2setosa
124.93.01.40.2setosa
234.73.21.30.2setosa
344.63.11.50.2setosa
455.03.61.40.2setosa
\n", "
" ], "text/plain": [ " Unnamed: 0 SepalLength SepalWidth PetalLength PetalWidth Species\n", "0 1 5.1 3.5 1.4 0.2 setosa\n", "1 2 4.9 3.0 1.4 0.2 setosa\n", "2 3 4.7 3.2 1.3 0.2 setosa\n", "3 4 4.6 3.1 1.5 0.2 setosa\n", "4 5 5.0 3.6 1.4 0.2 setosa" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "df = pd.read_csv('iris.csv')\n", "df.head()" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Species\n", "setosa 1.462\n", "versicolor 4.260\n", "virginica 5.552\n", "Name: PetalLength, dtype: float64" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.groupby(df.Species).PetalLength.mean() # Average petal length per species" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "But as data grows and systems become more complex, moving data and querying data become more difficult. Python already has excellent tools for data that fits in memory, but we want to hook up to data that is inconvenient.\n", "\n", "From now on, we're going to assume one of the following:\n", "\n", "1. You have an inconvenient amount of data\n", "2. That data should live someplace other than your computer\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Databases and Python\n", "--------------------\n", "\n", "When in-memory arrays/dataframes cease to be an option, we turn to databases. These live outside of the Python process and so might be less convenient. The open source Python ecosystem includes libraries to interact with these databases and with foreign data in general. \n", "\n", "Examples:\n", "\n", "* SQL - [`sqlalchemy`](http://sqlalchemy.org) \n", " * Hive/Cassandra - [`pyhive`](https://github.com/dropbox/PyHive)\n", " * Impala - [`impyla`](https://github.com/cloudera/impyla)\n", " * RedShift - [`redshift-sqlalchemy`](https://pypi.python.org/pypi/redshift-sqlalchemy)\n", " * ...\n", "* MongoDB - [`pymongo`](http://api.mongodb.org/python/current/)\n", "* HBase - [`happybase`](http://happybase.readthedocs.org/en/latest/)\n", "* Spark - [`pyspark`](http://spark.apache.org/docs/latest/api/python/)\n", "* SSH - [`paramiko`](http://www.paramiko.org/)\n", "* HDFS - [`pywebhdfs`](https://pypi.python.org/pypi/pywebhdfs)\n", "* Amazon S3 - [`boto`](https://boto.readthedocs.org/en/latest/)\n", "\n", "Today we're going to use some of these indirectly with `odo` (was `into`) and Blaze. We'll try to point out these libraries as we automate them so that, if you'd like, you can use them independently.\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\"Continuum,\n", "\n", "`odo` (formerly `into`)\n", "=======================\n", "\n", "Odo migrates data between formats and locations.\n", "\n", "Before we can use a database we need to move data into it. The `odo` project provides a single consistent interface to move data between formats and between locations.\n", "\n", "We'll start with local data and eventually move out to remote data.\n", "\n", "[*odo docs*](http://odo.readthedocs.org/en/latest/index.html)\n", "\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Examples\n", "\n", "\n", "Odo moves data into a target from a source:\n", "\n", "```python\n", ">>> odo(source, target)\n", "```\n", "\n", "The target and source can each be either a Python object or a string URI. The following are all valid calls to `odo`:\n", "\n", "```python\n", ">>> odo('iris.csv', pd.DataFrame) # Load CSV file into new DataFrame\n", ">>> odo(my_df, 'iris.json') # Write DataFrame into JSON file\n", ">>> odo('iris.csv', 'iris.json') # Migrate data from CSV to JSON\n", "```\n", "\n", "
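The `odo(source, target)` pattern can be illustrated in miniature with plain Python. The sketch below (`toy_odo` is a made-up name; this is only an illustration of the interface, nothing like odo's real dispatch machinery) converts CSV text into whichever container type you ask for:

```python
import csv
import io

def toy_odo(source_csv_text, target):
    """Toy illustration of the odo(source, target) interface:
    convert CSV text into the requested Python container type."""
    rows = list(csv.reader(io.StringIO(source_csv_text)))
    if target is list:
        return rows                      # list of row-lists, header included
    if target is dict:
        header, *data = rows
        return [dict(zip(header, row)) for row in data]
    raise NotImplementedError("no conversion to %r" % target)

text = "SepalLength,Species\n5.1,setosa\n4.9,setosa\n"
print(toy_odo(text, list)[1])   # ['5.1', 'setosa']
print(toy_odo(text, dict)[0])   # {'SepalLength': '5.1', 'Species': 'setosa'}
```

The real `odo` generalizes this idea: the target picks the conversion, and the caller never deals with the mechanics.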
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise\n", "\n", "Use `odo` to load the `iris.csv` file into a Python `list`, a `np.ndarray`, and a `pd.DataFrame`" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from odo import odo\n", "import numpy as np\n", "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", 
" \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Unnamed: 0SepalLengthSepalWidthPetalLengthPetalWidthSpecies
015.13.51.40.2setosa
124.93.01.40.2setosa
234.73.21.30.2setosa
344.63.11.50.2setosa
455.03.61.40.2setosa
565.43.91.70.4setosa
674.63.41.40.3setosa
785.03.41.50.2setosa
894.42.91.40.2setosa
9104.93.11.50.1setosa
10115.43.71.50.2setosa
11124.83.41.60.2setosa
12134.83.01.40.1setosa
13144.33.01.10.1setosa
14155.84.01.20.2setosa
15165.74.41.50.4setosa
16175.43.91.30.4setosa
17185.13.51.40.3setosa
18195.73.81.70.3setosa
19205.13.81.50.3setosa
20215.43.41.70.2setosa
21225.13.71.50.4setosa
22234.63.61.00.2setosa
23245.13.31.70.5setosa
24254.83.41.90.2setosa
25265.03.01.60.2setosa
26275.03.41.60.4setosa
27285.23.51.50.2setosa
28295.23.41.40.2setosa
29304.73.21.60.2setosa
.....................
1201216.93.25.72.3virginica
1211225.62.84.92.0virginica
1221237.72.86.72.0virginica
1231246.32.74.91.8virginica
1241256.73.35.72.1virginica
1251267.23.26.01.8virginica
1261276.22.84.81.8virginica
1271286.13.04.91.8virginica
1281296.42.85.62.1virginica
1291307.23.05.81.6virginica
1301317.42.86.11.9virginica
1311327.93.86.42.0virginica
1321336.42.85.62.2virginica
1331346.32.85.11.5virginica
1341356.12.65.61.4virginica
1351367.73.06.12.3virginica
1361376.33.45.62.4virginica
1371386.43.15.51.8virginica
1381396.03.04.81.8virginica
1391406.93.15.42.1virginica
1401416.73.15.62.4virginica
1411426.93.15.12.3virginica
1421435.82.75.11.9virginica
1431446.83.25.92.3virginica
1441456.73.35.72.5virginica
1451466.73.05.22.3virginica
1461476.32.55.01.9virginica
1471486.53.05.22.0virginica
1481496.23.45.42.3virginica
1491505.93.05.11.8virginica
\n", "

150 rows × 6 columns

\n", "
" ], "text/plain": [ " Unnamed: 0 SepalLength SepalWidth PetalLength PetalWidth Species\n", "0 1 5.1 3.5 1.4 0.2 setosa\n", "1 2 4.9 3.0 1.4 0.2 setosa\n", "2 3 4.7 3.2 1.3 0.2 setosa\n", "3 4 4.6 3.1 1.5 0.2 setosa\n", "4 5 5.0 3.6 1.4 0.2 setosa\n", "5 6 5.4 3.9 1.7 0.4 setosa\n", "6 7 4.6 3.4 1.4 0.3 setosa\n", "7 8 5.0 3.4 1.5 0.2 setosa\n", "8 9 4.4 2.9 1.4 0.2 setosa\n", "9 10 4.9 3.1 1.5 0.1 setosa\n", "10 11 5.4 3.7 1.5 0.2 setosa\n", "11 12 4.8 3.4 1.6 0.2 setosa\n", "12 13 4.8 3.0 1.4 0.1 setosa\n", "13 14 4.3 3.0 1.1 0.1 setosa\n", "14 15 5.8 4.0 1.2 0.2 setosa\n", "15 16 5.7 4.4 1.5 0.4 setosa\n", "16 17 5.4 3.9 1.3 0.4 setosa\n", "17 18 5.1 3.5 1.4 0.3 setosa\n", "18 19 5.7 3.8 1.7 0.3 setosa\n", "19 20 5.1 3.8 1.5 0.3 setosa\n", "20 21 5.4 3.4 1.7 0.2 setosa\n", "21 22 5.1 3.7 1.5 0.4 setosa\n", "22 23 4.6 3.6 1.0 0.2 setosa\n", "23 24 5.1 3.3 1.7 0.5 setosa\n", "24 25 4.8 3.4 1.9 0.2 setosa\n", "25 26 5.0 3.0 1.6 0.2 setosa\n", "26 27 5.0 3.4 1.6 0.4 setosa\n", "27 28 5.2 3.5 1.5 0.2 setosa\n", "28 29 5.2 3.4 1.4 0.2 setosa\n", "29 30 4.7 3.2 1.6 0.2 setosa\n", ".. ... ... ... ... ... 
...\n", "120 121 6.9 3.2 5.7 2.3 virginica\n", "121 122 5.6 2.8 4.9 2.0 virginica\n", "122 123 7.7 2.8 6.7 2.0 virginica\n", "123 124 6.3 2.7 4.9 1.8 virginica\n", "124 125 6.7 3.3 5.7 2.1 virginica\n", "125 126 7.2 3.2 6.0 1.8 virginica\n", "126 127 6.2 2.8 4.8 1.8 virginica\n", "127 128 6.1 3.0 4.9 1.8 virginica\n", "128 129 6.4 2.8 5.6 2.1 virginica\n", "129 130 7.2 3.0 5.8 1.6 virginica\n", "130 131 7.4 2.8 6.1 1.9 virginica\n", "131 132 7.9 3.8 6.4 2.0 virginica\n", "132 133 6.4 2.8 5.6 2.2 virginica\n", "133 134 6.3 2.8 5.1 1.5 virginica\n", "134 135 6.1 2.6 5.6 1.4 virginica\n", "135 136 7.7 3.0 6.1 2.3 virginica\n", "136 137 6.3 3.4 5.6 2.4 virginica\n", "137 138 6.4 3.1 5.5 1.8 virginica\n", "138 139 6.0 3.0 4.8 1.8 virginica\n", "139 140 6.9 3.1 5.4 2.1 virginica\n", "140 141 6.7 3.1 5.6 2.4 virginica\n", "141 142 6.9 3.1 5.1 2.3 virginica\n", "142 143 5.8 2.7 5.1 1.9 virginica\n", "143 144 6.8 3.2 5.9 2.3 virginica\n", "144 145 6.7 3.3 5.7 2.5 virginica\n", "145 146 6.7 3.0 5.2 2.3 virginica\n", "146 147 6.3 2.5 5.0 1.9 virginica\n", "147 148 6.5 3.0 5.2 2.0 virginica\n", "148 149 6.2 3.4 5.4 2.3 virginica\n", "149 150 5.9 3.0 5.1 1.8 virginica\n", "\n", "[150 rows x 6 columns]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "odo(\"iris.csv\", pd.DataFrame)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "\n", "URI Strings\n", "-----------\n", "\n", "Odo refers to foreign data either with a Python object, like a `sqlalchemy.Table` for a SQL table, or with a string URI, like `postgresql://hostname::tablename`.\n", "\n", "URIs often take the following form\n", "\n", "    protocol://path-to-resource::path-within-resource\n", "    \n", "Here `path-to-resource` might point to a file or a database hostname, while `path-within-resource` might refer to a datapath or table name. Note the two main separators:\n", "\n", "* `://` separates the protocol on the left (`sqlite`, `mongodb`, `ssh`, `hdfs`, `hive`, ...)\n", "* `::` separates the path within the database on the right (e.g. a table name)\n", "\n", "[*odo docs on uri strings*](http://odo.readthedocs.org/en/latest/uri.html)\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Examples\n", "\n", "Here are some example URIs:\n", "\n", "```\n", "myfile.json\n", "myfiles.*.csv\n", "postgresql://hostname::tablename\n", "mongodb://hostname/db::collection\n", "ssh://user@host:/path/to/myfile.csv\n", "hdfs://user@host:/path/to/*.csv\n", "```\n", "\n", "
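The two separators make these URIs easy to pull apart mechanically. As a sketch of the anatomy described above (this is an illustration, not odo's actual parser):

```python
def split_uri(uri):
    """Split 'protocol://path-to-resource::path-within-resource'
    into its three pieces.  Illustrative only -- not odo's parser."""
    protocol, found, rest = uri.partition('://')
    if not found:                    # no '://' present: a bare local path
        protocol, rest = None, uri
    path, sep, subpath = rest.partition('::')
    return protocol, path, (subpath if sep else None)

print(split_uri('postgresql://hostname::tablename'))
# ('postgresql', 'hostname', 'tablename')
print(split_uri('myfile.json'))
# (None, 'myfile.json', None)
```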
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise\n", "\n", "Migrate your CSV file into a table named `iris` in a new SQLite database at `sqlite:///my.db`. Remember to use the `::` separator between your database name and your table name.\n", "\n", "[*odo docs on SQL*](http://odo.readthedocs.org/en/latest/sql.html)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Table('iris', MetaData(bind=Engine(sqlite:///my.db)), Column('Unnamed: 0', BIGINT(), table=<iris>, nullable=False), Column('SepalLength', FLOAT(), table=<iris>), Column('SepalWidth', FLOAT(), table=<iris>), Column('PetalLength', FLOAT(), table=<iris>), Column('PetalWidth', FLOAT(), table=<iris>), Column('Species', TEXT(), table=<iris>), schema=None)" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "odo(\"iris.csv\", \"sqlite:///my.db::iris\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What kind of object did you receive as output? Call `type` on your result." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "sqlalchemy.sql.schema.Table" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(_)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "How it works\n", "------------\n", "\n", "Odo maintains a network of fast pairwise conversions between formats. When we migrate between two formats we traverse a path of pairwise conversions through that network.\n", "\n", "We visualize that network below:\n", "\n", "![](images/conversions.png)\n", "\n", "Each node represents a data format. Each directed edge represents a function to transform data between two formats. A single call to `odo` may traverse multiple edges and multiple intermediate formats. Red nodes support larger-than-memory data.\n", "\n", "A single call to `odo` may call on several conversion functions along the way. For example, when we migrate a CSV file to a Mongo database we might take the following route:\n", "\n", "* Load into a `DataFrame` (`pandas.read_csv`)\n", "* Convert to `np.recarray` (`DataFrame.to_records`)\n", "* Then to a Python `Iterator` (`np.ndarray.tolist`)\n", "* Finally to Mongo (`pymongo.Collection.insert`)\n", "\n", "Alternatively we could write a special function that uses MongoDB's native CSV\n", "loader and shortcut this entire process with a direct edge `CSV -> Mongo`.\n", "\n", "These functions are chosen because they are fast, often far faster than converting through a central serialization format.\n", "\n", "This picture is actually from an older version of `odo`, when the graph was still small enough to visualize pleasantly. See the [*odo docs*](http://odo.readthedocs.org/en/latest/overview.html) for a more recent version.\n", "\n", "
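The route-finding step above can be sketched as a small graph search. Here the network is hard-coded to the CSV-to-Mongo example (odo's real network is far larger and weights its edges by speed; this is only a sketch with labels standing in for conversion functions):

```python
from collections import deque

# Toy stand-in for odo's conversion network: each edge is a
# pairwise conversion (labels only; real edges are functions).
edges = {
    'CSV': ['DataFrame'],
    'DataFrame': ['recarray', 'CSV'],
    'recarray': ['Iterator'],
    'Iterator': ['Mongo'],
}

def conversion_path(source, target):
    """Breadth-first search for a chain of pairwise conversions."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for step in edges.get(path[-1], []):
            if step not in seen:
                seen.add(step)
                queue.append(path + [step])
    return None

print(conversion_path('CSV', 'Mongo'))
# ['CSV', 'DataFrame', 'recarray', 'Iterator', 'Mongo']
```

Adding a direct `'CSV': ['Mongo']` edge to the dictionary would shortcut the search to a two-node path, which is exactly what registering a special-cased conversion does in odo.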
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Remote Data\n", "-----------\n", "\n", "We can interact with remote data in three locations:\n", "\n", "1. On Amazon's S3 (this will be quick)\n", "2. On a remote machine via `ssh`\n", "3. On the Hadoop File System (HDFS)\n", "\n", "We'll return to most of these after we've seen Blaze; for now we'll briefly use S3.\n", "\n", "### S3\n", "\n", "Here we quickly grab a file from Amazon's S3.\n", "\n", "This example depends on [`boto`](https://boto.readthedocs.org/en/latest/) to interact with S3.\n", "\n", "    conda install boto\n", "\n", "[*odo docs on aws*](http://odo.readthedocs.org/en/latest/aws.html)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", 
" \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
total_billtipsexsmokerdaytimesize
016.991.01FemaleNoSunDinner2
110.341.66MaleNoSunDinner3
221.013.50MaleNoSunDinner3
323.683.31MaleNoSunDinner2
424.593.61FemaleNoSunDinner4
525.294.71MaleNoSunDinner4
68.772.00MaleNoSunDinner2
726.883.12MaleNoSunDinner4
815.041.96MaleNoSunDinner2
914.783.23MaleNoSunDinner2
1010.271.71MaleNoSunDinner2
1135.265.00FemaleNoSunDinner4
1215.421.57MaleNoSunDinner2
1318.433.00MaleNoSunDinner4
1414.833.02FemaleNoSunDinner2
1521.583.92MaleNoSunDinner2
1610.331.67FemaleNoSunDinner3
1716.293.71MaleNoSunDinner3
1816.973.50FemaleNoSunDinner3
1920.653.35MaleNoSatDinner3
2017.924.08MaleNoSatDinner2
2120.292.75FemaleNoSatDinner2
2215.772.23FemaleNoSatDinner2
2339.427.58MaleNoSatDinner4
2419.823.18MaleNoSatDinner2
2517.812.34MaleNoSatDinner4
2613.372.00MaleNoSatDinner2
2712.692.00MaleNoSatDinner2
2821.704.30MaleNoSatDinner2
2919.653.00FemaleNoSatDinner2
........................
21428.176.50FemaleYesSatDinner3
21512.901.10FemaleYesSatDinner2
21628.153.00MaleYesSatDinner5
21711.591.50MaleYesSatDinner2
2187.741.44MaleYesSatDinner2
21930.143.09FemaleYesSatDinner4
22012.162.20MaleYesFriLunch2
22113.423.48FemaleYesFriLunch2
2228.581.92MaleYesFriLunch1
22315.983.00FemaleNoFriLunch3
22413.421.58MaleYesFriLunch2
22516.272.50FemaleYesFriLunch2
22610.092.00FemaleYesFriLunch2
22720.453.00MaleNoSatDinner4
22813.282.72MaleNoSatDinner2
22922.122.88FemaleYesSatDinner2
23024.012.00MaleYesSatDinner4
23115.693.00MaleYesSatDinner3
23211.613.39MaleNoSatDinner2
23310.771.47MaleNoSatDinner2
23415.533.00MaleYesSatDinner2
23510.071.25MaleNoSatDinner2
23612.601.00MaleYesSatDinner2
23732.831.17MaleYesSatDinner2
23835.834.67FemaleNoSatDinner3
23929.035.92MaleNoSatDinner3
24027.182.00FemaleYesSatDinner2
24122.672.00MaleYesSatDinner2
24217.821.75MaleNoSatDinner2
24318.783.00FemaleNoThurDinner2
\n", "

244 rows × 7 columns

\n", "
" ], "text/plain": [ " total_bill tip sex smoker day time size\n", "0 16.99 1.01 Female No Sun Dinner 2\n", "1 10.34 1.66 Male No Sun Dinner 3\n", "2 21.01 3.50 Male No Sun Dinner 3\n", "3 23.68 3.31 Male No Sun Dinner 2\n", "4 24.59 3.61 Female No Sun Dinner 4\n", "5 25.29 4.71 Male No Sun Dinner 4\n", "6 8.77 2.00 Male No Sun Dinner 2\n", "7 26.88 3.12 Male No Sun Dinner 4\n", "8 15.04 1.96 Male No Sun Dinner 2\n", "9 14.78 3.23 Male No Sun Dinner 2\n", "10 10.27 1.71 Male No Sun Dinner 2\n", "11 35.26 5.00 Female No Sun Dinner 4\n", "12 15.42 1.57 Male No Sun Dinner 2\n", "13 18.43 3.00 Male No Sun Dinner 4\n", "14 14.83 3.02 Female No Sun Dinner 2\n", "15 21.58 3.92 Male No Sun Dinner 2\n", "16 10.33 1.67 Female No Sun Dinner 3\n", "17 16.29 3.71 Male No Sun Dinner 3\n", "18 16.97 3.50 Female No Sun Dinner 3\n", "19 20.65 3.35 Male No Sat Dinner 3\n", "20 17.92 4.08 Male No Sat Dinner 2\n", "21 20.29 2.75 Female No Sat Dinner 2\n", "22 15.77 2.23 Female No Sat Dinner 2\n", "23 39.42 7.58 Male No Sat Dinner 4\n", "24 19.82 3.18 Male No Sat Dinner 2\n", "25 17.81 2.34 Male No Sat Dinner 4\n", "26 13.37 2.00 Male No Sat Dinner 2\n", "27 12.69 2.00 Male No Sat Dinner 2\n", "28 21.70 4.30 Male No Sat Dinner 2\n", "29 19.65 3.00 Female No Sat Dinner 2\n", ".. ... ... ... ... ... ... 
...\n", "214 28.17 6.50 Female Yes Sat Dinner 3\n", "215 12.90 1.10 Female Yes Sat Dinner 2\n", "216 28.15 3.00 Male Yes Sat Dinner 5\n", "217 11.59 1.50 Male Yes Sat Dinner 2\n", "218 7.74 1.44 Male Yes Sat Dinner 2\n", "219 30.14 3.09 Female Yes Sat Dinner 4\n", "220 12.16 2.20 Male Yes Fri Lunch 2\n", "221 13.42 3.48 Female Yes Fri Lunch 2\n", "222 8.58 1.92 Male Yes Fri Lunch 1\n", "223 15.98 3.00 Female No Fri Lunch 3\n", "224 13.42 1.58 Male Yes Fri Lunch 2\n", "225 16.27 2.50 Female Yes Fri Lunch 2\n", "226 10.09 2.00 Female Yes Fri Lunch 2\n", "227 20.45 3.00 Male No Sat Dinner 4\n", "228 13.28 2.72 Male No Sat Dinner 2\n", "229 22.12 2.88 Female Yes Sat Dinner 2\n", "230 24.01 2.00 Male Yes Sat Dinner 4\n", "231 15.69 3.00 Male Yes Sat Dinner 3\n", "232 11.61 3.39 Male No Sat Dinner 2\n", "233 10.77 1.47 Male No Sat Dinner 2\n", "234 15.53 3.00 Male Yes Sat Dinner 2\n", "235 10.07 1.25 Male No Sat Dinner 2\n", "236 12.60 1.00 Male Yes Sat Dinner 2\n", "237 32.83 1.17 Male Yes Sat Dinner 2\n", "238 35.83 4.67 Female No Sat Dinner 3\n", "239 29.03 5.92 Male No Sat Dinner 3\n", "240 27.18 2.00 Female Yes Sat Dinner 2\n", "241 22.67 2.00 Male Yes Sat Dinner 2\n", "242 17.82 1.75 Male No Sat Dinner 2\n", "243 18.78 3.00 Female No Thur Dinner 2\n", "\n", "[244 rows x 7 columns]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "odo('s3://nyqpug/tips.csv', pd.DataFrame)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "\"Continuum,\n", "\n", "Blaze\n", "=====\n", "\n", "Blaze translates a subset of numpy/pandas syntax into database queries. It hides away the database.\n", "\n", "On simple datasets, like CSV files, Blaze acts like Pandas with slightly different syntax. In this case Blaze is just using Pandas." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "### Pandas example" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Unnamed: 0SepalLengthSepalWidthPetalLengthPetalWidthSpecies
015.13.51.40.2setosa
124.93.01.40.2setosa
234.73.21.30.2setosa
344.63.11.50.2setosa
455.03.61.40.2setosa
\n", "
" ], "text/plain": [ " Unnamed: 0 SepalLength SepalWidth PetalLength PetalWidth Species\n", "0 1 5.1 3.5 1.4 0.2 setosa\n", "1 2 4.9 3.0 1.4 0.2 setosa\n", "2 3 4.7 3.2 1.3 0.2 setosa\n", "3 4 4.6 3.1 1.5 0.2 setosa\n", "4 5 5.0 3.6 1.4 0.2 setosa" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "\n", "df = pd.read_csv('iris.csv')\n", "df.head(5)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "array(['setosa', 'versicolor', 'virginica'], dtype=object)" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.Species.unique()" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 setosa\n", "50 versicolor\n", "100 virginica\n", "Name: Species, dtype: object" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.Species.drop_duplicates()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "### Blaze example" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Unnamed: 0SepalLengthSepalWidthPetalLengthPetalWidthSpecies
015.13.51.40.2setosa
124.93.01.40.2setosa
234.73.21.30.2setosa
344.63.11.50.2setosa
455.03.61.40.2setosa
" ], "text/plain": [ " Unnamed: 0 SepalLength SepalWidth PetalLength PetalWidth Species\n", "0 1 5.1 3.5 1.4 0.2 setosa\n", "1 2 4.9 3.0 1.4 0.2 setosa\n", "2 3 4.7 3.2 1.3 0.2 setosa\n", "3 4 4.6 3.1 1.5 0.2 setosa\n", "4 5 5.0 3.6 1.4 0.2 setosa" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import blaze as bz\n", "\n", "d = bz.Data('iris.csv')\n", "d.head(5)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Species
0setosa
1versicolor
2virginica
" ], "text/plain": [ " Species\n", "0 setosa\n", "1 versicolor\n", "2 virginica" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d.Species.distinct()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "Foreign Data\n", "------------\n", "\n", "Blaze does different things under-the-hood on different kinds of data\n", "\n", "* CSV files: Pandas DataFrames (or iterators of DataFrames)\n", "* SQL tables: [SQLAlchemy](http://sqlalchemy.org).\n", "* Mongo collections: [PyMongo](http://api.mongodb.org/python/current/)\n", "* ...\n", "\n", "SQL\n", "---\n", "\n", "We'll play with SQL a lot during this tutorial. Blaze translates your query to SQLAlchemy. SQLAlchemy then translates to the SQL dialect of your database, your database then executes that query intelligently.\n", "\n", "* Blaze $\\rightarrow$ SQLAlchemy $\\rightarrow$ SQL $\\rightarrow$ Database computation\n", "\n", "This translation process lets analysts interact with a familiar interface while leveraging a potentially powerful database.\n", "\n", "To keep things local we'll use SQLite, but this works with any database with a SQLAlchemy dialect. Examples in this section use the iris dataset. Exercises use the Lahman Baseball statistics database, year 2013.\n", "\n", "If you have not downloaded this dataset you could do so here - https://github.com/jknecht/baseball-archive-sqlite/raw/master/lahman2013.sqlite. \n", "\n", "
" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[1m\u001b[34m06-simple-apps\u001b[m\u001b[m Section3-2_bokeh.ipynb\r\n", "\u001b[1m\u001b[34m07-final-app\u001b[m\u001b[m \u001b[1m\u001b[34mimages\u001b[m\u001b[m\r\n", "LICENSE iris.csv\r\n", "Section-1_blaze-solutions.ipynb lahman2013.sqlite\r\n", "Section3-1_blaze.ipynb my.db\r\n" ] } ], "source": [ "!ls " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Examples\n", "\n", "Lets dive into Blaze Syntax. For simple queries it looks and feels similar to Pandas" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [], "source": [ "db = bz.Data('sqlite:///my.db')\n", "#db.iris\n", "#db.iris.head()" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Species
0Species
1setosa
2versicolor
3virginica
" ], "text/plain": [ " Species\n", "0 Species\n", "1 setosa\n", "2 versicolor\n", "3 virginica" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "db.iris.Species.distinct()" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
SpeciesSepalLength
0versicolor7.0
1versicolor6.4
2versicolor6.9
3versicolor5.5
4versicolor6.5
5versicolor5.7
6versicolor6.3
7versicolor4.9
8versicolor6.6
9versicolor5.2
10versicolor5.0
" ], "text/plain": [ " Species SepalLength\n", "0 versicolor 7.0\n", "1 versicolor 6.4\n", "2 versicolor 6.9\n", "3 versicolor 5.5\n", "4 versicolor 6.5\n", "5 versicolor 5.7\n", "6 versicolor 6.3\n", "7 versicolor 4.9\n", "8 versicolor 6.6\n", "9 versicolor 5.2\n", "..." ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "db.iris[db.iris.Species == 'versicolor'][['Species', 'SepalLength']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "### Work happens on the database\n", "\n", "If we were using pandas we would read the table into pandas, then use pandas' fast in-memory algorithms for computation. Here we translate your query into SQL and then send that query to the database to do the work.\n", "\n", "* Pandas $\\leftarrow_\\textrm{data}$ SQL, then Pandas computes\n", "* Blaze $\\rightarrow_\\textrm{query}$ SQL, then database computes\n", "\n", "If we want to dive into the internal API we can inspect the query that Blaze transmits.\n", "\n", "
" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "SELECT iris.\"Species\", iris.\"SepalLength\" \n", "FROM iris \n", "WHERE iris.\"Species\" = ?\n" ] } ], "source": [ "# Inspect SQL query\n", "query = db.iris[db.iris.Species == 'versicolor'][['Species', 'SepalLength']]\n", "print bz.compute(query)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "SELECT iris.\"Species\", max(iris.\"PetalLength\") AS longest, min(iris.\"PetalLength\") AS shortest \n", "FROM iris GROUP BY iris.\"Species\"\n" ] } ], "source": [ "query = bz.by(db.iris.Species, longest=db.iris.PetalLength.max(),\n", " shortest=db.iris.PetalLength.min())\n", "print bz.compute(query)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[(u'Species', u'PetalLength', u'PetalLength'),\n", " (u'setosa', 1.9, 1.0),\n", " (u'versicolor', 5.1, 3.0),\n", " (u'virginica', 6.9, 4.5)]" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "odo(query, list)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "### Exercises\n", "\n", "Now we load the Lahman baseball database and perform similar queries" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "dshape(\"\"\"{\n", " AllstarFull: var * {\n", " playerID: ?string,\n", " yearID: ?int32,\n", " gameNum: ?int32,\n", " gameID: ?string,\n", " teamID: ?string,\n", " lgID: ?string,\n", " GP: ?int32,\n", " startingPos: ?int32\n", " },\n", " Appearances: var * {\n", " yearID: ?int32,\n", " teamID: ?string,\n", " lgID: ?string,\n", " playerID: ?string,\n", " G_all: ?int32,\n", " GS: ?int32,\n", " G_batting: ?int32,\n", " G_defense: ?int32,\n", " G_p: ?int32,\n", " G_c: ?int32,\n", " G_1b: ?int32,\n", " G_2b: ?int32,\n", " G_3b: ?int32,\n", " G_ss: ?int32,\n", " G_lf: ?int32,\n", " G_cf: ?int32,\n", " G_rf: ?int32,\n", " G_of: ?int32,\n", " G_dh: ?int32,\n", " G_ph: ?int32,\n", " G_pr: ?int32\n", " },\n", " AwardsManagers: var * {\n", " playerID: ?string,\n", " awardID: ?string,\n", " yearID: ?int32,\n", " lgID: ?string,\n", " tie: ?string,\n", " notes: ?string\n", " },\n", " AwardsPlayers: var * {\n", " playerID: ?string,\n", " awardID: ?string,\n", " yearID: ?int32,\n", " lgID: ?string,\n", " tie: ?string,\n", " notes: ?string\n", " },\n", " AwardsShareManagers: var * {\n", " awardID: ?string,\n", " yearID: ?int32,\n", " lgID: ?string,\n", " playerID: ?string,\n", " pointsWon: ?int32,\n", " pointsMax: ?int32,\n", " votesFirst: ?int32\n", " },\n", " AwardsSharePlayers: var * {\n", " awardID: ?string,\n", " yearID: ?int32,\n", " lgID: ?string,\n", " playerID: ?string,\n", " pointsWon: ?float64,\n", " pointsMax: ?int32,\n", " votesFirst: ?float64\n", " },\n", " Batting: var * {\n", " playerID: ?string,\n", " yearID: ?int32,\n", " stint: ?int32,\n", " teamID: ?string,\n", " lgID: ?string,\n", " G: ?int32,\n", " G_batting: ?int32,\n", " AB: ?int32,\n", " R: ?int32,\n", " H: ?int32,\n", " 2B: ?int32,\n", " 3B: ?int32,\n", " HR: ?int32,\n", " 
RBI: ?int32,\n", " SB: ?int32,\n", " CS: ?int32,\n", " BB: ?int32,\n", " SO: ?int32,\n", " IBB: ?int32,\n", " HBP: ?int32,\n", " SH: ?int32,\n", " SF: ?int32,\n", " GIDP: ?int32,\n", " G_old: ?int32\n", " },\n", " BattingPost: var * {\n", " yearID: ?int32,\n", " round: ?string,\n", " playerID: ?string,\n", " teamID: ?string,\n", " lgID: ?string,\n", " G: ?int32,\n", " AB: ?int32,\n", " R: ?int32,\n", " H: ?int32,\n", " 2B: ?int32,\n", " 3B: ?int32,\n", " HR: ?int32,\n", " RBI: ?int32,\n", " SB: ?int32,\n", " CS: ?int32,\n", " BB: ?int32,\n", " SO: ?int32,\n", " IBB: ?int32,\n", " HBP: ?int32,\n", " SH: ?int32,\n", " SF: ?int32,\n", " GIDP: ?int32\n", " },\n", " Fielding: var * {\n", " playerID: ?string,\n", " yearID: ?int32,\n", " stint: ?int32,\n", " teamID: ?string,\n", " lgID: ?string,\n", " POS: ?string,\n", " G: ?int32,\n", " GS: ?int32,\n", " InnOuts: ?int32,\n", " PO: ?int32,\n", " A: ?int32,\n", " E: ?int32,\n", " DP: ?int32,\n", " PB: ?int32,\n", " WP: ?int32,\n", " SB: ?int32,\n", " CS: ?int32,\n", " ZR: ?float64\n", " },\n", " FieldingOF: var * {\n", " playerID: ?string,\n", " yearID: ?int32,\n", " stint: ?int32,\n", " Glf: ?int32,\n", " Gcf: ?int32,\n", " Grf: ?int32\n", " },\n", " FieldingPost: var * {\n", " playerID: ?string,\n", " yearID: ?int32,\n", " teamID: ?string,\n", " lgID: ?string,\n", " round: ?string,\n", " POS: ?string,\n", " G: ?int32,\n", " GS: ?int32,\n", " InnOuts: ?int32,\n", " PO: ?int32,\n", " A: ?int32,\n", " E: ?int32,\n", " DP: ?int32,\n", " TP: ?int32,\n", " PB: ?int32,\n", " SB: ?int32,\n", " CS: ?int32\n", " },\n", " HallOfFame: var * {\n", " playerID: ?string,\n", " yearid: ?int32,\n", " votedBy: ?string,\n", " ballots: ?int32,\n", " needed: ?int32,\n", " votes: ?int32,\n", " inducted: ?string,\n", " category: ?string,\n", " needed_note: ?string\n", " },\n", " Managers: var * {\n", " playerID: ?string,\n", " yearID: ?int32,\n", " teamID: ?string,\n", " lgID: ?string,\n", " inseason: ?int32,\n", " G: ?int32,\n", " W: 
?int32,\n", " L: ?int32,\n", " rank: ?int32,\n", " plyrMgr: ?string\n", " },\n", " ManagersHalf: var * {\n", " playerID: ?string,\n", " yearID: ?int32,\n", " teamID: ?string,\n", " lgID: ?string,\n", " inseason: ?int32,\n", " half: ?int32,\n", " G: ?int32,\n", " W: ?int32,\n", " L: ?int32,\n", " rank: ?int32\n", " },\n", " Master: var * {\n", " playerID: ?string,\n", " birthYear: ?int32,\n", " birthMonth: ?int32,\n", " birthDay: ?int32,\n", " birthCountry: ?string,\n", " birthState: ?string,\n", " birthCity: ?string,\n", " deathYear: ?int32,\n", " deathMonth: ?int32,\n", " deathDay: ?int32,\n", " deathCountry: ?string,\n", " deathState: ?string,\n", " deathCity: ?string,\n", " nameFirst: ?string,\n", " nameLast: ?string,\n", " nameGiven: ?string,\n", " weight: ?int32,\n", " height: ?float64,\n", " bats: ?string,\n", " throws: ?string,\n", " debut: ?float64,\n", " finalGame: ?float64,\n", " retroID: ?string,\n", " bbrefID: ?string\n", " },\n", " Pitching: var * {\n", " playerID: ?string,\n", " yearID: ?int32,\n", " stint: ?int32,\n", " teamID: ?string,\n", " lgID: ?string,\n", " W: ?int32,\n", " L: ?int32,\n", " G: ?int32,\n", " GS: ?int32,\n", " CG: ?int32,\n", " SHO: ?int32,\n", " SV: ?int32,\n", " IPouts: ?int32,\n", " H: ?int32,\n", " ER: ?int32,\n", " HR: ?int32,\n", " BB: ?int32,\n", " SO: ?int32,\n", " BAOpp: ?float64,\n", " ERA: ?float64,\n", " IBB: ?int32,\n", " WP: ?int32,\n", " HBP: ?int32,\n", " BK: ?int32,\n", " BFP: ?int32,\n", " GF: ?int32,\n", " R: ?int32,\n", " SH: ?int32,\n", " SF: ?int32,\n", " GIDP: ?int32\n", " },\n", " PitchingPost: var * {\n", " playerID: ?string,\n", " yearID: ?int32,\n", " round: ?string,\n", " teamID: ?string,\n", " lgID: ?string,\n", " W: ?int32,\n", " L: ?int32,\n", " G: ?int32,\n", " GS: ?int32,\n", " CG: ?int32,\n", " SHO: ?int32,\n", " SV: ?int32,\n", " IPouts: ?int32,\n", " H: ?int32,\n", " ER: ?int32,\n", " HR: ?int32,\n", " BB: ?int32,\n", " SO: ?int32,\n", " BAOpp: ?float64,\n", " ERA: ?float64,\n", " IBB: 
?int32,\n", " WP: ?int32,\n", " HBP: ?int32,\n", " BK: ?int32,\n", " BFP: ?int32,\n", " GF: ?int32,\n", " R: ?int32,\n", " SH: ?int32,\n", " SF: ?int32,\n", " GIDP: ?int32\n", " },\n", " Salaries: var * {\n", " yearID: ?int32,\n", " teamID: ?string,\n", " lgID: ?string,\n", " playerID: ?string,\n", " salary: ?float64\n", " },\n", " Schools: var * {\n", " schoolID: ?string,\n", " schoolName: ?string,\n", " schoolCity: ?string,\n", " schoolState: ?string,\n", " schoolNick: ?string\n", " },\n", " SchoolsPlayers: var * {\n", " playerID: ?string,\n", " schoolID: ?string,\n", " yearMin: ?int32,\n", " yearMax: ?int32\n", " },\n", " SeriesPost: var * {\n", " yearID: ?int32,\n", " round: ?string,\n", " teamIDwinner: ?string,\n", " lgIDwinner: ?string,\n", " teamIDloser: ?string,\n", " lgIDloser: ?string,\n", " wins: ?int32,\n", " losses: ?int32,\n", " ties: ?int32\n", " },\n", " Teams: var * {\n", " yearID: ?int32,\n", " lgID: ?string,\n", " teamID: ?string,\n", " franchID: ?string,\n", " divID: ?string,\n", " Rank: ?int32,\n", " G: ?int32,\n", " Ghome: ?int32,\n", " W: ?int32,\n", " L: ?int32,\n", " DivWin: ?string,\n", " WCWin: ?string,\n", " LgWin: ?string,\n", " WSWin: ?string,\n", " R: ?int32,\n", " AB: ?int32,\n", " H: ?int32,\n", " 2B: ?int32,\n", " 3B: ?int32,\n", " HR: ?int32,\n", " BB: ?int32,\n", " SO: ?int32,\n", " SB: ?int32,\n", " CS: ?int32,\n", " HBP: ?int32,\n", " SF: ?int32,\n", " RA: ?int32,\n", " ER: ?int32,\n", " ERA: ?float64,\n", " CG: ?int32,\n", " SHO: ?int32,\n", " SV: ?int32,\n", " IPouts: ?int32,\n", " HA: ?int32,\n", " HRA: ?int32,\n", " BBA: ?int32,\n", " SOA: ?int32,\n", " E: ?int32,\n", " DP: ?int32,\n", " FP: ?float64,\n", " name: ?string,\n", " park: ?string,\n", " attendance: ?int32,\n", " BPF: ?int32,\n", " PPF: ?int32,\n", " teamIDBR: ?string,\n", " teamIDlahman45: ?string,\n", " teamIDretro: ?string\n", " },\n", " TeamsFranchises: var * {\n", " franchID: ?string,\n", " franchName: ?string,\n", " active: ?string,\n", " NAassoc: 
?string\n", " },\n", " TeamsHalf: var * {\n", " yearID: ?int32,\n", " lgID: ?string,\n", " teamID: ?string,\n", " Half: ?string,\n", " divID: ?string,\n", " DivWin: ?string,\n", " Rank: ?int32,\n", " G: ?int32,\n", " W: ?int32,\n", " L: ?int32\n", " },\n", " temp: var * {ID: ?int32, namefull: ?string, born: ?float64}\n", " }\"\"\")" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# db = bz.Data('postgresql://postgres:postgres@ec2-54-159-160-163.compute-1.amazonaws.com') # Use Postgres if you don't have the sqlite file\n", "db = bz.Data('sqlite:///lahman2013.sqlite')\n", "db.dshape" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# View the Salaries table\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# What are the distinct teamIDs in the Salaries table?\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# What is the minimum and maximum yearID in the Sarlaries table? \n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# For the Oakland Athletics (teamID OAK), pick out the playerID, salary, and yearID columns\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Sort that result by salary. \n", "# Use the ascending=False keyword argument to the sort function to find the highest paid players\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "### Example: Split-apply-combine\n", "\n", "In Pandas we perform computations on a *per-group* basis with the `groupby` operator. In Blaze our syntax is slightly different, using instead the `by` function." ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Species\n", "setosa 1.0\n", "versicolor 3.0\n", "virginica 4.5\n", "Name: PetalLength, dtype: float64" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "iris = pd.read_csv('iris.csv')\n", "iris.groupby('Species').PetalLength.min()" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Species\n", "setosa 1.0\n", "versicolor 3.0\n", "virginica 4.5\n", "Name: PetalLength, dtype: float64\n" ] } ], "source": [ "iris = bz.Data('sqlite:///my.db::iris')\n", "bz.by(iris.Species, largest=iris.PetalLength.max(), \n", " smallest=iris.PetalLength.min())\n", "print(_)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "Store Results\n", "-------------\n", "\n", "By default Blaze only shows us the first ten lines of a result. This provides a more interactive feel and stops us from accidentally crushing our system. Sometimes we do want to compute all of the results and store them someplace.\n", "\n", "Blaze expressions are valid sources for `odo`. So we can store our results in any format." ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[(u'Species', u'PetalLength', u'PetalLength'),\n", " (u'setosa', 1.9, 1.0),\n", " (u'versicolor', 5.1, 3.0),\n", " (u'virginica', 6.9, 4.5)]" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "iris = bz.Data('sqlite:///my.db::iris')\n", "query = bz.by(iris.Species, largest=iris.PetalLength.max(), # A lazily evaluated result\n", " smallest=iris.PetalLength.min()) \n", "\n", "odo(query, list) # A concrete result" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "### Exercise: Storage\n", "\n", "The solution to the first split-apply-combine problem is below. Store that result in a list, a CSV file, and in a new SQL table in our database (use a uri like `sqlite://...` to specify the SQL table.)" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "collapsed": false }, "outputs": [], "source": [ "result = bz.by(db.Salaries.teamID, avg=db.Salaries.salary.mean(), \n", " max=db.Salaries.salary.max(), \n", " ratio=db.Salaries.salary.max() / db.Salaries.salary.min()\n", " ).sort('ratio', ascending=False)" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[(u'PHI', 2092230.9326364691, 25000000.0, 416.6666666666667),\n", " (u'LAN', 2346982.6980255516, 23854494.0, 397.5749),\n", " (u'NYN', 2317349.977245509, 23145011.0, 385.7501833333333),\n", " (u'DET', 1980834.9902080784, 23000000.0, 383.3333333333333),\n", " (u'MIN', 1525031.650385604, 23000000.0, 383.3333333333333),\n", " (u'BOS', 2692113.8568075118, 22500000.0, 375.0),\n", " (u'SFN', 2044198.6841463414, 22250000.0, 370.8333333333333),\n", " (u'TEX', 1874651.551143201, 22000000.0, 366.6666666666667),\n", " (u'SEA', 1932288.913202934, 20557143.0, 342.61905),\n", " (u'CHA', 1992653.5012315272, 17000000.0, 340.0)]" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "odo(result, list)[:10]" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.9" } }, "nbformat": 4, "nbformat_minor": 0 }