{
"metadata": {
"name": "",
"signature": "sha256:d825a277965c84a256612bb6ff63f606b5bba263c90ac5407506f08370ba339b"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Getting Started with Blaze\n",
"\n",
"* Full tutorial available at http://github.com/ContinuumIO/blaze-tutorial\n",
"* Install software with `conda install -c blaze blaze`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Basic Queries\n",
"\n",
"For basic tabular queries, Blaze uses the same syntax as Pandas."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from blaze import Data, by, join, transform"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 1
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"bank = Data([[1, 'Alice', 100],\n",
" [2, 'Bob', -200],\n",
" [3, 'Charlie', 300],\n",
" [4, 'Dennis', 400],\n",
" [5, 'Edith', -500]], columns=['id', 'name', 'amount'])"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 2
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Arithmetic and Reductions"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"bank.amount"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 3,
"text": [
" amount\n",
"0 100\n",
"1 -200\n",
"2 300\n",
"3 400\n",
"4 -500"
]
}
],
"prompt_number": 3
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"bank.amount / 100"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 4,
"text": [
" amount\n",
"0 1\n",
"1 -2\n",
"2 3\n",
"3 4\n",
"4 -5"
]
}
],
"prompt_number": 4
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"(bank.amount / 100).mean()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 5,
"text": [
"0.2"
]
}
],
"prompt_number": 5
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Multiple Columns and Sorting"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"bank[['name', 'amount']].sort('amount')"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Selections\n",
"\n",
"We select a subset of rows by indexing one expression with another, typically a boolean condition on a column."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"bank[bank.amount < 0]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 6,
"text": [
" id name amount\n",
"0 2 Bob -200\n",
"1 5 Edith -500"
]
}
],
"prompt_number": 6
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Combining Operations\n",
"\n",
"These operations compose: the result of a selection is itself an expression, so we can access its columns and do arithmetic on them."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"bank[bank.amount < 0].amount / 100"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 7,
"text": [
" amount\n",
"0 -2\n",
"1 -5"
]
}
],
"prompt_number": 7
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"bank[bank.amount < 0].name"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 8,
"text": [
" name\n",
"0 Bob\n",
"1 Edith"
]
}
],
"prompt_number": 8
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Exercises\n",
"\n",
"Write expressions to answer the following questions."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# What are the IDs of everyone with a positive amount?\n"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 9
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# What is the name of the person with amount 400?\n"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 10
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# What is the difference between the minimum and maximum amounts?\n"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 11
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. More Complex Queries"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, we need a more interesting dataset. We open the standard *iris* dataset, a table of measurements of 150 iris flowers, stored as a CSV file in the `data/` directory."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"iris = Data('data/iris.csv')\n",
"iris"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"
| \n", " | sepal_length | \n", "sepal_width | \n", "petal_length | \n", "petal_width | \n", "species | \n", "
|---|---|---|---|---|---|
| 0 | \n", "5.1 | \n", "3.5 | \n", "1.4 | \n", "0.2 | \n", "Iris-setosa | \n", "
| 1 | \n", "4.9 | \n", "3.0 | \n", "1.4 | \n", "0.2 | \n", "Iris-setosa | \n", "
| 2 | \n", "4.7 | \n", "3.2 | \n", "1.3 | \n", "0.2 | \n", "Iris-setosa | \n", "
| 3 | \n", "4.6 | \n", "3.1 | \n", "1.5 | \n", "0.2 | \n", "Iris-setosa | \n", "
| 4 | \n", "5.0 | \n", "3.6 | \n", "1.4 | \n", "0.2 | \n", "Iris-setosa | \n", "
| 5 | \n", "5.4 | \n", "3.9 | \n", "1.7 | \n", "0.4 | \n", "Iris-setosa | \n", "
| 6 | \n", "4.6 | \n", "3.4 | \n", "1.4 | \n", "0.3 | \n", "Iris-setosa | \n", "
| 7 | \n", "5.0 | \n", "3.4 | \n", "1.5 | \n", "0.2 | \n", "Iris-setosa | \n", "
| 8 | \n", "4.4 | \n", "2.9 | \n", "1.4 | \n", "0.2 | \n", "Iris-setosa | \n", "
| 9 | \n", "4.9 | \n", "3.1 | \n", "1.5 | \n", "0.1 | \n", "Iris-setosa | \n", "
| 10 | \n", "5.4 | \n", "3.7 | \n", "1.5 | \n", "0.2 | \n", "Iris-setosa | \n", "