{ "metadata": { "name": "", "signature": "sha256:d825a277965c84a256612bb6ff63f606b5bba263c90ac5407506f08370ba339b" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "# Getting Started with Blaze\n", "\n", "* Full tutorial available at http://github.com/ContinuumIO/blaze-tutorial\n", "* Install software with `conda install -c blaze blaze`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Basic Queries\n", "\n", "For basic tabular queries, Blaze shares the same syntax as Pandas." ] }, { "cell_type": "code", "collapsed": false, "input": [ "from blaze import Data, by, join, transform" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "code", "collapsed": false, "input": [ "bank = Data([[1, 'Alice', 100],\n", " [2, 'Bob', -200],\n", " [3, 'Charlie', 300],\n", " [4, 'Dennis', 400],\n", " [5, 'Edith', -500]], columns=['id', 'name', 'amount'])" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 2 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Arithmetic and Reductions" ] }, { "cell_type": "code", "collapsed": false, "input": [ "bank.amount" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 3, "text": [ " amount\n", "0 100\n", "1 -200\n", "2 300\n", "3 400\n", "4 -500" ] } ], "prompt_number": 3 }, { "cell_type": "code", "collapsed": false, "input": [ "bank.amount / 100" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 4, "text": [ " amount\n", "0 1\n", "1 -2\n", "2 3\n", "3 4\n", "4 -5" ] } ], "prompt_number": 4 }, { "cell_type": "code", "collapsed": false, "input": [ "(bank.amount / 100).mean()" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 5, "text": [ "0.2" ] } ], "prompt_number": 5 }, { "cell_type": 
"markdown", "metadata": {}, "source": [ "### Multiple Columns and Sorting" ] }, { "cell_type": "code", "collapsed": false, "input": [ "bank[['name', 'amount']].sort('amount')" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Selections\n", "\n", "We select subsets of rows by indexing one expression with another, typically a Boolean condition." ] }, { "cell_type": "code", "collapsed": false, "input": [ "bank[bank.amount < 0]" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 6, "text": [ " id name amount\n", "0 2 Bob -200\n", "1 5 Edith -500" ] } ], "prompt_number": 6 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Combining Operations\n", "\n", "These operations compose: we can select rows and then apply arithmetic or column access to the result." ] }, { "cell_type": "code", "collapsed": false, "input": [ "bank[bank.amount < 0].amount / 100" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 7, "text": [ " amount\n", "0 -2\n", "1 -5" ] } ], "prompt_number": 7 }, { "cell_type": "code", "collapsed": false, "input": [ "bank[bank.amount < 0].name" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 8, "text": [ " name\n", "0 Bob\n", "1 Edith" ] } ], "prompt_number": 8 }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Exercises\n", "\n", "Write expressions to answer the following questions." ] }, { "cell_type": "code", "collapsed": false, "input": [ "# What are the IDs of everyone with a positive amount?\n" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 9 }, { "cell_type": "code", "collapsed": false, "input": [ "# What is the name of the person with amount 400?\n" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 10 }, { "cell_type": "code", "collapsed": false, "input": [ "# What is the difference between the minimum and maximum amounts?\n" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 11 }, { "cell_type": 
"markdown", "metadata": {}, "source": [ "## 2. More Complex Queries\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, we need a more interesting dataset. We use the standard *iris* dataset, a table of 150 measurements of flowers in the iris genus, stored as a CSV file in the `data/` directory." ] }, { "cell_type": "code", "collapsed": false, "input": [ "iris = Data('data/iris.csv')\n", "iris" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", " | sepal_length | \n", "sepal_width | \n", "petal_length | \n", "petal_width | \n", "species | \n", "
---|---|---|---|---|---|
0 | \n", "5.1 | \n", "3.5 | \n", "1.4 | \n", "0.2 | \n", "Iris-setosa | \n", "
1 | \n", "4.9 | \n", "3.0 | \n", "1.4 | \n", "0.2 | \n", "Iris-setosa | \n", "
2 | \n", "4.7 | \n", "3.2 | \n", "1.3 | \n", "0.2 | \n", "Iris-setosa | \n", "
3 | \n", "4.6 | \n", "3.1 | \n", "1.5 | \n", "0.2 | \n", "Iris-setosa | \n", "
4 | \n", "5.0 | \n", "3.6 | \n", "1.4 | \n", "0.2 | \n", "Iris-setosa | \n", "
5 | \n", "5.4 | \n", "3.9 | \n", "1.7 | \n", "0.4 | \n", "Iris-setosa | \n", "
6 | \n", "4.6 | \n", "3.4 | \n", "1.4 | \n", "0.3 | \n", "Iris-setosa | \n", "
7 | \n", "5.0 | \n", "3.4 | \n", "1.5 | \n", "0.2 | \n", "Iris-setosa | \n", "
8 | \n", "4.4 | \n", "2.9 | \n", "1.4 | \n", "0.2 | \n", "Iris-setosa | \n", "
9 | \n", "4.9 | \n", "3.1 | \n", "1.5 | \n", "0.1 | \n", "Iris-setosa | \n", "
10 | \n", "5.4 | \n", "3.7 | \n", "1.5 | \n", "0.2 | \n", "Iris-setosa | \n", "