{
"metadata": {
"name": ""
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"# Hello! Welcome!\n",
"\n",
"Did you see my talk at PyCon 2014! You can use this notebook to try it out for yourself!\n",
"\n",
"If you didn't, and you'd like to, the video is here: [Diving into Open Data with IPython Notebook and Pandas](http://pyvideo.org/video/2657/diving-into-open-data-with-ipython-notebook-pan-0). The pandas cookbook I mentioned is at http://github.com/jvns/pandas-cookbook.\n",
"\n",
"**IMPORTANT**: To make this work, you'll need to\n",
"\n",
"1. install IPython Notebook and Pandas (easiest way: use the free/open source [Anaconda](http://store.continuum.io))\n",
"2. Download this notebook\n",
"3. Download the data and put it in the same directory: [2012.csv](https://raw.githubusercontent.com/jvns/talks/master/pycon2014/2012.csv) ([original source](http://donnees.ville.montreal.qc.ca/dataset/velos-comptage))\n",
"4. In that directory, run `ipython notebook`\n",
"\n",
"
This work is licensed under a Creative Commons Attribution 4.0 International License."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Some imports we'll need\n",
"import numpy as np\n",
"import pandas as pd\n",
"julia = {'email': 'julia@jvns.ca', 'twitter': 'http://twitter.com/b0rk', 'slides': 'http://bit.ly/pycon-pandas', 'website': 'http://jvns.ca'}"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"prompt_number": 1
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Diving into Open Data with IPython Notebook & Pandas\n",
"\n",
"# I'm Julia Evans\n",
"\n",
"Data at [Stripe](https://stripe.com), work on [Montr\u00e9al All-Girl Hack Night](http://mtlallgirlhacknight.ca), [PyLadies MTL](http://meetup.com/pyladiesmtl)\n",
"\n",
"You can follow along with this talk at: \n",
"\n",
"[http://bit.ly/pycon-pandas](http://bit.ly/pycon-pandas)"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"print 'Email:', julia['email']\n",
"print 'Twitter:', julia['twitter']\n",
"print 'Blog:', julia['website']"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Email: julia@jvns.ca\n",
"Twitter: http://twitter.com/b0rk\n",
"Blog: http://jvns.ca\n"
]
}
],
"prompt_number": 2
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# This talk\n",
"\n",
"### 1. What are IPython Notebook & Pandas?\n",
"### 2. Practical examples of using them!\n",
"### 3. Advice."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# IPython Notebook + Pandas + Numpy: what are they?"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# IPython Notebook\n",
"\n",
"* web-based user interface to IPython\n",
"* pretty graphs\n",
"* literate programming\n",
"* Can make slideshows :) (this presentation)\n",
"* version controlled science!"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Pandas: No loops!\n",
"\n",
"Imagine:\n",
"\n",
"* dataset with every complaint call ever made in New York\n",
"* You want to know how many noise complaints are made in each borough each hour\n",
"\n",
"## You don't need to write loops to do this!\n",
"\n",
"Also the solution is 5 lines of code"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Numpy makes Pandas fast\n",
"\n",
"* numpy: fast array computations\n",
"* pandas is built on top of numpy"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"py_list = range(20000000)\n",
"numpy_array = np.arange(20000000)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 3
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%timeit\n",
"total = 0\n",
"for x in py_list:\n",
" x += total * total"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1 loops, best of 3: 1.22 s per loop\n"
]
}
],
"prompt_number": 4
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%timeit\n",
"np.sum(numpy_array * numpy_array)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"10 loops, best of 3: 83.4 ms per loop\n"
]
}
],
"prompt_number": 5
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# How do I install it?\n",
"\n",
"## Don't: Use the Ubuntu packages\n",
"\n",
"\n",
"    sudo apt-get install ipython-notebook\n",
"\n",
"## Do: Use pip or [Anaconda](https://store.continuum.io/)\n",
"\n",
"
\n",
"\n",
"pip install ipython tornado pyzmq
"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# How to run IPython Notebook\n",
"\n",
"
\n",
"pip install numpy pandas matplotlib\n",
"\n",
"
"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Practical examples!!!"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Sensors on bike paths measure the number of cyclists"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Part 1: Import the 2012 bike path data from a CSV"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import pandas as pd\n",
"import numpy as np"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"prompt_number": 6
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import matplotlib\n",
"# display graphs inline\n",
"%matplotlib inline \n",
"\n",
"# Make graphs prettier\n",
"pd.set_option('display.max_columns', 15)\n",
"pd.set_option('display.line_width', 400)\n",
"pd.set_option('display.mpl_style', 'default')\n",
"\n",
"# Make the fonts bigger\n",
"matplotlib.rc('figure', figsize=(14, 7))\n",
"matplotlib.rc('font', family='normal', weight='bold', size=22)"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"prompt_number": 7
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"bike_data = pd.read_csv(\"./2012.csv\")\n",
"bike_data[:5]"
],
"language": "python",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"html": [
"\n",
"<table>\n",
"<tr><th></th><th>Date;Berri 1;Br\ufffdbeuf (donn\ufffdes non disponibles);C\ufffdte-Sainte-Catherine;Maisonneuve 1;Maisonneuve 2;du Parc;Pierre-Dupuy;Rachel1;St-Urbain (donn\ufffdes non disponibles)</th></tr>\n",
"<tr><th>0</th><td>01/01/2012;35;;0;38;51;26;10;16;</td></tr>\n",
"<tr><th>1</th><td>02/01/2012;83;;1;68;153;53;6;43;</td></tr>\n",
"<tr><th>2</th><td>03/01/2012;135;;2;104;248;89;3;58;</td></tr>\n",
"<tr><th>3</th><td>04/01/2012;144;;1;116;318;111;8;61;</td></tr>\n",
"<tr><th>4</th><td>05/01/2012;197;;2;124;330;97;13;95;</td></tr>\n",
"</table>\n",
"<p>5 rows \u00d7 1 columns</p>\n",
"<table>\n",
"<tr><th></th><th>Berri 1</th><th>C\u00f4te-Sainte-Catherine</th><th>Maisonneuve 1</th></tr>\n",
"<tr><th>Date</th><th></th><th></th><th></th></tr>\n",
"<tr><th>2012-01-01</th><td>35</td><td>0</td><td>38</td></tr>\n",
"<tr><th>2012-01-02</th><td>83</td><td>1</td><td>68</td></tr>\n",
"<tr><th>2012-01-03</th><td>135</td><td>2</td><td>104</td></tr>\n",
"<tr><th>2012-01-04</th><td>144</td><td>1</td><td>116</td></tr>\n",
"<tr><th>2012-01-05</th><td>197</td><td>2</td><td>124</td></tr>\n",
"</table>\n",
"<p>5 rows \u00d7 3 columns</p>\n",
"<table>\n",
"<tr><th></th><th>Berri 1</th><th>C\u00f4te-Sainte-Catherine</th><th>Maisonneuve 1</th></tr>\n",
"<tr><th>Date</th><th></th><th></th><th></th></tr>\n",
"<tr><th>2012-01-01</th><td>35</td><td>0</td><td>38</td></tr>\n",
"<tr><th>2012-01-02</th><td>83</td><td>1</td><td>68</td></tr>\n",
"<tr><th>2012-01-03</th><td>135</td><td>2</td><td>104</td></tr>\n",
"</table>\n",
"<p>3 rows \u00d7 3 columns</p>\n",
"<table>\n",
"<tr><th></th><th>Berri 1</th><th>Maisonneuve 1</th></tr>\n",
"<tr><th>Date</th><th></th><th></th></tr>\n",
"<tr><th>2012-01-01</th><td>35</td><td>38</td></tr>\n",
"<tr><th>2012-01-02</th><td>83</td><td>68</td></tr>\n",
"<tr><th>2012-01-03</th><td>135</td><td>104</td></tr>\n",
"</table>\n",
"<p>3 rows \u00d7 2 columns</p>\n",
"<table>\n",
"<tr><th></th><th>Berri 1</th><th>C\u00f4te-Sainte-Catherine</th><th>Maisonneuve 1</th></tr>\n",
"<tr><th>Date</th><th></th><th></th><th></th></tr>\n",
"<tr><th>2012-01-01</th><td>35</td><td>0</td><td>38</td></tr>\n",
"<tr><th>2012-01-14</th><td>32</td><td>0</td><td>54</td></tr>\n",
"<tr><th>2012-01-15</th><td>54</td><td>0</td><td>33</td></tr>\n",
"<tr><th>2012-01-21</th><td>53</td><td>0</td><td>47</td></tr>\n",
"<tr><th>2012-01-22</th><td>71</td><td>0</td><td>41</td></tr>\n",
"<tr><th>2012-02-05</th><td>72</td><td>0</td><td>46</td></tr>\n",
"<tr><th>2012-02-11</th><td>71</td><td>0</td><td>63</td></tr>\n",
"<tr><th>2012-02-25</th><td>62</td><td>0</td><td>48</td></tr>\n",
"</table>\n",
"<p>8 rows \u00d7 3 columns</p>\n",
"<table>\n",
"<tr><th></th><th>Berri 1</th><th>C\u00f4te-Sainte-Catherine</th><th>Maisonneuve 1</th><th>weekday</th></tr>\n",
"<tr><th>Date</th><th></th><th></th><th></th><th></th></tr>\n",
"<tr><th>2012-01-01</th><td>35</td><td>0</td><td>38</td><td>6</td></tr>\n",
"<tr><th>2012-01-02</th><td>83</td><td>1</td><td>68</td><td>0</td></tr>\n",
"<tr><th>2012-01-03</th><td>135</td><td>2</td><td>104</td><td>1</td></tr>\n",
"<tr><th>2012-01-04</th><td>144</td><td>1</td><td>116</td><td>2</td></tr>\n",
"<tr><th>2012-01-05</th><td>197</td><td>2</td><td>124</td><td>3</td></tr>\n",
"</table>\n",
"<p>5 rows \u00d7 4 columns</p>\n",
"<table>\n",
"<tr><th></th><th>Berri 1</th><th>C\u00f4te-Sainte-Catherine</th><th>Maisonneuve 1</th></tr>\n",
"<tr><th>weekday</th><th></th><th></th><th></th></tr>\n",
"<tr><th>0</th><td>134298</td><td>60329</td><td>90051</td></tr>\n",
"<tr><th>1</th><td>135305</td><td>58708</td><td>92035</td></tr>\n",
"<tr><th>2</th><td>152972</td><td>67344</td><td>104891</td></tr>\n",
"<tr><th>3</th><td>160131</td><td>69028</td><td>111895</td></tr>\n",
"<tr><th>4</th><td>141771</td><td>56446</td><td>98568</td></tr>\n",
"<tr><th>5</th><td>101578</td><td>34018</td><td>62067</td></tr>\n",
"<tr><th>6</th><td>99310</td><td>36466</td><td>55324</td></tr>\n",
"</table>\n",
"<p>7 rows \u00d7 3 columns</p>\n",
"<table>\n",
"<tr><th></th><th>Temp (C)</th><th>Dew Point Temp (C)</th><th>Rel Hum (%)</th><th>Wind Spd (km/h)</th><th>Visibility (km)</th><th>Stn Press (kPa)</th><th>Weather</th></tr>\n",
"<tr><th>Date/Time</th><th></th><th></th><th></th><th></th><th></th><th></th><th></th></tr>\n",
"<tr><th>2012-01-01 00:00:00</th><td>-1.8</td><td>-3.9</td><td>86</td><td>4</td><td>8.0</td><td>101.24</td><td>Fog</td></tr>\n",
"<tr><th>2012-01-01 01:00:00</th><td>-1.8</td><td>-3.7</td><td>87</td><td>4</td><td>8.0</td><td>101.24</td><td>Fog</td></tr>\n",
"<tr><th>2012-01-01 02:00:00</th><td>-1.8</td><td>-3.4</td><td>89</td><td>7</td><td>4.0</td><td>101.26</td><td>Freezing Drizzle,Fog</td></tr>\n",
"<tr><th>2012-01-01 03:00:00</th><td>-1.5</td><td>-3.2</td><td>88</td><td>6</td><td>4.0</td><td>101.27</td><td>Freezing Drizzle,Fog</td></tr>\n",
"<tr><th>2012-01-01 04:00:00</th><td>-1.5</td><td>-3.3</td><td>88</td><td>7</td><td>4.8</td><td>101.23</td><td>Fog</td></tr>\n",
"</table>\n",
"<p>5 rows \u00d7 7 columns</p>\n",
"<table>\n",
"<tr><th></th><th>Berri 1</th><th>C\u00f4te-Sainte-Catherine</th><th>Maisonneuve 1</th><th>weekday</th><th>mean temp</th></tr>\n",
"<tr><th>Date</th><th></th><th></th><th></th><th></th><th></th></tr>\n",
"<tr><th>2012-01-01</th><td>35</td><td>0</td><td>38</td><td>6</td><td>0.629167</td></tr>\n",
"<tr><th>2012-01-02</th><td>83</td><td>1</td><td>68</td><td>0</td><td>0.041667</td></tr>\n",
"<tr><th>2012-01-03</th><td>135</td><td>2</td><td>104</td><td>1</td><td>-14.416667</td></tr>\n",
"<tr><th>2012-01-04</th><td>144</td><td>1</td><td>116</td><td>2</td><td>-13.645833</td></tr>\n",
"<tr><th>2012-01-05</th><td>197</td><td>2</td><td>124</td><td>3</td><td>-6.750000</td></tr>\n",
"</table>\n",
"<p>5 rows \u00d7 5 columns</p>\n", "