{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Pandas II" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## More indexing tricks" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We'll start out with some data from Beer Advocate (see [Tom Augspurger](https://github.com/TomAugspurger/pydata-chi-h2t/blob/master/3-Indexing.ipynb) for some cool details on how he extracted this data)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "pd.options.display.max_rows = 10" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
abvbeer_idbrewer_idbeer_namebeer_stylereview_appearancereview_aromareview_overallreview_palateprofile_namereview_tastetexttime
07.02511287Bell's Cherry StoutAmerican Stout4.54.04.54.0blaheath4.5Batch 8144\\tPitch black in color with a 1/2 f...2009-10-05 21:31:48
15.7197369790Duck-Rabbit PorterAmerican Porter4.54.04.54.0GJ404.0Sampled from a 12oz bottle in a standard pint...2009-10-05 21:32:09
24.8110983182Fürstenberg Premium PilsenerGerman Pilsener4.03.03.03.0biegaman3.5Haystack yellow with an energetic group of bu...2009-10-05 21:32:13
39.5285773818Unearthly (Imperial India Pale Ale)American Double / Imperial IPA4.04.04.04.0nick764.0The aroma has pine, wood, citrus, caramel, an...2009-10-05 21:32:37
45.8398119Wolaver's Pale AleAmerican Pale Ale (APA)4.03.04.03.5champ1033.0A: Pours a slightly hazy golden/orange color....2009-10-05 21:33:14
\n", "
" ], "text/plain": [ " abv beer_id brewer_id beer_name \\\n", "0 7.0 2511 287 Bell's Cherry Stout \n", "1 5.7 19736 9790 Duck-Rabbit Porter \n", "2 4.8 11098 3182 Fürstenberg Premium Pilsener \n", "3 9.5 28577 3818 Unearthly (Imperial India Pale Ale) \n", "4 5.8 398 119 Wolaver's Pale Ale \n", "\n", " beer_style review_appearance review_aroma \\\n", "0 American Stout 4.5 4.0 \n", "1 American Porter 4.5 4.0 \n", "2 German Pilsener 4.0 3.0 \n", "3 American Double / Imperial IPA 4.0 4.0 \n", "4 American Pale Ale (APA) 4.0 3.0 \n", "\n", " review_overall review_palate profile_name review_taste \\\n", "0 4.5 4.0 blaheath 4.5 \n", "1 4.5 4.0 GJ40 4.0 \n", "2 3.0 3.0 biegaman 3.5 \n", "3 4.0 4.0 nick76 4.0 \n", "4 4.0 3.5 champ103 3.0 \n", "\n", " text time \n", "0 Batch 8144\\tPitch black in color with a 1/2 f... 2009-10-05 21:31:48 \n", "1 Sampled from a 12oz bottle in a standard pint... 2009-10-05 21:32:09 \n", "2 Haystack yellow with an energetic group of bu... 2009-10-05 21:32:13 \n", "3 The aroma has pine, wood, citrus, caramel, an... 2009-10-05 21:32:37 \n", "4 A: Pours a slightly hazy golden/orange color.... 2009-10-05 21:33:14 " ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_csv('data/beer_subset.csv.gz', parse_dates=['time'], compression='gzip')\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Boolean indexing\n", "\n", "Like a where clause in SQL. \n", "\n", "The indexer (or boolean mask) should be 1-dimensional and the same length as the thing being indexed." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "0 False\n", "1 False\n", "2 True\n", "3 False\n", "4 False\n", " ... \n", "994 False\n", "995 False\n", "996 False\n", "997 False\n", "998 False\n", "Name: abv, dtype: bool" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['abv'] < 5" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
abvbeer_idbrewer_idbeer_namebeer_stylereview_appearancereview_aromareview_overallreview_palateprofile_namereview_tastetexttime
24.8110983182Fürstenberg Premium PilsenerGerman Pilsener4.03.03.03.0biegaman3.5Haystack yellow with an energetic group of bu...2009-10-05 21:32:13
74.81669256Great WhiteWitbier4.54.54.54.5n0rc414.5Ok, for starters great white I believe will b...2009-10-05 21:34:29
214.6401118Dark IslandScottish Ale4.04.03.54.0abuliarose4.0Poured into a snifter, revealing black opaque...2009-10-05 21:47:36
224.9504418968Kipona FestMärzen / Oktoberfest4.03.54.04.0drcarver4.0A - a medium brown body with an off white hea...2009-10-05 21:47:56
284.6401118Dark IslandScottish Ale4.04.04.54.0sisuspeed4.0The color of this beer fits the name well. Op...2009-10-05 21:53:38
\n", "
" ], "text/plain": [ " abv beer_id brewer_id beer_name \\\n", "2 4.8 11098 3182 Fürstenberg Premium Pilsener \n", "7 4.8 1669 256 Great White \n", "21 4.6 401 118 Dark Island \n", "22 4.9 5044 18968 Kipona Fest \n", "28 4.6 401 118 Dark Island \n", "\n", " beer_style review_appearance review_aroma review_overall \\\n", "2 German Pilsener 4.0 3.0 3.0 \n", "7 Witbier 4.5 4.5 4.5 \n", "21 Scottish Ale 4.0 4.0 3.5 \n", "22 Märzen / Oktoberfest 4.0 3.5 4.0 \n", "28 Scottish Ale 4.0 4.0 4.5 \n", "\n", " review_palate profile_name review_taste \\\n", "2 3.0 biegaman 3.5 \n", "7 4.5 n0rc41 4.5 \n", "21 4.0 abuliarose 4.0 \n", "22 4.0 drcarver 4.0 \n", "28 4.0 sisuspeed 4.0 \n", "\n", " text time \n", "2 Haystack yellow with an energetic group of bu... 2009-10-05 21:32:13 \n", "7 Ok, for starters great white I believe will b... 2009-10-05 21:34:29 \n", "21 Poured into a snifter, revealing black opaque... 2009-10-05 21:47:36 \n", "22 A - a medium brown body with an off white hea... 2009-10-05 21:47:56 \n", "28 The color of this beer fits the name well. Op... 2009-10-05 21:53:38 " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[df['abv'] < 5].head()" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
abvbeer_idbrewer_idbeer_namebeer_stylereview_appearancereview_aromareview_overallreview_palateprofile_namereview_tastetexttime
07.02511287Bell's Cherry StoutAmerican Stout4.54.04.54.0blaheath4.5Batch 8144\\tPitch black in color with a 1/2 f...2009-10-05 21:31:48
15.7197369790Duck-Rabbit PorterAmerican Porter4.54.04.54.0GJ404.0Sampled from a 12oz bottle in a standard pint...2009-10-05 21:32:09
24.8110983182Fürstenberg Premium PilsenerGerman Pilsener4.03.03.03.0biegaman3.5Haystack yellow with an energetic group of bu...2009-10-05 21:32:13
66.2531281114Smokin' Amber Kegs Gone WildAmerican Amber / Red Ale3.54.04.54.0Deuane4.5An American amber with the addition of smoked...2009-10-05 21:34:24
74.81669256Great WhiteWitbier4.54.54.54.5n0rc414.5Ok, for starters great white I believe will b...2009-10-05 21:34:29
\n", "
" ], "text/plain": [ " abv beer_id brewer_id beer_name \\\n", "0 7.0 2511 287 Bell's Cherry Stout \n", "1 5.7 19736 9790 Duck-Rabbit Porter \n", "2 4.8 11098 3182 Fürstenberg Premium Pilsener \n", "6 6.2 53128 1114 Smokin' Amber Kegs Gone Wild \n", "7 4.8 1669 256 Great White \n", "\n", " beer_style review_appearance review_aroma review_overall \\\n", "0 American Stout 4.5 4.0 4.5 \n", "1 American Porter 4.5 4.0 4.5 \n", "2 German Pilsener 4.0 3.0 3.0 \n", "6 American Amber / Red Ale 3.5 4.0 4.5 \n", "7 Witbier 4.5 4.5 4.5 \n", "\n", " review_palate profile_name review_taste \\\n", "0 4.0 blaheath 4.5 \n", "1 4.0 GJ40 4.0 \n", "2 3.0 biegaman 3.5 \n", "6 4.0 Deuane 4.5 \n", "7 4.5 n0rc41 4.5 \n", "\n", " text time \n", "0 Batch 8144\\tPitch black in color with a 1/2 f... 2009-10-05 21:31:48 \n", "1 Sampled from a 12oz bottle in a standard pint... 2009-10-05 21:32:09 \n", "2 Haystack yellow with an energetic group of bu... 2009-10-05 21:32:13 \n", "6 An American amber with the addition of smoked... 2009-10-05 21:34:24 \n", "7 Ok, for starters great white I believe will b... 2009-10-05 21:34:29 " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[((df['abv'] < 5) & (df['time'] > pd.Timestamp('2009-06'))) | \n", " (df['review_overall'] >= 4.5)].head()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Be careful with the order of operations..." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "2 > 1 & 0" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Safest to use parentheses..." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(2 > 1) & 0" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Select just the rows where the `beer_style` contains `'IPA'`:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": true, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "df.beer_style.str?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "df.beer_style.str." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "0 False\n", "1 False\n", "2 False\n", "3 True\n", "4 False\n", " ... \n", "994 False\n", "995 False\n", "996 False\n", "997 False\n", "998 False\n", "Name: beer_style, dtype: bool" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['beer_style'].str.contains('IPA')" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
abvbeer_idbrewer_idbeer_namebeer_stylereview_appearancereview_aromareview_overallreview_palateprofile_namereview_tastetexttime
39.5285773818Unearthly (Imperial India Pale Ale)American Double / Imperial IPA4.04.04.04.0nick764.0The aroma has pine, wood, citrus, caramel, an...2009-10-05 21:32:37
86.76549140Northern Hemisphere Harvest Wet Hop AleAmerican IPA4.04.04.04.0david184.0I like all of Sierra Nevada's beers but felt ...2009-10-05 21:34:31
168.0361793818Hoppe (Imperial Extra Pale Ale)American Double / Imperial IPA4.03.04.03.5nick763.0The aroma is papery with citrus, yeast, and s...2009-10-05 21:43:23
236.544727596Portsmouth 5 C's IPAAmerican IPA4.55.05.04.5ALeF5.0As a devoted drinker of American and English ...2009-10-05 21:48:46
265.937477140Sierra Nevada Anniversary Ale (2007-2009)American IPA4.54.54.54.5n0rc414.5Poured a great dark color with great smell! t...2009-10-05 21:51:33
\n", "
" ], "text/plain": [ " abv beer_id brewer_id beer_name \\\n", "3 9.5 28577 3818 Unearthly (Imperial India Pale Ale) \n", "8 6.7 6549 140 Northern Hemisphere Harvest Wet Hop Ale \n", "16 8.0 36179 3818 Hoppe (Imperial Extra Pale Ale) \n", "23 6.5 44727 596 Portsmouth 5 C's IPA \n", "26 5.9 37477 140 Sierra Nevada Anniversary Ale (2007-2009) \n", "\n", " beer_style review_appearance review_aroma \\\n", "3 American Double / Imperial IPA 4.0 4.0 \n", "8 American IPA 4.0 4.0 \n", "16 American Double / Imperial IPA 4.0 3.0 \n", "23 American IPA 4.5 5.0 \n", "26 American IPA 4.5 4.5 \n", "\n", " review_overall review_palate profile_name review_taste \\\n", "3 4.0 4.0 nick76 4.0 \n", "8 4.0 4.0 david18 4.0 \n", "16 4.0 3.5 nick76 3.0 \n", "23 5.0 4.5 ALeF 5.0 \n", "26 4.5 4.5 n0rc41 4.5 \n", "\n", " text time \n", "3 The aroma has pine, wood, citrus, caramel, an... 2009-10-05 21:32:37 \n", "8 I like all of Sierra Nevada's beers but felt ... 2009-10-05 21:34:31 \n", "16 The aroma is papery with citrus, yeast, and s... 2009-10-05 21:43:23 \n", "23 As a devoted drinker of American and English ... 2009-10-05 21:48:46 \n", "26 Poured a great dark color with great smell! t... 2009-10-05 21:51:33 " ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[df['beer_style'].str.contains('IPA')].head()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Find the rows where the beer style is either `'American IPA'` or `'Pilsner'`:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
abvbeer_idbrewer_idbeer_namebeer_stylereview_appearancereview_aromareview_overallreview_palateprofile_namereview_tastetexttime
86.76549140Northern Hemisphere Harvest Wet Hop AleAmerican IPA4.04.04.04.0david184.0I like all of Sierra Nevada's beers but felt ...2009-10-05 21:34:31
236.544727596Portsmouth 5 C's IPAAmerican IPA4.55.05.04.5ALeF5.0As a devoted drinker of American and English ...2009-10-05 21:48:46
265.937477140Sierra Nevada Anniversary Ale (2007-2009)American IPA4.54.54.54.5n0rc414.5Poured a great dark color with great smell! t...2009-10-05 21:51:33
327.56076651Flower Power India Pale AleAmerican IPA3.54.54.03.5OnThenIn4.0Appearance: The beer pours a rather cloudy da...2009-10-05 22:02:11
486.744749140Sierra Nevada Chico Estate Harvest Wet Hop Ale...American IPA4.53.54.04.5mikey7114.0I love this concept. Way to go Sierra Nevada!...2009-10-05 22:19:33
\n", "
" ], "text/plain": [ " abv beer_id brewer_id \\\n", "8 6.7 6549 140 \n", "23 6.5 44727 596 \n", "26 5.9 37477 140 \n", "32 7.5 6076 651 \n", "48 6.7 44749 140 \n", "\n", " beer_name beer_style \\\n", "8 Northern Hemisphere Harvest Wet Hop Ale American IPA \n", "23 Portsmouth 5 C's IPA American IPA \n", "26 Sierra Nevada Anniversary Ale (2007-2009) American IPA \n", "32 Flower Power India Pale Ale American IPA \n", "48 Sierra Nevada Chico Estate Harvest Wet Hop Ale... American IPA \n", "\n", " review_appearance review_aroma review_overall review_palate \\\n", "8 4.0 4.0 4.0 4.0 \n", "23 4.5 5.0 5.0 4.5 \n", "26 4.5 4.5 4.5 4.5 \n", "32 3.5 4.5 4.0 3.5 \n", "48 4.5 3.5 4.0 4.5 \n", "\n", " profile_name review_taste \\\n", "8 david18 4.0 \n", "23 ALeF 5.0 \n", "26 n0rc41 4.5 \n", "32 OnThenIn 4.0 \n", "48 mikey711 4.0 \n", "\n", " text time \n", "8 I like all of Sierra Nevada's beers but felt ... 2009-10-05 21:34:31 \n", "23 As a devoted drinker of American and English ... 2009-10-05 21:48:46 \n", "26 Poured a great dark color with great smell! t... 2009-10-05 21:51:33 \n", "32 Appearance: The beer pours a rather cloudy da... 2009-10-05 22:02:11 \n", "48 I love this concept. Way to go Sierra Nevada!... 2009-10-05 22:19:33 " ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[(df['beer_style'] == 'American IPA') | (df['beer_style'] == 'Pilsner')].head()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Or more succinctly:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
abvbeer_idbrewer_idbeer_namebeer_stylereview_appearancereview_aromareview_overallreview_palateprofile_namereview_tastetexttime
86.76549140Northern Hemisphere Harvest Wet Hop AleAmerican IPA4.04.04.04.0david184.0I like all of Sierra Nevada's beers but felt ...2009-10-05 21:34:31
236.544727596Portsmouth 5 C's IPAAmerican IPA4.55.05.04.5ALeF5.0As a devoted drinker of American and English ...2009-10-05 21:48:46
265.937477140Sierra Nevada Anniversary Ale (2007-2009)American IPA4.54.54.54.5n0rc414.5Poured a great dark color with great smell! t...2009-10-05 21:51:33
327.56076651Flower Power India Pale AleAmerican IPA3.54.54.03.5OnThenIn4.0Appearance: The beer pours a rather cloudy da...2009-10-05 22:02:11
486.744749140Sierra Nevada Chico Estate Harvest Wet Hop Ale...American IPA4.53.54.04.5mikey7114.0I love this concept. Way to go Sierra Nevada!...2009-10-05 22:19:33
\n", "
" ], "text/plain": [ " abv beer_id brewer_id \\\n", "8 6.7 6549 140 \n", "23 6.5 44727 596 \n", "26 5.9 37477 140 \n", "32 7.5 6076 651 \n", "48 6.7 44749 140 \n", "\n", " beer_name beer_style \\\n", "8 Northern Hemisphere Harvest Wet Hop Ale American IPA \n", "23 Portsmouth 5 C's IPA American IPA \n", "26 Sierra Nevada Anniversary Ale (2007-2009) American IPA \n", "32 Flower Power India Pale Ale American IPA \n", "48 Sierra Nevada Chico Estate Harvest Wet Hop Ale... American IPA \n", "\n", " review_appearance review_aroma review_overall review_palate \\\n", "8 4.0 4.0 4.0 4.0 \n", "23 4.5 5.0 5.0 4.5 \n", "26 4.5 4.5 4.5 4.5 \n", "32 3.5 4.5 4.0 3.5 \n", "48 4.5 3.5 4.0 4.5 \n", "\n", " profile_name review_taste \\\n", "8 david18 4.0 \n", "23 ALeF 5.0 \n", "26 n0rc41 4.5 \n", "32 OnThenIn 4.0 \n", "48 mikey711 4.0 \n", "\n", " text time \n", "8 I like all of Sierra Nevada's beers but felt ... 2009-10-05 21:34:31 \n", "23 As a devoted drinker of American and English ... 2009-10-05 21:48:46 \n", "26 Poured a great dark color with great smell! t... 2009-10-05 21:51:33 \n", "32 Appearance: The beer pours a rather cloudy da... 2009-10-05 22:02:11 \n", "48 I love this concept. Way to go Sierra Nevada!... 2009-10-05 22:19:33 " ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df['beer_style'].isin(['American IPA', 'Pilsner'])].head()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### Mini Exercise\n", "\n", "- Select the rows where the scores of the 5 review_cols ('review_appearance', 'review_aroma', 'review_overall', 'review_palate', 'review_taste') are all at least 4.0.\n", "\n", "- _Hint_: Like NumPy arrays, DataFrames have an any and all methods that check whether it contains any or all True values. These methods also take an axis argument for the dimension to remove.\n", " - 0 or 'index' removes (or aggregates over) the vertical dimension\n", " - 1 or 'columns' removes (aggregates over) the horizontal dimension." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "Index(['abv', 'beer_id', 'brewer_id', 'beer_name', 'beer_style',\n", " 'review_appearance', 'review_aroma', 'review_overall', 'review_palate',\n", " 'profile_name', 'review_taste', 'text', 'time'],\n", " dtype='object')" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.columns" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "['review_appearance',\n", " 'review_aroma',\n", " 'review_overall',\n", " 'review_palate',\n", " 'review_taste']" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "review_cols = [c for c in df.columns if c[0:6] == 'review']\n", "review_cols" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
abvbeer_idbrewer_idbeer_namebeer_stylereview_appearancereview_aromareview_overallreview_palateprofile_namereview_tastetexttime
07.02511287Bell's Cherry StoutAmerican Stout4.54.04.54.0blaheath4.5Batch 8144\\tPitch black in color with a 1/2 f...2009-10-05 21:31:48
15.7197369790Duck-Rabbit PorterAmerican Porter4.54.04.54.0GJ404.0Sampled from a 12oz bottle in a standard pint...2009-10-05 21:32:09
39.5285773818Unearthly (Imperial India Pale Ale)American Double / Imperial IPA4.04.04.04.0nick764.0The aroma has pine, wood, citrus, caramel, an...2009-10-05 21:32:37
74.81669256Great WhiteWitbier4.54.54.54.5n0rc414.5Ok, for starters great white I believe will b...2009-10-05 21:34:29
86.76549140Northern Hemisphere Harvest Wet Hop AleAmerican IPA4.04.04.04.0david184.0I like all of Sierra Nevada's beers but felt ...2009-10-05 21:34:31
\n", "
" ], "text/plain": [ " abv beer_id brewer_id beer_name \\\n", "0 7.0 2511 287 Bell's Cherry Stout \n", "1 5.7 19736 9790 Duck-Rabbit Porter \n", "3 9.5 28577 3818 Unearthly (Imperial India Pale Ale) \n", "7 4.8 1669 256 Great White \n", "8 6.7 6549 140 Northern Hemisphere Harvest Wet Hop Ale \n", "\n", " beer_style review_appearance review_aroma \\\n", "0 American Stout 4.5 4.0 \n", "1 American Porter 4.5 4.0 \n", "3 American Double / Imperial IPA 4.0 4.0 \n", "7 Witbier 4.5 4.5 \n", "8 American IPA 4.0 4.0 \n", "\n", " review_overall review_palate profile_name review_taste \\\n", "0 4.5 4.0 blaheath 4.5 \n", "1 4.5 4.0 GJ40 4.0 \n", "3 4.0 4.0 nick76 4.0 \n", "7 4.5 4.5 n0rc41 4.5 \n", "8 4.0 4.0 david18 4.0 \n", "\n", " text time \n", "0 Batch 8144\\tPitch black in color with a 1/2 f... 2009-10-05 21:31:48 \n", "1 Sampled from a 12oz bottle in a standard pint... 2009-10-05 21:32:09 \n", "3 The aroma has pine, wood, citrus, caramel, an... 2009-10-05 21:32:37 \n", "7 Ok, for starters great white I believe will b... 2009-10-05 21:34:29 \n", "8 I like all of Sierra Nevada's beers but felt ... 2009-10-05 21:34:31 " ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[(df['review_appearance'] >= 4) &\n", " (df['review_aroma'] >= 4) &\n", " (df['review_overall'] >= 4) &\n", " (df['review_palate'] >= 4) &\n", " (df['review_taste'] >= 4)].head()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Or the short way:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false, "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
abvbeer_idbrewer_idbeer_namebeer_stylereview_appearancereview_aromareview_overallreview_palateprofile_namereview_tastetexttime
07.02511287Bell's Cherry StoutAmerican Stout4.54.04.54.0blaheath4.5Batch 8144\\tPitch black in color with a 1/2 f...2009-10-05 21:31:48
15.7197369790Duck-Rabbit PorterAmerican Porter4.54.04.54.0GJ404.0Sampled from a 12oz bottle in a standard pint...2009-10-05 21:32:09
39.5285773818Unearthly (Imperial India Pale Ale)American Double / Imperial IPA4.04.04.04.0nick764.0The aroma has pine, wood, citrus, caramel, an...2009-10-05 21:32:37
74.81669256Great WhiteWitbier4.54.54.54.5n0rc414.5Ok, for starters great white I believe will b...2009-10-05 21:34:29
86.76549140Northern Hemisphere Harvest Wet Hop AleAmerican IPA4.04.04.04.0david184.0I like all of Sierra Nevada's beers but felt ...2009-10-05 21:34:31
\n", "
" ], "text/plain": [ " abv beer_id brewer_id beer_name \\\n", "0 7.0 2511 287 Bell's Cherry Stout \n", "1 5.7 19736 9790 Duck-Rabbit Porter \n", "3 9.5 28577 3818 Unearthly (Imperial India Pale Ale) \n", "7 4.8 1669 256 Great White \n", "8 6.7 6549 140 Northern Hemisphere Harvest Wet Hop Ale \n", "\n", " beer_style review_appearance review_aroma \\\n", "0 American Stout 4.5 4.0 \n", "1 American Porter 4.5 4.0 \n", "3 American Double / Imperial IPA 4.0 4.0 \n", "7 Witbier 4.5 4.5 \n", "8 American IPA 4.0 4.0 \n", "\n", " review_overall review_palate profile_name review_taste \\\n", "0 4.5 4.0 blaheath 4.5 \n", "1 4.5 4.0 GJ40 4.0 \n", "3 4.0 4.0 nick76 4.0 \n", "7 4.5 4.5 n0rc41 4.5 \n", "8 4.0 4.0 david18 4.0 \n", "\n", " text time \n", "0 Batch 8144\\tPitch black in color with a 1/2 f... 2009-10-05 21:31:48 \n", "1 Sampled from a 12oz bottle in a standard pint... 2009-10-05 21:32:09 \n", "3 The aroma has pine, wood, citrus, caramel, an... 2009-10-05 21:32:37 \n", "7 Ok, for starters great white I believe will b... 2009-10-05 21:34:29 \n", "8 I like all of Sierra Nevada's beers but felt ... 2009-10-05 21:34:31 " ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[(df[review_cols] >= 4).all(1)].head()" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
review_appearancereview_aromareview_overallreview_palatereview_taste
0TrueTrueTrueTrueTrue
1TrueTrueTrueTrueTrue
2TrueFalseFalseFalseFalse
3TrueTrueTrueTrueTrue
4TrueFalseTrueFalseFalse
..................
994TrueTrueFalseTrueTrue
995TrueFalseTrueTrueTrue
996FalseTrueFalseFalseTrue
997TrueFalseFalseFalseTrue
998TrueTrueFalseTrueFalse
\n", "

999 rows × 5 columns

\n", "
" ], "text/plain": [ " review_appearance review_aroma review_overall review_palate review_taste\n", "0 True True True True True\n", "1 True True True True True\n", "2 True False False False False\n", "3 True True True True True\n", "4 True False True False False\n", ".. ... ... ... ... ...\n", "994 True True False True True\n", "995 True False True True True\n", "996 False True False False True\n", "997 True False False False True\n", "998 True True False True False\n", "\n", "[999 rows x 5 columns]" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(df[review_cols] >= 4)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "0 True\n", "1 True\n", "2 False\n", "3 True\n", "4 False\n", " ... \n", "994 False\n", "995 False\n", "996 False\n", "997 False\n", "998 False\n", "dtype: bool" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(df[review_cols] >= 4).all(axis=1)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
abvbeer_idbrewer_idbeer_namebeer_stylereview_appearancereview_aromareview_overallreview_palateprofile_namereview_tastetexttime
07.02511287Bell's Cherry StoutAmerican Stout4.54.04.54.0blaheath4.5Batch 8144\\tPitch black in color with a 1/2 f...2009-10-05 21:31:48
15.7197369790Duck-Rabbit PorterAmerican Porter4.54.04.54.0GJ404.0Sampled from a 12oz bottle in a standard pint...2009-10-05 21:32:09
39.5285773818Unearthly (Imperial India Pale Ale)American Double / Imperial IPA4.04.04.04.0nick764.0The aroma has pine, wood, citrus, caramel, an...2009-10-05 21:32:37
74.81669256Great WhiteWitbier4.54.54.54.5n0rc414.5Ok, for starters great white I believe will b...2009-10-05 21:34:29
86.76549140Northern Hemisphere Harvest Wet Hop AleAmerican IPA4.04.04.04.0david184.0I like all of Sierra Nevada's beers but felt ...2009-10-05 21:34:31
\n", "
" ], "text/plain": [ " abv beer_id brewer_id beer_name \\\n", "0 7.0 2511 287 Bell's Cherry Stout \n", "1 5.7 19736 9790 Duck-Rabbit Porter \n", "3 9.5 28577 3818 Unearthly (Imperial India Pale Ale) \n", "7 4.8 1669 256 Great White \n", "8 6.7 6549 140 Northern Hemisphere Harvest Wet Hop Ale \n", "\n", " beer_style review_appearance review_aroma \\\n", "0 American Stout 4.5 4.0 \n", "1 American Porter 4.5 4.0 \n", "3 American Double / Imperial IPA 4.0 4.0 \n", "7 Witbier 4.5 4.5 \n", "8 American IPA 4.0 4.0 \n", "\n", " review_overall review_palate profile_name review_taste \\\n", "0 4.5 4.0 blaheath 4.5 \n", "1 4.5 4.0 GJ40 4.0 \n", "3 4.0 4.0 nick76 4.0 \n", "7 4.5 4.5 n0rc41 4.5 \n", "8 4.0 4.0 david18 4.0 \n", "\n", " text time \n", "0 Batch 8144\\tPitch black in color with a 1/2 f... 2009-10-05 21:31:48 \n", "1 Sampled from a 12oz bottle in a standard pint... 2009-10-05 21:32:09 \n", "3 The aroma has pine, wood, citrus, caramel, an... 2009-10-05 21:32:37 \n", "7 Ok, for starters great white I believe will b... 2009-10-05 21:34:29 \n", "8 I like all of Sierra Nevada's beers but felt ... 2009-10-05 21:34:31 " ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[(df[review_cols] >= 4).all(1)].head()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Now select rows where the _average_ of the 5 `review_cols` is at least 4." ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "0 4.3\n", "1 4.2\n", "2 3.3\n", "3 4.0\n", "4 3.5\n", " ... \n", "994 4.0\n", "995 4.2\n", "996 3.5\n", "997 3.7\n", "998 3.7\n", "dtype: float64" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[review_cols].mean(axis=1)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
abvbeer_idbrewer_idbeer_namebeer_stylereview_appearancereview_aromareview_overallreview_palateprofile_namereview_tastetexttime
07.02511287Bell's Cherry StoutAmerican Stout4.54.04.54.0blaheath4.5Batch 8144\\tPitch black in color with a 1/2 f...2009-10-05 21:31:48
15.7197369790Duck-Rabbit PorterAmerican Porter4.54.04.54.0GJ404.0Sampled from a 12oz bottle in a standard pint...2009-10-05 21:32:09
39.5285773818Unearthly (Imperial India Pale Ale)American Double / Imperial IPA4.04.04.04.0nick764.0The aroma has pine, wood, citrus, caramel, an...2009-10-05 21:32:37
57.0966365Pike Street XXXXX StoutAmerican Stout4.04.03.54.0sprucetip4.5From notes. Pours black, thin mocha head fade...2009-10-05 21:33:48
66.2531281114Smokin' Amber Kegs Gone WildAmerican Amber / Red Ale3.54.04.54.0Deuane4.5An American amber with the addition of smoked...2009-10-05 21:34:24
\n", "
" ], "text/plain": [ " abv beer_id brewer_id beer_name \\\n", "0 7.0 2511 287 Bell's Cherry Stout \n", "1 5.7 19736 9790 Duck-Rabbit Porter \n", "3 9.5 28577 3818 Unearthly (Imperial India Pale Ale) \n", "5 7.0 966 365 Pike Street XXXXX Stout \n", "6 6.2 53128 1114 Smokin' Amber Kegs Gone Wild \n", "\n", " beer_style review_appearance review_aroma \\\n", "0 American Stout 4.5 4.0 \n", "1 American Porter 4.5 4.0 \n", "3 American Double / Imperial IPA 4.0 4.0 \n", "5 American Stout 4.0 4.0 \n", "6 American Amber / Red Ale 3.5 4.0 \n", "\n", " review_overall review_palate profile_name review_taste \\\n", "0 4.5 4.0 blaheath 4.5 \n", "1 4.5 4.0 GJ40 4.0 \n", "3 4.0 4.0 nick76 4.0 \n", "5 3.5 4.0 sprucetip 4.5 \n", "6 4.5 4.0 Deuane 4.5 \n", "\n", " text time \n", "0 Batch 8144\\tPitch black in color with a 1/2 f... 2009-10-05 21:31:48 \n", "1 Sampled from a 12oz bottle in a standard pint... 2009-10-05 21:32:09 \n", "3 The aroma has pine, wood, citrus, caramel, an... 2009-10-05 21:32:37 \n", "5 From notes. Pours black, thin mocha head fade... 2009-10-05 21:33:48 \n", "6 An American amber with the addition of smoked... 2009-10-05 21:34:24 " ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df[review_cols].mean(axis=1) >= 4].head()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Hierarchical Indexing" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- One of the most powerful and most complicated features of pandas\n", "- Let's you represent high-dimensional datasets in a table" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
abvbrewer_idbeer_namebeer_stylereview_appearancereview_aromareview_overallreview_palatereview_tastetext
profile_namebeer_idtime
blaheath25112009-10-05 21:31:487.0287Bell's Cherry StoutAmerican Stout4.54.04.54.04.5Batch 8144\\tPitch black in color with a 1/2 f...
GJ40197362009-10-05 21:32:095.79790Duck-Rabbit PorterAmerican Porter4.54.04.54.04.0Sampled from a 12oz bottle in a standard pint...
biegaman110982009-10-05 21:32:134.83182Fürstenberg Premium PilsenerGerman Pilsener4.03.03.03.03.5Haystack yellow with an energetic group of bu...
nick76285772009-10-05 21:32:379.53818Unearthly (Imperial India Pale Ale)American Double / Imperial IPA4.04.04.04.04.0The aroma has pine, wood, citrus, caramel, an...
champ1033982009-10-05 21:33:145.8119Wolaver's Pale AleAmerican Pale Ale (APA)4.03.04.03.53.0A: Pours a slightly hazy golden/orange color....
\n", "
" ], "text/plain": [ " abv brewer_id \\\n", "profile_name beer_id time \n", "blaheath 2511 2009-10-05 21:31:48 7.0 287 \n", "GJ40 19736 2009-10-05 21:32:09 5.7 9790 \n", "biegaman 11098 2009-10-05 21:32:13 4.8 3182 \n", "nick76 28577 2009-10-05 21:32:37 9.5 3818 \n", "champ103 398 2009-10-05 21:33:14 5.8 119 \n", "\n", " beer_name \\\n", "profile_name beer_id time \n", "blaheath 2511 2009-10-05 21:31:48 Bell's Cherry Stout \n", "GJ40 19736 2009-10-05 21:32:09 Duck-Rabbit Porter \n", "biegaman 11098 2009-10-05 21:32:13 Fürstenberg Premium Pilsener \n", "nick76 28577 2009-10-05 21:32:37 Unearthly (Imperial India Pale Ale) \n", "champ103 398 2009-10-05 21:33:14 Wolaver's Pale Ale \n", "\n", " beer_style \\\n", "profile_name beer_id time \n", "blaheath 2511 2009-10-05 21:31:48 American Stout \n", "GJ40 19736 2009-10-05 21:32:09 American Porter \n", "biegaman 11098 2009-10-05 21:32:13 German Pilsener \n", "nick76 28577 2009-10-05 21:32:37 American Double / Imperial IPA \n", "champ103 398 2009-10-05 21:33:14 American Pale Ale (APA) \n", "\n", " review_appearance review_aroma \\\n", "profile_name beer_id time \n", "blaheath 2511 2009-10-05 21:31:48 4.5 4.0 \n", "GJ40 19736 2009-10-05 21:32:09 4.5 4.0 \n", "biegaman 11098 2009-10-05 21:32:13 4.0 3.0 \n", "nick76 28577 2009-10-05 21:32:37 4.0 4.0 \n", "champ103 398 2009-10-05 21:33:14 4.0 3.0 \n", "\n", " review_overall review_palate \\\n", "profile_name beer_id time \n", "blaheath 2511 2009-10-05 21:31:48 4.5 4.0 \n", "GJ40 19736 2009-10-05 21:32:09 4.5 4.0 \n", "biegaman 11098 2009-10-05 21:32:13 3.0 3.0 \n", "nick76 28577 2009-10-05 21:32:37 4.0 4.0 \n", "champ103 398 2009-10-05 21:33:14 4.0 3.5 \n", "\n", " review_taste \\\n", "profile_name beer_id time \n", "blaheath 2511 2009-10-05 21:31:48 4.5 \n", "GJ40 19736 2009-10-05 21:32:09 4.0 \n", "biegaman 11098 2009-10-05 21:32:13 3.5 \n", "nick76 28577 2009-10-05 21:32:37 4.0 \n", "champ103 398 2009-10-05 21:33:14 3.0 \n", "\n", " text \n", "profile_name beer_id time \n", "blaheath 2511 2009-10-05 21:31:48 Batch 8144\\tPitch black in color with a 1/2 f... \n", "GJ40 19736 2009-10-05 21:32:09 Sampled from a 12oz bottle in a standard pint... \n", "biegaman 11098 2009-10-05 21:32:13 Haystack yellow with an energetic group of bu... \n", "nick76 28577 2009-10-05 21:32:37 The aroma has pine, wood, citrus, caramel, an... \n", "champ103 398 2009-10-05 21:33:14 A: Pours a slightly hazy golden/orange color.... " ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "reviews = df.set_index(['profile_name', 'beer_id', 'time'])\n", "reviews.head()" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
abvbrewer_idbeer_namebeer_stylereview_appearancereview_aromareview_overallreview_palatereview_tastetext
profile_namebeer_idtime
05Harley13072009-10-06 00:10:068.5428Der Weisse BockWeizenbock4.04.04.04.04.0Can't find the date on this one.\\t\\tPurchased...
ADZA509942009-10-06 11:08:30NaN11611Saison De Coing (Quince Saison)Saison / Farmhouse Ale4.04.03.53.53.5I tried this breweries normal Saison ages ago...
ALeF447272009-10-05 21:48:466.5596Portsmouth 5 C's IPAAmerican IPA4.55.05.04.55.0As a devoted drinker of American and English ...
ATPete9452009-10-06 22:46:5410.0173AdamOld Ale4.04.54.04.04.512oz bottle\\t\\tPours a deep copper brown colo...
54282009-10-06 22:53:2610.0335New Holland Dragon's Milk Oak Barrel AleAmerican Stout3.54.54.04.04.022oz bottle\\t\\tPours a muddy brown color with...
\n", "
" ], "text/plain": [ " abv brewer_id \\\n", "profile_name beer_id time \n", "05Harley 1307 2009-10-06 00:10:06 8.5 428 \n", "ADZA 50994 2009-10-06 11:08:30 NaN 11611 \n", "ALeF 44727 2009-10-05 21:48:46 6.5 596 \n", "ATPete 945 2009-10-06 22:46:54 10.0 173 \n", " 5428 2009-10-06 22:53:26 10.0 335 \n", "\n", " beer_name \\\n", "profile_name beer_id time \n", "05Harley 1307 2009-10-06 00:10:06 Der Weisse Bock \n", "ADZA 50994 2009-10-06 11:08:30 Saison De Coing (Quince Saison) \n", "ALeF 44727 2009-10-05 21:48:46 Portsmouth 5 C's IPA \n", "ATPete 945 2009-10-06 22:46:54 Adam \n", " 5428 2009-10-06 22:53:26 New Holland Dragon's Milk Oak Barrel Ale \n", "\n", " beer_style \\\n", "profile_name beer_id time \n", "05Harley 1307 2009-10-06 00:10:06 Weizenbock \n", "ADZA 50994 2009-10-06 11:08:30 Saison / Farmhouse Ale \n", "ALeF 44727 2009-10-05 21:48:46 American IPA \n", "ATPete 945 2009-10-06 22:46:54 Old Ale \n", " 5428 2009-10-06 22:53:26 American Stout \n", "\n", " review_appearance review_aroma \\\n", "profile_name beer_id time \n", "05Harley 1307 2009-10-06 00:10:06 4.0 4.0 \n", "ADZA 50994 2009-10-06 11:08:30 4.0 4.0 \n", "ALeF 44727 2009-10-05 21:48:46 4.5 5.0 \n", "ATPete 945 2009-10-06 22:46:54 4.0 4.5 \n", " 5428 2009-10-06 22:53:26 3.5 4.5 \n", "\n", " review_overall review_palate \\\n", "profile_name beer_id time \n", "05Harley 1307 2009-10-06 00:10:06 4.0 4.0 \n", "ADZA 50994 2009-10-06 11:08:30 3.5 3.5 \n", "ALeF 44727 2009-10-05 21:48:46 5.0 4.5 \n", "ATPete 945 2009-10-06 22:46:54 4.0 4.0 \n", " 5428 2009-10-06 22:53:26 4.0 4.0 \n", "\n", " review_taste \\\n", "profile_name beer_id time \n", "05Harley 1307 2009-10-06 00:10:06 4.0 \n", "ADZA 50994 2009-10-06 11:08:30 3.5 \n", "ALeF 44727 2009-10-05 21:48:46 5.0 \n", "ATPete 945 2009-10-06 22:46:54 4.5 \n", " 5428 2009-10-06 22:53:26 4.0 \n", "\n", " text \n", "profile_name beer_id time \n", "05Harley 1307 2009-10-06 00:10:06 Can't find the date on this one.\\t\\tPurchased... \n", "ADZA 50994 2009-10-06 11:08:30 I tried this breweries normal Saison ages ago... \n", "ALeF 44727 2009-10-05 21:48:46 As a devoted drinker of American and English ... \n", "ATPete 945 2009-10-06 22:46:54 12oz bottle\\t\\tPours a deep copper brown colo... \n", " 5428 2009-10-06 22:53:26 22oz bottle\\t\\tPours a muddy brown color with... " ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "reviews = reviews.sort_index()\n", "reviews.head()" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
abvbrewer_idbeer_namebeer_stylereview_appearancereview_aromareview_overallreview_palatereview_tastetext
beer_idtime
13072009-10-06 00:10:068.5428Der Weisse BockWeizenbock4.04.04.04.04.0Can't find the date on this one.\\t\\tPurchased...
\n", "
" ], "text/plain": [ " abv brewer_id beer_name beer_style \\\n", "beer_id time \n", "1307 2009-10-06 00:10:06 8.5 428 Der Weisse Bock Weizenbock \n", "\n", " review_appearance review_aroma review_overall \\\n", "beer_id time \n", "1307 2009-10-06 00:10:06 4.0 4.0 4.0 \n", "\n", " review_palate review_taste \\\n", "beer_id time \n", "1307 2009-10-06 00:10:06 4.0 4.0 \n", "\n", " text \n", "beer_id time \n", "1307 2009-10-06 00:10:06 Can't find the date on this one.\\t\\tPurchased... " ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "reviews.loc['05Harley']" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
abvbrewer_idbeer_namebeer_stylereview_appearancereview_aromareview_overallreview_palatereview_tastetext
time
2009-10-06 00:10:068.5428Der Weisse BockWeizenbock4.04.04.04.04.0Can't find the date on this one.\\t\\tPurchased...
\n", "
" ], "text/plain": [ " abv brewer_id beer_name beer_style \\\n", "time \n", "2009-10-06 00:10:06 8.5 428 Der Weisse Bock Weizenbock \n", "\n", " review_appearance review_aroma review_overall \\\n", "time \n", "2009-10-06 00:10:06 4.0 4.0 4.0 \n", "\n", " review_palate review_taste \\\n", "time \n", "2009-10-06 00:10:06 4.0 4.0 \n", "\n", " text \n", "time \n", "2009-10-06 00:10:06 Can't find the date on this one.\\t\\tPurchased... " ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "reviews.loc['05Harley',1307]" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "ename": "TypeError", "evalue": "cannot do label indexing on with these indexers [1307] of ", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mreviews\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mloc\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;36m1307\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;32m/Users/kfor/anaconda/lib/python3.5/site-packages/pandas/core/indexing.py\u001b[0m in \u001b[0;36m__getitem__\u001b[0;34m(self, key)\u001b[0m\n\u001b[1;32m 1292\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1293\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mtype\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0mtuple\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1294\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_getitem_tuple\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1295\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1296\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_getitem_axis\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maxis\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/Users/kfor/anaconda/lib/python3.5/site-packages/pandas/core/indexing.py\u001b[0m in \u001b[0;36m_getitem_tuple\u001b[0;34m(self, tup)\u001b[0m\n\u001b[1;32m 782\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_getitem_tuple\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtup\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 783\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 784\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_getitem_lowerdim\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtup\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 785\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0mIndexingError\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 786\u001b[0m \u001b[0;32mpass\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/Users/kfor/anaconda/lib/python3.5/site-packages/pandas/core/indexing.py\u001b[0m in \u001b[0;36m_getitem_lowerdim\u001b[0;34m(self, tup)\u001b[0m\n\u001b[1;32m 889\u001b[0m \u001b[0;31m# we may have a nested tuples indexer here\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 890\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_is_nested_tuple_indexer\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtup\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 891\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_getitem_nested_tuple\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtup\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 892\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 893\u001b[0m \u001b[0;31m# we maybe be using a tuple to represent multiple dimensions here\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/Users/kfor/anaconda/lib/python3.5/site-packages/pandas/core/indexing.py\u001b[0m in \u001b[0;36m_getitem_nested_tuple\u001b[0;34m(self, tup)\u001b[0m\n\u001b[1;32m 962\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 963\u001b[0m \u001b[0mcurrent_ndim\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mobj\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mndim\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 964\u001b[0;31m \u001b[0mobj\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mgetattr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mobj\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_getitem_axis\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maxis\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0maxis\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 965\u001b[0m \u001b[0maxis\u001b[0m \u001b[0;34m+=\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 966\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/Users/kfor/anaconda/lib/python3.5/site-packages/pandas/core/indexing.py\u001b[0m in \u001b[0;36m_getitem_axis\u001b[0;34m(self, key, axis)\u001b[0m\n\u001b[1;32m 1464\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1465\u001b[0m \u001b[0;31m# fall thru to straight lookup\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1466\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_has_valid_type\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maxis\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1467\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_get_label\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maxis\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0maxis\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1468\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/Users/kfor/anaconda/lib/python3.5/site-packages/pandas/core/indexing.py\u001b[0m in \u001b[0;36m_has_valid_type\u001b[0;34m(self, key, axis)\u001b[0m\n\u001b[1;32m 1391\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1392\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1393\u001b[0;31m \u001b[0mkey\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_convert_scalar_indexer\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maxis\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1394\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mkey\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0;32min\u001b[0m \u001b[0max\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1395\u001b[0m \u001b[0merror\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/Users/kfor/anaconda/lib/python3.5/site-packages/pandas/core/indexing.py\u001b[0m in \u001b[0;36m_convert_scalar_indexer\u001b[0;34m(self, key, axis)\u001b[0m\n\u001b[1;32m 185\u001b[0m \u001b[0max\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mobj\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_get_axis\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmin\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0maxis\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mndim\u001b[0m \u001b[0;34m-\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 186\u001b[0m \u001b[0;31m# a scalar\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 187\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0max\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_convert_scalar_indexer\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkind\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 188\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 189\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_convert_slice_indexer\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkey\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maxis\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/Users/kfor/anaconda/lib/python3.5/site-packages/pandas/indexes/base.py\u001b[0m in \u001b[0;36m_convert_scalar_indexer\u001b[0;34m(self, key, kind)\u001b[0m\n\u001b[1;32m 1010\u001b[0m \u001b[0;32melif\u001b[0m \u001b[0mkind\u001b[0m \u001b[0;32min\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;34m'loc'\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;32mand\u001b[0m \u001b[0mis_integer\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1011\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mholds_integer\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1012\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_invalid_indexer\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'label'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkey\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1013\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1014\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mkey\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/Users/kfor/anaconda/lib/python3.5/site-packages/pandas/indexes/base.py\u001b[0m in \u001b[0;36m_invalid_indexer\u001b[0;34m(self, form, key)\u001b[0m\n\u001b[1;32m 1123\u001b[0m \"indexers [{key}] of {kind}\".format(\n\u001b[1;32m 1124\u001b[0m \u001b[0mform\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mform\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mklass\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mtype\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkey\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1125\u001b[0;31m kind=type(key)))\n\u001b[0m\u001b[1;32m 1126\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1127\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mget_duplicates\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mTypeError\u001b[0m: cannot do label indexing on with these indexers [1307] of " ] } ], "source": [ "reviews.loc[:,1307]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Top Reviewers\n", "\n", "Let's select all the reviews by the top reviewers, by label." ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "Index(['corby112', 'Anthony1', 'nickd717', 'rfgetz', 'BigMcLargeHuge'], dtype='object')" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "top_reviewers = df['profile_name'].value_counts().head(5).index\n", "top_reviewers" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
abvbrewer_idbeer_namebeer_stylereview_appearancereview_aromareview_overallreview_palatereview_tastetext
profile_namebeer_idtime
Anthony1992009-10-06 13:38:407.6142Spaten OptimatorDoppelbock3.53.53.53.03.5Dark reddish brown body with a one finger tan...
1012009-10-06 13:54:265.535Samuel Adams Winter LagerBock3.53.03.03.03.5has a nice brownish/amber appearance, filtere...
1022009-10-06 13:51:495.335Samuel Adams OctoberfestMärzen / Oktoberfest3.54.03.53.53.5Pours a bright, clear copper with a fluffy, l...
1032009-10-06 15:18:175.335Samuel Adams Summer AleAmerican Pale Wheat Ale3.03.04.03.54.0Pours a somewhat hazy medium gold color with ...
1042009-10-06 13:51:024.935Samuel Adams Boston LagerVienna Lager3.53.03.53.03.5Pours a into a pint glass with a light copper...
\n", "
" ], "text/plain": [ " abv brewer_id \\\n", "profile_name beer_id time \n", "Anthony1 99 2009-10-06 13:38:40 7.6 142 \n", " 101 2009-10-06 13:54:26 5.5 35 \n", " 102 2009-10-06 13:51:49 5.3 35 \n", " 103 2009-10-06 15:18:17 5.3 35 \n", " 104 2009-10-06 13:51:02 4.9 35 \n", "\n", " beer_name \\\n", "profile_name beer_id time \n", "Anthony1 99 2009-10-06 13:38:40 Spaten Optimator \n", " 101 2009-10-06 13:54:26 Samuel Adams Winter Lager \n", " 102 2009-10-06 13:51:49 Samuel Adams Octoberfest \n", " 103 2009-10-06 15:18:17 Samuel Adams Summer Ale \n", " 104 2009-10-06 13:51:02 Samuel Adams Boston Lager \n", "\n", " beer_style \\\n", "profile_name beer_id time \n", "Anthony1 99 2009-10-06 13:38:40 Doppelbock \n", " 101 2009-10-06 13:54:26 Bock \n", " 102 2009-10-06 13:51:49 Märzen / Oktoberfest \n", " 103 2009-10-06 15:18:17 American Pale Wheat Ale \n", " 104 2009-10-06 13:51:02 Vienna Lager \n", "\n", " review_appearance review_aroma \\\n", "profile_name beer_id time \n", "Anthony1 99 2009-10-06 13:38:40 3.5 3.5 \n", " 101 2009-10-06 13:54:26 3.5 3.0 \n", " 102 2009-10-06 13:51:49 3.5 4.0 \n", " 103 2009-10-06 15:18:17 3.0 3.0 \n", " 104 2009-10-06 13:51:02 3.5 3.0 \n", "\n", " review_overall review_palate \\\n", "profile_name beer_id time \n", "Anthony1 99 2009-10-06 13:38:40 3.5 3.0 \n", " 101 2009-10-06 13:54:26 3.0 3.0 \n", " 102 2009-10-06 13:51:49 3.5 3.5 \n", " 103 2009-10-06 15:18:17 4.0 3.5 \n", " 104 2009-10-06 13:51:02 3.5 3.0 \n", "\n", " review_taste \\\n", "profile_name beer_id time \n", "Anthony1 99 2009-10-06 13:38:40 3.5 \n", " 101 2009-10-06 13:54:26 3.5 \n", " 102 2009-10-06 13:51:49 3.5 \n", " 103 2009-10-06 15:18:17 4.0 \n", " 104 2009-10-06 13:51:02 3.5 \n", "\n", " text \n", "profile_name beer_id time \n", "Anthony1 99 2009-10-06 13:38:40 Dark reddish brown body with a one finger tan... \n", " 101 2009-10-06 13:54:26 has a nice brownish/amber appearance, filtere... \n", " 102 2009-10-06 13:51:49 Pours a bright, clear copper with a fluffy, l... \n", " 103 2009-10-06 15:18:17 Pours a somewhat hazy medium gold color with ... \n", " 104 2009-10-06 13:51:02 Pours a into a pint glass with a light copper... " ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "reviews.loc[top_reviewers, :, :].head()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The syntax is a bit trickier when you want to specify a row Indexer *and* a column Indexer:" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": false }, "outputs": [ { "ename": "SyntaxError", "evalue": "invalid syntax (, line 1)", "output_type": "error", "traceback": [ "\u001b[0;36m File \u001b[0;32m\"\"\u001b[0;36m, line \u001b[0;32m1\u001b[0m\n\u001b[0;31m reviews.loc[(top_reviewers, 99, :), ['beer_name', 'brewer_name']]\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m invalid syntax\n" ] } ], "source": [ "reviews.loc[(top_reviewers, 99, :), ['beer_name', 'brewer_name']]" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
beer_namebrewer_id
profile_namebeer_idtime
Anthony1992009-10-06 13:38:40Spaten Optimator142
\n", "
" ], "text/plain": [ " beer_name brewer_id\n", "profile_name beer_id time \n", "Anthony1 99 2009-10-06 13:38:40 Spaten Optimator 142" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "reviews.loc[pd.IndexSlice[top_reviewers, 99, :], ['beer_name', 'brewer_id']]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Use `.loc` to select the `beer_name` and `beer_style` for the 10 most popular beers, as measured by number of reviews:" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "Int64Index([52077, 38394, 53159, 52371, 6260, 13896, 52535, 102, 44932, 35738], dtype='int64')" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "top_beers = df['beer_id'].value_counts().head(10).index\n", "top_beers" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
beer_namebeer_style
profile_namebeer_idtime
ATPete449322009-10-06 22:15:41Autumn MapleFruit / Vegetable Beer
523712009-10-06 23:05:28Sierra Nevada Estate Brewers Harvest AleAmerican IPA
Anthony11022009-10-06 13:51:49Samuel Adams OctoberfestMärzen / Oktoberfest
383942009-10-06 16:54:54PumkingPumpkin Ale
ArrogantB523712009-10-06 01:51:24Sierra Nevada Estate Brewers Harvest AleAmerican IPA
...............
spartanfan520772009-10-06 22:48:27A Little Sumpin' Extra! AleAmerican Double / Imperial IPA
stewart12462602009-10-06 03:37:57Punkin AlePumpkin Ale
tobyandgina523712009-10-06 05:21:57Sierra Nevada Estate Brewers Harvest AleAmerican IPA
ujsplace383942009-10-05 23:41:32PumkingPumpkin Ale
zplug123525352009-10-07 01:42:44Terrapin Midnight Project Depth Charge Espress...Milk / Sweet Stout
\n", "

49 rows × 2 columns

\n", "
" ], "text/plain": [ " beer_name \\\n", "profile_name beer_id time \n", "ATPete 44932 2009-10-06 22:15:41 Autumn Maple \n", " 52371 2009-10-06 23:05:28 Sierra Nevada Estate Brewers Harvest Ale \n", "Anthony1 102 2009-10-06 13:51:49 Samuel Adams Octoberfest \n", " 38394 2009-10-06 16:54:54 Pumking \n", "ArrogantB 52371 2009-10-06 01:51:24 Sierra Nevada Estate Brewers Harvest Ale \n", "... ... \n", "spartanfan 52077 2009-10-06 22:48:27 A Little Sumpin' Extra! Ale \n", "stewart124 6260 2009-10-06 03:37:57 Punkin Ale \n", "tobyandgina 52371 2009-10-06 05:21:57 Sierra Nevada Estate Brewers Harvest Ale \n", "ujsplace 38394 2009-10-05 23:41:32 Pumking \n", "zplug123 52535 2009-10-07 01:42:44 Terrapin Midnight Project Depth Charge Espress... \n", "\n", " beer_style \n", "profile_name beer_id time \n", "ATPete 44932 2009-10-06 22:15:41 Fruit / Vegetable Beer \n", " 52371 2009-10-06 23:05:28 American IPA \n", "Anthony1 102 2009-10-06 13:51:49 Märzen / Oktoberfest \n", " 38394 2009-10-06 16:54:54 Pumpkin Ale \n", "ArrogantB 52371 2009-10-06 01:51:24 American IPA \n", "... ... \n", "spartanfan 52077 2009-10-06 22:48:27 American Double / Imperial IPA \n", "stewart124 6260 2009-10-06 03:37:57 Pumpkin Ale \n", "tobyandgina 52371 2009-10-06 05:21:57 American IPA \n", "ujsplace 38394 2009-10-05 23:41:32 Pumpkin Ale \n", "zplug123 52535 2009-10-07 01:42:44 Milk / Sweet Stout \n", "\n", "[49 rows x 2 columns]" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "reviews.loc[pd.IndexSlice[:, top_beers], ['beer_name', 'beer_style']]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Beware \"chained indexing\"\n", "\n", "You can sometimes get away with using `[...][...]`, but try to avoid it!" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "3 Unearthly (Imperial India Pale Ale)\n", "8 Northern Hemisphere Harvest Wet Hop Ale\n", "16 Hoppe (Imperial Extra Pale Ale)\n", "23 Portsmouth 5 C's IPA\n", "26 Sierra Nevada Anniversary Ale (2007-2009)\n", " ... \n", "959 A Little Sumpin' Extra! Ale\n", "962 Hop-a-lot-amus\n", "971 Founders Devil Dancer\n", "972 Dreadnaught IPA\n", "984 15th Anniversary Wood Aged\n", "Name: beer_name, dtype: object" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[df['beer_style'].str.contains('IPA')]['beer_name']" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/kfor/anaconda/lib/python3.5/site-packages/ipykernel/__main__.py:1: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame.\n", "Try using .loc[row_indexer,col_indexer] = value instead\n", "\n", "See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy\n", " if __name__ == '__main__':\n" ] } ], "source": [ "df.loc[df['beer_style'].str.contains('IPA')]['beer_name'] = 'yummy'" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "3 Unearthly (Imperial India Pale Ale)\n", "8 Northern Hemisphere Harvest Wet Hop Ale\n", "16 Hoppe (Imperial Extra Pale Ale)\n", "23 Portsmouth 5 C's IPA\n", "26 Sierra Nevada Anniversary Ale (2007-2009)\n", " ... \n", "959 A Little Sumpin' Extra! Ale\n", "962 Hop-a-lot-amus\n", "971 Founders Devil Dancer\n", "972 Dreadnaught IPA\n", "984 15th Anniversary Wood Aged\n", "Name: beer_name, dtype: object" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[df['beer_style'].str.contains('IPA')]['beer_name']" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
abvbeer_idbrewer_idbeer_namebeer_stylereview_appearancereview_aromareview_overallreview_palateprofile_namereview_tastetexttime
39.5285773818yummyAmerican Double / Imperial IPA4.04.04.04.0nick764.0The aroma has pine, wood, citrus, caramel, an...2009-10-05 21:32:37
86.76549140yummyAmerican IPA4.04.04.04.0david184.0I like all of Sierra Nevada's beers but felt ...2009-10-05 21:34:31
\n", "
" ], "text/plain": [ " abv beer_id brewer_id beer_name beer_style \\\n", "3 9.5 28577 3818 yummy American Double / Imperial IPA \n", "8 6.7 6549 140 yummy American IPA \n", "\n", " review_appearance review_aroma review_overall review_palate \\\n", "3 4.0 4.0 4.0 4.0 \n", "8 4.0 4.0 4.0 4.0 \n", "\n", " profile_name review_taste \\\n", "3 nick76 4.0 \n", "8 david18 4.0 \n", "\n", " text time \n", "3 The aroma has pine, wood, citrus, caramel, an... 2009-10-05 21:32:37 \n", "8 I like all of Sierra Nevada's beers but felt ... 2009-10-05 21:34:31 " ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[df['beer_style'].str.contains('IPA'), 'beer_name'] = 'yummy'\n", "df.loc[df['beer_style'].str.contains('IPA')].head(2)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Dates and Times" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- Date and time data are inherently problematic\n", " - An unequal number of days in every month\n", " - An unequal number of days in a year (due to leap years)\n", " - Time zones that vary over space\n", " - etc\n", " \n", "- The datetime built-in library handles temporal information down to the nanosecond" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "collapsed": true, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "from datetime import datetime" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "datetime.datetime(2016, 10, 18, 21, 29, 7, 760576)" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "now = datetime.now()\n", "now" ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "18" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "now.day" ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "now.weekday()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Having a custom data type for dates and times is convenient because we can perform operations on them easily. \n", "\n", "For example, we may want to calculate the difference between two times:" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "datetime.timedelta(11289, 77347, 760576)" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "kfor = now - datetime(1985, 11, 21)\n", "kfor" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'30'" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "'{:g}'.format(kfor.days // 365.25)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "See [the docs](http://pandas.pydata.org/pandas-docs/stable/timeseries.html) for more information on Pandas' complex time and date functionalities..." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Example\n", "\n", "In this section, we will manipulate data collected from ocean-going vessels on the eastern seaboard. Vessel operations are monitored using the Automatic Identification System (AIS), a safety at sea navigation technology which vessels are required to maintain and that uses transponders to transmit very high frequency (VHF) radio signals containing static information including ship name, call sign, and country of origin, as well as dynamic information unique to a particular voyage such as vessel location, heading, and speed. \n", "\n", "The International Maritime Organization’s (IMO) International Convention for the Safety of Life at Sea requires functioning AIS capabilities on all vessels 300 gross tons or greater and the US Coast Guard requires AIS on nearly all vessels sailing in U.S. waters. The Coast Guard has established a national network of AIS receivers that provides coverage of nearly all U.S. waters. AIS signals are transmitted several times each minute and the network is capable of handling thousands of reports per minute and updates as often as every two seconds. Therefore, a typical voyage in our study might include the transmission of hundreds or thousands of AIS encoded signals. This provides a rich source of spatial data that includes both spatial and temporal information.\n", "\n", "For our purposes, we will use summarized data that describes the transit of a given vessel through a particular administrative area. The data includes the start and end time of the transit segment, as well as information about the speed of the vessel, how far it travelled, etc." ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "collapsed": false, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
mmsinametransitsegmentseg_lengthavg_sogmin_sogmax_sogpdgt10st_timeend_time
01Us Govt Ves115.113.29.214.596.52/10/09 16:032/10/09 16:27
11Dredge Capt Frank1113.518.610.420.6100.04/6/09 14:314/6/09 15:20
21Us Gov Vessel114.316.210.320.5100.04/6/09 14:364/6/09 14:55
31Us Gov Vessel219.215.414.516.1100.04/10/09 17:584/10/09 18:34
41Dredge Capt Frank219.215.414.616.2100.04/10/09 17:594/10/09 18:35
\n", "
" ], "text/plain": [ " mmsi name transit segment seg_length avg_sog min_sog \\\n", "0 1 Us Govt Ves 1 1 5.1 13.2 9.2 \n", "1 1 Dredge Capt Frank 1 1 13.5 18.6 10.4 \n", "2 1 Us Gov Vessel 1 1 4.3 16.2 10.3 \n", "3 1 Us Gov Vessel 2 1 9.2 15.4 14.5 \n", "4 1 Dredge Capt Frank 2 1 9.2 15.4 14.6 \n", "\n", " max_sog pdgt10 st_time end_time \n", "0 14.5 96.5 2/10/09 16:03 2/10/09 16:27 \n", "1 20.6 100.0 4/6/09 14:31 4/6/09 15:20 \n", "2 20.5 100.0 4/6/09 14:36 4/6/09 14:55 \n", "3 16.1 100.0 4/10/09 17:58 4/10/09 18:34 \n", "4 16.2 100.0 4/10/09 17:59 4/10/09 18:35 " ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "segments = pd.read_csv('data/AIS/transit_segments.csv')\n", "segments.head()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "For example, we might be interested in the distribution of transit lengths, so we can plot them as a histogram:" ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAigAAAFkCAYAAAAKf8APAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAAPYQAAD2EBqD+naQAAIABJREFUeJzt3X+UHQWZ5//3J2DQMMsPRdIwkpU9OJB1WaUbIVlH1GE2\njAHuOOAYAiyQzAzqJDlsz5owfhdPd+B79mviLKBJnMNCZlh+dYK4hjEwJuAMOAGEYzerKDQ7qEyL\n+YEt4cemhUDyfP+oaqy+6YR0pZK6Vf15nXOPdtVzbz3VH5L7pG5VXUUEZmZmZq1kQtkNmJmZmTXz\ngGJmZmYtxwOKmZmZtRwPKGZmZtZyPKCYmZlZy/GAYmZmZi3HA4qZmZm1HA8oZmZm1nI8oJiZmVnL\n8YBiZmZmLWdMA4qkCZKukfRTSUOSnpF01Sh1V0vamNbcJ+mEpvWHSFohaVDSK5LuknR0U82Rkm6X\n9JKkrZJuknRoU81xku6RtE3SZklLJXnoMjMzq7ixvpn/JfAZ4M+Bk4BFwCJJ84cLJF0JzAcuB04D\ntgHrJE3MvM71wNnA+cAZwLHAN5q2dQcwFTgzrT0DuCGznQnAvcDBwDTgUuAy4Oox7pOZmZm1GI3l\nywIlfQvYHBF/lll2FzAUEZekP28EvhwR16U/HwZsAS6NiDvTn38JXBAR30xrTgSeAqZFxGOSpgI/\nBjoi4vG05izgHuA9EbFZ0ieAvwOOiYjBtOYzwJeAd0fEG/l/LWZmZlamsR5BeRg4U9L7ACR9APgw\nyZEMJB0PtAHfGX5CRLwMPApMTxedSnLUI1vzNDCQqZkGbB0eTlL3AwGcnql5Yng4Sa0DDgfeP8b9\nMjMzsxZy8BjrvwQcBvRL2kEy4PzXiFiVrm8jGSK2ND1vS7oOYDKwPR1cdlfTBjyfXRkROyS90FQz\n2naG1/2guXlJ7wLOAp4FXt3tXpqZmVmztwPvBdZFxK/298bGOqDMAi4ELgCeBD4IfEXSxoi4tejm\n9oOzgNvLbsLMzKzCLiI5T3S/GuuAshT4/yLi6+nPP5b0XuALwK3AZkAkR0myRzcmA8Mf12wGJko6\nrOkoyuR03XBN81U9BwHvbKr5UFN/kzPrRvMswG233cbUqVN3t49WIZ2dnVx33XVlt2EFcZ7140zr\n46mnnuLiiy+G9L10fxvrgDIJ2NG0bCfpuSwR8TNJm0muvPkhvHmS7OnAirS+F3gjrcmeJDsFeCSt\neQQ4QtIpmfNQziQZfh7N1Pw/ko7KnIcyA3iJ5OjOaF4FmDp1Ku3t7WPbc2tJhx9+uLOsEedZP860\nlg7IKRJjHVC+BVwl6TmSq2zagU7gpkzN9WnNMyRT1jXAc8DdkJw0K2klcK2krcArwFeBhyLisbSm\nX9I64EZJnwMmAsuAnogYPjqynmQQuTW9tPmYdFvLI+L1Me6XVdTmzbs7WGZV5Dzrx5laXmMdUOaT\nDAErSD6C2Qj8dboMgIhYKmkSyT1LjgD+CfhERGzPvE4nyZGYu4BDgG8D85q2dSGwnOTqnZ1p7RWZ\n7eyUdE66/YdJ7rdyM9A1xn2yCvvFL35RdgtWIOdZP87U8hrTgBIR24C/SB97qusGuvew/jVgQfrY\nXc2LwMVvsZ2fA+fsqcbqraOjo+wWrEDOs36cqeXl28Jbpc2ePbvsFqxAzrN+nKnl5QHFKs1/+dWL\n86wfZ2p5eUAxMzOzluMBxSptzpw5ZbdgBXKe9eNMLS8PKFZpM2bMKLsFK5DzrB9nanl5QLFK8+fb\n9eI868eZWl4eUMzMzKzleEAxMzOzluMBxSptw4YNZbdgBXKe9eNMLS8PKFZpS5cuLbsFK5DzrB9n\nanl5QLFKW7VqVdktWIGcZ/04U8vLA4pV2qRJk8puwQrkPOvHmVpeHlDMzMys5XhAMTMzs5bjAcUq\nbeHChWW3YAVynvXjTC0vDyhWaVOmTCm7BSuQ86wfZ2p5KSLK7uGAkdQO9Pb29tLe3l52O2ZmZpXR\n19dHR0cHQEdE9O3v7fkIipmZmbUcDyhmZmbWcjygWKX19/eX3YIVyHnWjzO1vDygWKUtWrSo7Bas\nQM6zfpyp5eUBxSpt+fLlZbdgBXKe9eNMLS8PKFZpvoSxXpxn/ThTy8sDipmZmbWccT+gbNq0ie7u\nbjZt2lR2K2ZmZpbygLJpE4sXL/aAUlFLliwpuwUrkPOsH2dqeY37AcWqbWhoqOwWrEDOs36cqeXl\nAcUqbfHixWW3YAVynvXjTC2vMQ0okn4maecoj2WZmqslbZQ0JOk+SSc0vcYhklZIGpT0iqS7JB3d\nVHOkpNslvSRpq6SbJB3aVHOcpHskbZO0WdJSSR64zMzMamCsb+inAm2Zx38EArgTQNKVwHzgcuA0\nYBuwTtLEzGtcD5wNnA+cARwLfKNpO3cAU4Ez09ozgBuGV6aDyL3AwcA04FLgMuDqMe6PmZmZtaAx\nDSgR8auIeH74AZwL/CQi/iktuQK4JiLWRsSPgEtIBpBPAkg6DJgLdEbEgxHxODAH+LCk09KaqcBZ\nwJ9ExPcj4mFgAXCBpLZ0O2cBJwEXRcQTEbEO+CIwT9LBeX8ZVj2Dg4Nlt2AFcp7140wtr9wfiUh6\nG3ARsDL9+XiSoyrfGa6JiJeBR4Hp6aJTSY56ZGueBgYyNdOArenwMux+kiM1p2dqnoiI7H/564DD\ngffn3Sernrlz55bdghXIedaPM7W89uWcjT8iGQj+Z/pzG8kQsaWpbku6DmAysD0dXHZX0wY8n10Z\nETuAF5pqRtsOmRobB7q7u8tuwQrkPOvHmVpe+zKgzAX+PiI2F9XMgTJz5kwajQaNRoPOzk4ALr30\nUtasWTOibv369TQajV2eP2/ePFauXDliWV9fH41GY5fDmV1dXbvcB2BgYIBGo7HLt3wuW7aMhQsX\njlg2NDREo9Fgw4YNI5b39PQwZ86cXXqbNWvWuNqP9vb2WuwH1COPfd2P9vb2WuwH1COPIvajvb29\nFvsB9chjb/ejp6eHRqPB9OnTaWtrG/F+eaAoIsb+JGkK8FPgkxGxNl12PPAT4IMR8cNM7QPA4xHR\nKenjJB/XHJk9iiLpWeC6iPiKpDnAX0XEuzLrDwJeBT4VEXdLWgycGxHtmZr3pj2dEhE/2E3f7UBv\nb28v7e3JU/v6+ujo6CC7zMzMzEYafr8EOiKib39vL+8RlLkkH6ncO7wgIn4GbCa58gZ486TY04GH\n00W9wBtNNScCU4BH0kWPAEdIOiWzvTMBkZzPMlxzsqSjMjUzgJeAJ3Puk5mZmbWIMQ8okkRySe/N\nEbGzafX1wFWSzpV0MnAL8BxwN7x50uxK4FpJH5PUAfwN8FBEPJbW9JOc8HqjpA9J+jCwDOjJfJy0\nnmQQuVXSv5d0FnANsDwiXh/rPll1NR8qtWpznvXjTC2vPEdQfh84Dvjb5hURsZRkmLiB5GjHO4BP\nRMT2TFknsBa4C3gA2EhyT5SsC4F+ko+D1gLfBT6T2c5O4BxgB8nRmVuAm4GuHPtjFdbXt9+PMtoB\n5Dzrx5laXrnOQakqn4NiZmaWT1XOQTEzMzPbbzygmJmZWcvxgGJmZmYtxwOKVdpoN0Ky6nKe9eNM\nLS8PKFZp8+fPL7sFK5DzrB9nanl5QLFKmzFjRtktWIGcZ/04U8vLA4qZmZm1HA8oZmZm1nI8oFil\nNX+DqFWb86wfZ2p5eUCxSuvp6Sm7BSuQ86wfZ2p5eUCxSlu9enXZLViBnGf9OFPLywOKmZmZtRwP\nKGZmZtZyPKCYmZlZy/GAYpU2Z86csluwAjnP+nGmlpcHFKs036WyXpxn/ThTy8sDilXa7Nmzy27B\nCuQ868eZWl4eUMzMzKzleEAxMzOzluMBxSptw4YNZbdgBXKe9eNMLS8PKFZpS5cuLbsFK5DzrB9n\nanl5QLFKW7VqVdktWIGcZ/04U8vLA4pV2qRJk8puwQrkPOvHmVpeHlDMzMys5XhAMTMzs5bjAcUq\nbeHChWW3YAVynvXjTC0vDyhWaVOmTCm7BSuQ86wfZ2p5jXlAkXSspFslDUoakvQDSe1NNVdL2piu\nv0/SCU3rD5G0In2NVyTdJenoppojJd0u6SVJWyXdJOnQpprjJN0jaZukzZKWSvLQNY4sWLCg7Bas\nQM6zfpyp5TWmN3NJRwAPAa8BZwFTgf8CbM3UXAnMBy4HTgO2AeskTcy81PXA2cD5wBnAscA3mjZ3\nR/r6Z6a1ZwA3ZLYzAbgXOBiYBlwKXAZcPZZ9MjMzs9Zz8Bjr/xIYiIg/zSz7l6aaK4BrImItgKRL\ngC3AJ4E7JR0GzAUuiIgH05o5wFOSTouIxyRNJRmAOiLi8bRmAXCPpM9HxOZ0/UnAxyNiEHhC0heB\nL0nqjog3xrhvZmZm1iLG+nHIucD3Jd0paYukPklvDiuSjgfagO8ML4uIl4FHgenpolNJBqNszdPA\nQKZmGrB1eDhJ3Q8EcHqm5ol0OBm2DjgceP8Y98sqqr+/v+wWrEDOs36cqeU11gHl3wCfA54GZgB/\nDXxV0n9K17eRDBFbmp63JV0HMBnYng4uu6tpA57ProyIHcALTTWjbYdMjdXcokWLym7BCuQ868eZ\nWl5j/YhnAvBYRHwx/fkHkv4d8Fng1kI7M9sLy5cvL7sFK5DzrB9nanmN9QjKJuCppmVPAcPXkW0G\nRHKUJGtyum64ZmJ6Lsqeapqv6jkIeGdTzWjbIVMzqpkzZ9JoNGg0GnR2dgJw6aWXsmbNmhF169ev\np9Fo7PL8efPmsXLlyhHL+vr6aDQaDA4Ojlje1dXFkiVLRiwbGBig0Wjscuhz2bJlu9wzYGhoiEaj\nscs3gvb09DBnzpxdeps1a9a42o8pU6bUYj+gHnns635MmTKlFvsB9cijiP2YMmVKLfYD6pHH3u5H\nT08PjUaD6dOn09bWNuL98kBRROx9sXQ78J6I+Ghm2XXAhyLid9OfNwJfjojr0p8PI/no5ZKI+Hr6\n8y9JTpL9ZlpzIsmgMy09SfYk4MfAqZmTZGeQXLXznojYLOkPgG8BxwyfhyLpcmAJcHREvD5K/+1A\nb29vL+3tyZXRfX19dHR0kF1mZmZmIw2/X5JcwNK3v7c31o94rgMekvQF4E6SE1b/FPizTM31wFWS\nngGeBa4BngPuhuSkWUkrgWslbQVeAb4KPBQRj6U1/ZLWATdK+hwwEVgG9KRX8ACsB54Ebk0vbT4m\n3dby0YYTMzMzq44xfcQTEd8H/giYDTwB/FfgiohYlalZSjJM3EBy9c47gE9ExPbMS3UCa4G7gAeA\njST3RMm6EOgnuXpnLfBd4DOZ7ewEzgF2AA8DtwA3A11j2SertubDn1ZtzrN+nKnlNdYjKETEvSQf\nteypphvo3sP614AF6WN3NS8CF7/Fdn5OMqTYODU0NFR2C1Yg51k/ztTy8m3hrdIWL15cdgtWIOdZ\nP87U8vKAYmZmZi3HA4qZmZm1HA8oVmnN9w2wanOe9eNMLS8PKFZpc+fOLbsFK5DzrB9nanl5QLFK\n6+7uLrsFK5DzrB9nanl5QLFK891/68V51o8ztbw8oJiZmVnL8YBiZmZmLccDilVa87eCWrU5z/px\nppaXBxSrtL6+/f6FmnYAOc/6caaWlwcUq7QVK1aU3YIVyHnWjzO1vDygmJmZWcvxgGJmZmYtxwOK\nmZmZtRwPKFZpjUaj7BasQM6zfpyp5eUBxSpt/vz5ZbdgBXKe9eNMLS8PKFZpM2bMKLsFK5DzrB9n\nanl5QDEzM7OWM+4HlE2bNpXdgpmZmTUZ1wPKwMAA5533KcCDSlWtWbOm7BasQM6zfpyp5TWuB5TB\nwUG2b38VgBdffLHkbiyPnp6esluwAjnP+nGmlte4HlCs+lavXl12C1Yg51k/ztTy8oBiZmZmLccD\nipmZmbUcDyhmZmbWcjygWKXNmTOn7BasQM6zfpyp5eUBxSrNd6msF+dZP87U8hrTgCKpS9LOpseT\nTTVXS9ooaUjSfZJOaFp/iKQVkgYlvSLpLklHN9UcKel2SS9J2irpJkmHNtUcJ+keSdskbZa0VJIH\nrnFm9uzZZbdgBXKe9eNMLa88b+g/AiYDbenjd4dXSLoSmA9cDpwGbAPWSZqYef71wNnA+cAZwLHA\nN5q2cQcwFTgzrT0DuCGznQnAvcDBwDTgUuAy4Ooc+2NmZmYt5uAcz3kjIn65m3VXANdExFoASZcA\nW4BPAndKOgyYC1wQEQ+mNXOApySdFhGPSZoKnAV0RMTjac0C4B5Jn4+Izen6k4CPR8Qg8ISkLwJf\nktQdEW/k2C8zMzNrEXmOoLxP0i8k/UTSbZKOA5B0PMkRle8MF0bEy8CjwPR00akkQ1G25mlgIFMz\nDdg6PJyk7gcCOD1T80Q6nAxbBxwOvD/HPllFbdiwoewWrEDOs36cqeU11gHleyQfpZwFfBY4Hvhu\nen5IG8kQsaXpOVvSdZB8NLQ9HVx2V9MGPJ9dGRE7gBeaakbbDpkaGweWLl1adgtWIOdZP87U8hrT\nRzwRsS7z448kPQb8C/BpoL/Ixsz2xqpVq8puwQrkPOvHmVpe+3TVS0S8BPwf4ARgMyCSoyRZk9N1\npP87MT0XZU81zVf1HAS8s6lmtO2QqdmtmTNn0mg06OzsfHNZV1fXLt+6uX79ehqNxi7PnzdvHitX\nrhyxrK+vj0ajweDg4IjlXV1dLFmyZMSygYEBGo0G/f0jZ7ply5axcOHCEcuGhoZoNBq7HCbt6ekZ\n9f4Cs2bNGlf7MWnSpFrsB9Qjj33dj0mTJtViP6AeeRSxH5MmTarFfkA98tjb/ejp6aHRaDB9+nTa\n2tp2ec88EBQR+Z8s/RbJ+SNfjIgVkjYCX46I69L1h5F89HJJRHw9/fmXJCfJfjOtORF4CpiWniR7\nEvBj4NTMSbIzSK7aeU9EbJb0B8C3gGOGz0ORdDmwBDg6Il7fTb/tQG9vby/t7e309fXR0dEBwG23\n3cZFF12U+3dhZmZWZ5n3zI6I6Nvf2xvTRzySvkwyGPwL8NvAYuB1YPgY3vXAVZKeAZ4FrgGeA+6G\n5KRZSSuBayVtBV4Bvgo8FBGPpTX9ktYBN0r6HDARWAb0pFfwAKwHngRuTS9tPibd1vLdDSdmZmZW\nHWP9iOc9JPco6ScZSn5JcuTjVwARsZRkmLiB5OqddwCfiIjtmdfoBNYCdwEPABtJ7omSdWG6jfvT\n2u8CnxleGRE7gXOAHcDDwC3AzUDXGPfHKq75kKZVm/OsH2dqeY31JNm3vCVgRHQD3XtY/xqwIH3s\nruZF4OK32M7PSYYUG8emTJlSdgtWIOdZP87U8vKt4a3SFizY7ZxrFeQ868eZWl4eUMzMzKzleEAx\nMzOzluMBxSqt+V4AVm3Os36cqeXlAcUqbdGiRWW3YAVynvXjTC0vDyhWacuXLy+7BSuQ86wfZ2p5\neUCxSvMljPXiPOvHmVpeHlDMzMys5XhAMTMzs5bjAcUqrfmbPq3anGf9OFPLywOKVdrQ0FDZLViB\nnGf9OFPLywOKVdrixYvLbsEK5Dzrx5laXh5QzMzMrOV4QDEzM7OW4wHFKm1wcLDsFqxAzrN+nKnl\n5QHFKm3u3Lllt2AFcp7140wtLw8oVmnd3d1lt2AFcp7140wtLw8oVmnt7e1lt2AFcp7140wtLw8o\nZmZm1nI8oJiZmVnL8YBilbZy5cqyW7ACOc/6caaWlwcUq7S+vr6yW7ACOc/6caaWlwcUq7QVK1aU\n3YIVyHnWjzO1vDygmJmZWcvxgGJmZmYtxwOKmZmZtRwPKFZpjUaj7BasQM6zfpyp5eUBxSpt/vz5\nZbdgBXKe9eNMLa99GlAk/aWknZKubVp+taSNkoYk3SfphKb1h0haIWlQ0iuS7pJ0dFPNkZJul/SS\npK2SbpJ0aFPNcZLukbRN0mZJSyV56BpHZsyYUXYLViDnWT/O1PLK/WYu6UPA5cAPmpZfCcxP150G\nbAPWSZqYKbseOBs4HzgDOBb4RtMm7gCmAmemtWcAN2S2MwG4FzgYmAZcClwGXJ13n8zMzKw15BpQ\nJP0WcBvwp8CLTauvAK6JiLUR8SPgEpIB5JPpcw8D5gKdEfFgRDwOzAE+LOm0tGYqcBbwJxHx/Yh4\nGFgAXCCpLd3OWcBJwEUR8URErAO+CMyTdHCe/TIzM7PWkPcIygrgWxHxD9mFko4H2oDvDC+LiJeB\nR4Hp6aJTSY56ZGueBgYyNdOArenwMux+IIDTMzVPRMRgpmYdcDjw/pz7ZRWzZs2asluwAjnP+nGm\nlteYBxRJFwAfBL4wyuo2kiFiS9PyLek6gMnA9nRw2V1NG/B8dmVE7ABeaKoZbTtkaqzmenp6ym7B\nCuQ868eZWl5jGlAkvYfk/JGLIuL1/dPS/jdz5kwajQadnZ1vLuvq6tpl0l+/fv2ol8jNmzdvly/A\n6uvro9FoMDg4OGJ5V1cXS5YsGbFsYGCARqNBf3//iOXLli1j4cKFI5YNDQ3RaDTYsGHDiOU9PT3M\nmTNnl95mzZo1rvZj9erVtdgPqEce+7ofq1evrsV+QD3yKGI/Vq9eXYv9gHrksbf70dPTQ6PRYPr0\n6bS1te3ynnkgKCL2vlj6Q+B/ATsApYsPIjlqsoPknJBngA9GxA8zz3sAeDwiOiV9nOTjmiOzR1Ek\nPQtcFxFfkTQH+KuIeFdm/UHAq8CnIuJuSYuBcyOiPVPzXuCnwCkRMeLk3XR9O9Db29tLe3s7fX19\ndHR0AHDbbbdx0UUX7fXvwszMbDzJvGd2RMR+/xbIsX7Ecz9wMslHPB9IH98nOWH2AxHxU2AzyZU3\nwJsnxZ4OPJwu6gXeaKo5EZgCPJIuegQ4QtIpmW2fSTIUPZqpOVnSUZmaGcBLwJNj3C8zMzNrIWO6\n2iUittH05i9pG/CriHgqXXQ9cJWkZ4BngWuA54C709d4WdJK4FpJW4FXgK8CD0XEY2lNv6R1wI2S\nPgdMBJYBPRGxOd3O+rSXW9NLm49Jt7W8yh8/mZmZWTF3kh3xGVFELCUZJm4gOdrxDuATEbE9U9YJ\nrAXuAh4ANpLcEyXrQqCf5KjNWuC7wGcy29kJnEPy0dLDwC3AzUBXAftkFTHa56hWXc6zfpyp5bXP\n9wuJiN8bZVk30L2H57xGcl+TBXuoeRG4+C22/XOSIcXGKd+lsl6cZ/04U8vLt4VPNZ89bdUwe/bs\nsluwAjnP+nGmlpcHFAAmsHDhlQwMDJTdiJmZmeEBJbWT119/zUdRzMzMWoQHFKu05hsQWbU5z/px\nppaXBxSrtKVLl5bdghXIedaPM7W8PKBYpa1atarsFqxAzrN+nKnl5QHFKm3SpEllt2AFcp7140wt\nLw8oZmZm1nI8oJiZmVnL8YBildb89eJWbc6zfpyp5eUBxSptypQpZbdgBXKe9eNMLS8PKFZpCxbs\n9uucrIKcZ/04U8vLA4qZmZm1HA8oZmZm1nI8oFil9ff3l92CFch51o8ztbw8oFilLVq0qOwWrEDO\ns36cqeXlAcUqbfny5WW3YAVynvXjTC0vDyhWab6EsV6cZ/04U8vLA4qZmZm1HA8oZmZm1nI8oFil\nLVmypOwWrEDOs36cqeXlAcUqbWhoqOwWrEDOs36cqeXlAcUqbfHixWW3YAVynvXjTC0vDyhmZmbW\ncjygmJmZWcvxgGKVNjg4WHYLViDnWT/O1PLygGKVNnfu3LJbsAI5z/pxppaXBxSrtO7u7rJbsAI5\nz/pxppbXmAYUSZ+V9ANJL6WPhyX9QVPN1ZI2ShqSdJ+kE5rWHyJphaRBSa9IukvS0U01R0q6Pd3G\nVkk3STq0qeY4SfdI2iZps6SlkjxwjTPt7e1lt2AFcp7140wtr7G+of8cuBJoBzqAfwDuljQVQNKV\nwHzgcuA0YBuwTtLEzGtcD5wNnA+cARwLfKNpO3cAU4Ez09ozgBuGV6aDyL3AwcA04FLgMuDqMe6P\nmZmZtaAxDSgRcU9EfDsifhIRz0TEVcD/JRkSAK4AromItRHxI+ASkgHkkwCSDgPmAp0R8WBEPA7M\nAT4s6bS0ZipwFvAnEfH9iHgYWABcIKkt3c5ZwEnARRHxRESsA74IzJN0cN5fhpmZmbWG3B+JSJog\n6QJgEvCwpOOBNuA7wzUR8TLwKDA9XXQqyVGPbM3TwECmZhqwNR1eht0PBHB6puaJiMieHr4OOBx4\nf959supZuXJl2S1YgZxn/ThTy2vMA4qkfyfpFeA14GvAH6VDRhvJELGl6Slb0nUAk4Ht6eCyu5o2\n4PnsyojYAbzQVDPadsjU2DjQ19dXdgtWIOdZP87U8spzBKUf+ADJOSZ/Ddwi6aRCu9rPZs6cSaPR\noLOzc8Tyf/zHfxzx8/r162k0Grs8f968ebv8q6Cvr49Go7HLNf9dXV27fFnWwMAAjUaD/v7+EcuX\nLVvGwoULRywbGhqi0WiwYcOGEct7enqYM2fOLr3NmjWLNWvWjJv9WLFiRS32A+qRx77ux4oVK2qx\nH1CPPIrYjxUrVtRiP6AeeeztfvT09NBoNJg+fTptbW2jvmfub4qIfXsB6T7gGWAp8BPggxHxw8z6\nB4DHI6JT0sdJPq45MnsURdKzwHUR8RVJc4C/ioh3ZdYfBLwKfCoi7pa0GDg3ItozNe8FfgqcEhE/\n2E2v7UBvb28v7e3t9PX10dHR8eb64eVmZmY2UuY9syMi9vuhsSIuy50AHBIRPwM2k1x5A7x5Uuzp\nwMPpol7gjaaaE4EpwCPpokeAIySdktnGmYBIzmcZrjlZ0lGZmhnAS8CTBeyTmZmZlWhMV7xI+m/A\n35Oc1PqvgIuAj5IMB5BcQnyVpGeAZ4FrgOeAuyE5aVbSSuBaSVuBV4CvAg9FxGNpTb+kdcCNkj4H\nTASWAT0RsTndznqSQeTW9NLmY9JtLY+I18f8WzAzM7OWMtYjKEcD/5PkPJT7Se6FMiMi/gEgIpaS\nDBM3kBzteAfwiYjYnnmNTmAtcBfwALCR5J4oWRdmtrEW+C7wmeGVEbETOAfYQXJ05hbgZqBrjPtj\nFTfaZ7xWXc6zfpyp5TWmIygR8ad7UdMNdO9h/Wsk9zVZsIeaF4GL32I7PycZUmwcmz9/ftktWIGc\nZ/04U8u9ymd7AAATXUlEQVTLt4a3SpsxY8ZbF1llOM/6caaWlwcUMzMzazkeUMzMzKzleECxSmu+\nOZJVm/OsH2dqeXlAsUrr6ekpuwUrkPOsH2dqeXlAsUpbvXp12S1YgZxn/ThTy8sDipmZmbUcDyhm\nZmbWcjygmJmZWcvxgGKVNtpXhlt1Oc/6caaWlwcUqzTfpbJenGf9OFPLywOKVdrs2bPLbsEK5Dzr\nx5laXh5QzMzMrOV4QDEzM7OW4wHFKm3Dhg1lt2AFcp7140wtLw8oVmlLly4tuwUrkPOsH2dqeXlA\nsUpbtWpV2S1YgZxn/ThTy8sDilXapEmTym7BCuQ868eZWl4eUMzMzKzleEAxMzOzluMBxSpt4cKF\nZbdgBXKe9eNMLS8PKFZpU6ZMKbsFK5DzrB9nanl5QMnYtGlT2S3YGC1YsKDsFqxAzrN+nKnl5QHl\nTRM477w/ZmBgoOxGzMzMxj0PKG/ayfbtv2ZwcLDsRszMzMY9DyhWaf39/WW3YAVynvXjTC0vDyhW\naYsWLSq7BSuQ86wfZ2p5eUCxSlu+fHnZLViBnGf9OFPLa0wDiqQvSHpM0suStkj6pqTfGaXuakkb\nJQ1Juk/SCU3rD5G0QtKgpFck3SXp6KaaIyXdLuklSVsl3STp0Kaa4yTdI2mbpM2Slkry0DWO+BLG\nenGe9eNMLa+xvpl/BFgGnA78PvA2YL2kdwwXSLoSmA9cDpwGbAPWSZqYeZ3rgbOB84EzgGOBbzRt\n6w5gKnBmWnsGcENmOxOAe4GDgWnApcBlwNVj3CczMzNrMQePpTgiZmZ/lnQZ8DzQAWxIF18BXBMR\na9OaS4AtwCeBOyUdBswFLoiIB9OaOcBTkk6LiMckTQXOAjoi4vG0ZgFwj6TPR8TmdP1JwMcjYhB4\nQtIXgS9J6o6IN8b6yzAzM7PWsK8fhxwBBPACgKTjgTbgO8MFEfEy8CgwPV10KslglK15GhjI1EwD\ntg4PJ6n7022dnql5Ih1Ohq0DDgfev4/7ZRWxZMmSsluwAjnP+nGmllfuAUWSSD6q2RART6aL20iG\niC1N5VvSdQCTge3p4LK7mjaSIzNviogdJINQtma07ZCpsZobGhoquwUrkPOsH2dqee3LEZSvAf8W\nuKCgXg6YmTNn0mg06Ozs3GPd+vXraTQauyyfN28eK1euHLGsr6+PRqOxy43eurq6dvkXxMDAAI1G\nY5f7AyxbtmyXL9YaGhqi0WiwYcOGEct7enqYM2fOLr3NmjWLNWvWjJv9WLx4cS32A+qRx77ux+LF\ni2uxH1CPPIrYj8WLF9diP6AeeeztfvT09NBoNJg+fTptbW179Z5ZNEXE2J8kLQfOBT4SEQOZ5ccD\nPwE+GBE/zCx/AHg8IjolfZzk45ojs0dRJD0LXBcRX0nPSfmriHhXZv1BwKvApyLibkmLgXMjoj1T\n817gp8ApEfGDUfpuB3p7e3tpb2+nr6+Pjo6OETXD68zMzOw3Mu+ZHRHRt7+3N+YjKOlw8ockJ6eO\n+OKaiPgZsJnkypvh+sNIzht5OF3UC7zRVHMiMAV4JF30CHCEpFMyL38mIJLzWYZrTpZ0VKZmBvAS\n8CRmZmZWWWO9D8rXgIuAC4Ftkianj7dnyq4HrpJ0rqSTgVuA54C74c2TZlcC10r6mKQO4G+AhyLi\nsbSmn+SE1xslfUjSh0kub+5Jr+ABWE8yiNwq6d9LOgu4BlgeEa/n+F1YBfm7k+rFedaPM7W8xnoE\n5bPAYcADwMbM49PDBRGxlGSYuIHkaMc7gE9ExPbM63QCa4G7Mq91ftO2LgT6ST4OWgt8F/hMZjs7\ngXOAHSRHZ24Bbga6xrhPVmFz584tuwUrkPOsH2dqeY31Pih7NdBERDfQvYf1rwEL0sfual4ELn6L\n7fycZEixcaq7u7vsFqxAzrN+nKnl5dvCW6X5hOZ6cZ7140wtLw8oZmZm1nI8oJiZmVnL8YBildZ8\nwyOrNudZP87U8vKAYpXW17ff7xVkB5DzrB9nanl5QLFKW7FiRdktWIGcZ/04U8vLA4qZmZm1HA8o\nZmZm1nI8oJiZmVnL8YBilTba15lbdTnP+nGmlpcHFKu0+fPnl92CFch51o8ztbw8oDTZtGlT2S3Y\nGMyYMaPsFqxAzrN+nKnl5QFlhAmcd94fMzAwUHYjZmZm45oHlBF2sn37rxkcHCy7ETMzs3HNA4pV\n2po1a8puwQrkPOvHmVpeHlCs0np6espuwQrkPOvHmVpeHlCs0lavXl12C1Yg51k/ztTy8oBiZmZm\nLccDipmZmbUcDyhmZmbWcjygWKXNmTOn7BasQM6zfpyp5eUBxSrNd6msF+dZP87U8vKAYpU2e/bs\nsluwAjnP+nGmlpcHFDMzM2s5HlDMzMys5YzrAcXfXFx9GzZsKLsFK5DzrB9nanmN2wFlYGCA8877\nVNlt2D5aunRp2S1YgZxn/ThTy2vMA4qkj0j6O0m/kLRTUmOUmqslbZQ0JOk+SSc0rT9E0gpJg5Je\nkXSXpKObao6UdLuklyRtlXSTpEObao6TdI+kbZI2S1oqaa/2aXBwkO3bXx3r7luLWbVqVdktWIGc\nZ/04U8srzxGUQ4H/Dfw5EM0rJV0JzAcuB04DtgHrJE3MlF0PnA2cD5wBHAt8o+ml7gCmAmemtWcA\nN2S2MwG4FzgYmAZcClwGXJ1jn6yiJk2aVHYLViDnWT/O1PI6eKxPiIhvA98GkKRRSq4AromItWnN\nJcAW4JPAnZIOA+YCF0TEg2nNHOApSadFxGOSpgJnAR0R8XhaswC4R9LnI2Jzuv4k4OMRMQg8IemL\nwJckdUfEG2PdNzMzM2sNhZ6DIul4oA34zvCyiHgZeBSYni46lWQwytY8DQxkaqYBW4eHk9T9JEds\nTs/UPJEOJ8PWAYcD7y9ol8zMzKwERZ8k20YyRGxpWr4lXQcwGdieDi67q2kDns+ujIgdwAtNNaNt\nh0yN1dzChQvLbsEK5Dzrx5laXuP2Kh6rhylTppTdghXIedaPM7W8ih5QNgMiOUqSNTldN1wzMT0X\nZU81zVf1HAS8s6lmtO2QqRnVzJkz6ezs3FMJAOvXr6fR2OUiJebNm8fKlStHLOvr66PRaDA4ODhi\neVdXF0uWLBmxbGBggEajQX9//4jly5Yt2+VfG0NDQzQajV3uJdDT0zPql3DNmjWLNWvWjJv9WLBg\nQS32A+qRx77ux4IFC2qxH1CPPIrYjwULFtRiP6AeeeztfvT09NBoNJg+fTptbW00Go29et8skiJ2\nuRBn758s7QQ+GRF/l1m2EfhyRFyX/nwYyUcvl0TE19Off0lykuw305oTgaeAaelJsicBPwZOzZwk\nO4Pkqp33RMRmSX8AfAs4Zvg8FEmXA0uAoyPi9VH6bQd6e3t7Aejo6Bh1v3p7e2lvb8/9ezEzM6ub\nvr6+4ffNjojo29/bG/NVPOm9SE4gOVIC8G8kfQB4ISJ+TnIJ8VWSngGeBa4BngPuhuSkWUkrgWsl\nbQVeAb4KPBQRj6U1/ZLWATdK+hwwEVgG9KRX8ACsB54Ebk0vbT4m3dby0YYTMzMzq448H/GcCjwO\n9JKcEPvfgT5gMUBELCUZJm4guXrnHcAnImJ75jU6gbXAXcADwEaSe6JkXQj0k1y9sxb4LvCZ4ZUR\nsRM4B9gBPAzcAtwMdOXYJ6uo5sOcVm3Os36cqeU15gElIh6MiAkRcVDTY26mpjsijo2ISRFxVkQ8\n0/Qar0XEgog4KiL+VUT8cUQ0X7XzYkRcHBGHR8SREfFnETHUVPPziDgnIn4rIiZHxJXp4GLjxKJF\ni8puwQrkPOvHmVpevorHKm358uVlt2AFcp7140wtLw8oVmm+hLFenGf9OFPLa1wOKMNX8ZiZmVlr\nGpcDyv/4HzeW3YKZmZntwbgcUH5zhbRVXfNNjKzanGf9OFPLa5wOKLBp06Zc66y1DA0NvXWRVYbz\nrB9nannt051kq2b4TrInn/xBnn66n+3bXx2lagITJx7CP/9zv0/uMjMzSx3oO8mOyyMov/710G6G\nE4CdbN/+612+E8HMzMwOnHE5oDzzzDNvXWRmZmalGZcDCvhms3XhI1314jzrx5laXuN0QLG6mDt3\n7lsXWWU4z/pxppaXBxSrtO7u7rJbsAI5z/pxppaXBxSrtPb29rJbsAI5z/pxppaXBxQzMzNrOR5Q\nzMzMrOV4QLFKW7lyZdktWIGcZ/04U8vLA4pVWl/ffr+ZoR1AzrN+nKnlNS5vdb83tWvXruXss8/e\nzx2ZmZlVg2913xImcN55f8wjjzzCwMBA2c2YmZmNOx5QRpV8H89HP/p7nHjiVA8pZmZmB5gHlN0S\nr7/+Kq++OuRbNZuZmR1gHlB2a/ycm1NljUaj7BasQM6zfpyp5eUBxSpt/vz5ZbdgBXKe9eNMLS8P\nKFZpM2bMKLsFK5DzrB9nanl5QDEzM7OW4wHFzMzMWo4HFKu0NWvWlN2CFch51o8ztbw8oFilLVmy\npOwWrEDOs36cqeVV+QFF0jxJP5P0a0nfk/ShsnuyA+fd73532S1YgZxn/ThTy6vSA4qkWcB/B7qA\nU4AfAOskHVXkdh588EHfTdbMzOwAqvSAAnQCN0TELRHRD3wWGALmFrcJ8Rd/8Xne977f4Zvf/KYH\nFTMzswPg4LIbyEvS24AO4L8NL4uIkHQ/ML24LQUQbN++nfPO+xQTJ76NVat6aGtre7Pit3/7t9/8\n/1OmTClu02ZmZuNUZQcU4CjgIGBL0/ItwIm7ec7b829ueFB5g/PO+zS/uRW+OOSQQ5gwYQKS+NKX\n/l+OOqrQT5hsDx566CF6enrKbsMK4jzrx5nuX+985zsP2Hk+Tz311PD/3Yf30r2niGp+54ykY4Bf\nANMj4tHM8iXAGRGxy1EUSRcCtx+4Ls3MzGrnooi4Y39vpMpHUAaBHcDkpuWTgc27ec464CLgWeDV\n/daZmZlZ/bwdeC/Je+l+V9kjKACSvgc8GhFXpD8LGAC+GhFfLrU5MzMzy63KR1AArgVultQLPEZy\nVc8k4OYymzIzM7N9U+kBJSLuTO95cjXJRzv/GzgrIn5ZbmdmZma2Lyr9EY+ZmZnVU9Vv1GZmZmY1\n5AHFzMzMWs64GVD8pYLVIKlL0s6mx5NNNVdL2ihpSNJ9kk5oWn+IpBWSBiW9IukuSUcf2D0ZnyR9\nRNLfSfpFml1jlJp9zk/SkZJul/SSpK2SbpJ06P7ev/HmrfKU9Lej/Hm9t6nGebYISV+Q9JiklyVt\nkfRNSb8zSl1L/BkdFwPKgfpSQSvMj0hOem5LH787vELSlcB84HLgNGAbSZYTM8+/HjgbOB84AzgW\n+MYB6dwOJTlZ/c/5ze2W31RgfncAU4Ez09ozgBuK3BED3iLP1N8z8s/r7Kb1zrN1fARYBpwO/D7w\nNmC9pHcMF7TUn9GIqP0D+B7wlczPAp4DFpXdmx+7ZNUF9O1h/UagM/PzYcCvgU9nfn4N+KNMzYnA\nTuC0svdvPD3S33mj6PzSv/R2Aqdkas4C3gDayt7vuj52k+ffAv9rD89xni38IPnKmJ3A72aWtcyf\n0dofQcl8qeB3hpdF8tsq+EsFrUDvSw8p/0TSbZKOA5B0PMm/0LJZvgw8ym+yPJXk8vlszdMkN/Bz\n3iUqML9pwNaIeDzz8veT/Av/9P3Vv+3Wx9KPC/olfU3SOzPrOnCerewIkt/zC9B6f0ZrP6Cw5y8V\nbNu13Er2PeAykmn7s8DxwHfTzy7bSP4D31OWk4Ht6R+q3dVYOYrKrw14PrsyInaQ/CXrjA+svwcu\nAX4PWAR8FLg3vas3JHk4zxaUZnQ9sCEihs/za6k/o5W+UZvVT0Rkv+PhR5IeA/4F+DTQX05XZjaa\niLgz8+OPJT0B/AT4GPCPpTRle+trwL8FPlx2I7szHo6g5PlSQWsREfES8H+AE0jyEnvOcjMwUdJh\ne6ixchSV32ag+YqBg4B34oxLFRE/I/k7d/iqD+fZgiQtB2YCH4uITZlVLfVntPYDSkS8DvSSnEkM\nvHlo60zg4bL6sr0j6bdI/rLbmP7lt5mRWR5G8pnmcJa9JCdiZWtOBKYAjxygtm0UBeb3CHCEpFMy\nL38myV+sj+6v/u2tSXoP8C5g+E3PebaYdDj5Q+DjETGQXddyf0bLPov4AJ2p/GlgiOSz0pNILnX6\nFfDusnvzY5esvkxyOdq/Bv4DcB/JZ5vvStcvSrM7FzgZWAP8MzAx8xpfA35Gcpi5A3gI+Key9208\nPEguS/0A8EGSs/j/c/rzcUXmB9wLfB/4EMkh6qeBW8ve/7o99pRnum4pyZvXv07fgL4PPAW8zXm2\n3iPNYivJ5caTM4+3Z2pa5s9o6b+wAxjMnwPPklwu9Qhwatk9+TFqTj0kl4D/muSs8DuA45tqukku\nhRsC1gEnNK0/hORa/0HgFeDrwNFl79t4eJCcJLmT5GPV7ONvisyP5OqD24CX0r9wbwQmlb3/dXvs\nKU/g7cC3Sf7F/SrwU+CvafqHn/NsncdustwBXNJU1xJ/Rv1lgWZmZtZyan8OipmZmVWPBxQzMzNr\nOR5QzMzMrOV4QDEzM7OW4wHFzMzMWo4HFDMzM2s5HlDMzMys5XhAMTMzs5bjAcXMzMxajgcUMzMz\nazkeUMzMzKzl/P/wqlg1pftJ/gAAAABJRU5ErkJggg==\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%matplotlib inline\n", "segments['seg_length'].hist(bins=500)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Though most of the transits appear to be short, there are a few longer distances that make the plot difficult to read. \n", "\n", "This is where a transformation is useful:" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAhsAAAFkCAYAAACJu/k0AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAAPYQAAD2EBqD+naQAAIABJREFUeJzt3X90XOV95/H3V4BNnV1CWhKUpPU2KaC62zaNzY+42UDA\nWZOQZgDx08FNazchEJvD8S42SZscDHQb7G6BxHZTSpzNpjYyLF2cQH6YkBBagwtEouA2NpANYQjI\nDpNg4CBkWdazf9w78mj0w5I8917N8/28ztE50p1nZr4fP7L01b3PvddCCIiIiIhkpaXoAkRERCRu\najZEREQkU2o2REREJFNqNkRERCRTajZEREQkU2o2REREJFNqNkRERCRTajZEREQkU2o2REREJFNq\nNkRERCRTE242zOx9ZvYNM3vezAbMrDTCmOvM7AUz6zGz75rZcXWPTzezdWZWMbNXzexOM3tL3Zg3\nmdlGM3vZzF4ysy+b2RvqxvyGmX3TzF4zs11mttrM1ECJiIhMIZP5xfwG4F+BTwHDbqxiZlcDS4FL\ngZOB14AtZjatZtjNwIeB84BTgbcB/1j3UrcBs4B56dhTgVtq3qcF+BZwOPAe4E+APwWum0QmERER\nyYgdyo3YzGwAOCeE8I2abS8Afx1CuCn9+ihgN/AnIYQ70q9fBC4OIdyVjmkDdgDvCSE8YmazgH8H\n5oQQHkvHnAl8E/j1EMIuM/sQ8A3grSGESjrmk8ANwJtDCP2TDiYiIiIN09BDDmb2DqAV+F51Wwjh\nFeBhYG666USSvRG1Y54EyjVj3gO8VG00UveR7Ek5pWbM9mqjkdoCvBH4zw2KJCIiIofo8Aa/XitJ\nQ7C7bvvu9DGAY4G+tAkZbUwr8PPaB0MI+83sl3VjRnqf6mOP1xdnZr8GnAn8FOg9eBwRERFJHQn8\nJrAlhPCLiTyx0c3GVHcmsLHoIkRERJrYJSTrKset0c3GLsBI9l7U7nU4FnisZsw0Mzuqbu/Gselj\n1TH1Z6ccBvxq3ZiT6t7/2JrHRvJTgA0bNjBr1qxxxGley5Yt46abbiq6jMwpZ1y85AQ/WZUzHjt2\n7GDhwoWQ/i6diIY2GyGEZ8xsF8kZJE/A4ALRU4B16bBOoD8dU7tAdCawLR2zDTjazN5ds25jHkkj\n83DNmD83s2Nq1m3MB14GfjRKib0As2bNYvbs2YeYdmp74xvfGH1GUM7YeMkJfrIqZ5QmvAxhws1G\neq2L40h+8QO808zeBfwyhPAcyWmtnzWzH5N0P9cDPwO+DsmCUTNbD9xoZi8BrwJfBB4MITySjtlp\nZluAW83scmAasAboCCFU91rcS9JU/EN6uu1b0/daG0LYN9Fcsdm1a7SdO3FRzrh4yQl+siqnwOT2\nbJwI3E+yEDQAf5Nu/9/A4hDCajObQXJNjKOBfwY+FELoq3mNZcB+4E5gOvAdYEnd+3wUWEtyFspA\nOvbK6oMhhAEz+yPgS8BDJNfz+CpwzSQyRef5558vuoRcKGdcvOQEP1mVU2ASzUYI4QEOcspsCGEl\nsHKMx/cCV6Qfo43ZAyw8yPs8B/zRWGO8mjNnTtEl5EI54+IlJ/jJqpwCujdKtBYsWFB0CblQzrh4\nyQl+siqnwCFeQbTZmNlsoLOzs9PTQh4REZFD1tXVVd2DMyeE0DWR52rPhoiIiGRKzUakFi1aVHQJ\nuVDOuHjJCX6yKqeAmo1ozZ8/v+gScqGccfGSE/xkVU4BrdkQERGRcdCaDREREZmy1GyIiIhIptRs\nRGrr1q1Fl5AL5YyLl5zgJ6tyCqjZiNbq1auLLiEXyhkXLznBT1blFNAC0Wj19PQwY8aMosvInHLG\nxUtO8JNVOeOhBaIyTOzf9FXKGRcvOcFPVuUUULMhIiIiGVOzITLFlctlyuVy0WWIiEyamo1ILV++\nvOgSchF7znK5TFvbLN75zt9y0XDEPp+1vGRVTgE1G9GaOXNm0SXkIvaclUqF3t4e9u/vp1KpFF1O\n5mKfz1pesiqngM5GEZnSalZ/o+9bESmSzkYRERGRKUvNhoiIiGRKzUakdu7cWXQJufCS0wtP8+kl\nq3IKqNmI1ooVK4ouIRex53zxxReLLiFXsc9nLS9ZlVNAzUa01q5dW3QJuYg9p4czUGrFPp+1vGRV\nTgE1G9HychqWl5xeeJpPL1mVU0DNhoiIiGRMzYaIiIhkSs1GpFatWlV0CbnwktMLT/PpJatyCqjZ\niFZPT0/RJeTCS04vPM2nl6zKKaDLlYtMaRs3bmThwoWALlcuIsXS5cpFRERkylKzISIiIplSsxEp\nLxeD8pLTC0/z6SWrcgqo2YjW4sWLiy4hF15yeuFpPr1kVU4BNRvRWrlyZdEl5MJLTi88zaeXrMop\noGYjWl7OWvCS0wtP8+klq3IKqNkQERGRjKnZEBERkUyp2YjU+vXriy4hF15yeuFpPr1kVU4BNRvR\n6uqa0MXdmpaXnF54mk8vWZVTQJcrF5nSdLlyEZkqdLlyERERmbLUbIiIiEim1GyINKlyuUy5XC66\nDBGRg1KzEalSqVR0CbnwkrNeuVymrW0WbW2zomo4PM2nl6zKKaBmI1pLly4tuoRceMlZr1Kp0Nvb\nQ29vT1Q3gPI0n16yKqeAmo1ozZ8/v+gScuElpxee5tNLVuUUULMhIiIiGVOzISIiIplSsxGpzZs3\nF11CLrzk9MLTfHrJqpwCajai1dHRUXQJufCS0wtP8+klq3IKqNmI1u233150CbnwktMLT/PpJaty\nCqjZEBERkYw1vNkwsxYzu97MfmJmPWb2YzP77AjjrjOzF9Ix3zWz4+oen25m68ysYmavmtmdZvaW\nujFvMrONZvaymb1kZl82szc0OpOIiIhMXhZ7Nj4NfBL4FPDbwApghZkNXvHEzK4GlgKXAicDrwFb\nzGxazevcDHwYOA84FXgb8I9173UbMAuYl449Fbil8ZFERERksrJoNuYCXw8hfCeEUA4h/F/gXpKm\noupK4PoQwj0hhH8DPkbSTJwDYGZHAYuBZSGEB0IIjwGLgPea2cnpmFnAmcCfhRB+GEJ4CLgCuNjM\nWjPI1VQWLVpUdAm58JLTC0/z6SWrcgpk02w8BMwzs+MBzOxdwHuBb6VfvwNoBb5XfUII4RXgYZJG\nBeBE4PC6MU8C5Zox7wFeShuRqvuAAJzS8FRNxsvV7Lzk9MLTfHrJqpwCyS/0RrsBOArYaWb7SRqa\nvwghbEofbyVpCHbXPW93+hjAsUBf2oSMNqYV+HntgyGE/Wb2y5oxbi1YsKDoEnLhJacXnubTS1bl\nFMhmz8ZFwEeBi4F3A38CLDezP87gvSblrLPOolQqDfmYO3fusIuy3HvvvSPeyW/JkiWsX79+yLau\nri5KpdKwm2Jdc801rFq1asi2crlMqVRi586dQ7avWbOG5cuXD9nW09NDqVRi69atQ7Z3dHSMuNvu\noosuUo6IcjzzzDPDxgH83d/93bBtUzlHLPOhHMrhJUdHR8fg78bW1lZKpRLLli0b9pzxshDCpJ88\n4gualYHPhxC+VLPtL4BLQgi/kx5G+X/AH4QQnqgZ8wPgsRDCMjM7neSQyJtq926Y2U+Bm0IIXzCz\nRcD/DCH8Ws3jhwG9wPkhhK+PUNtsoLOzs5PZs2c3NLdIFjZu3MjChQsBqP2+7erqYs6cOcO2i4hk\npebnzpwQQtdEnpvFno0ZwP66bQPV9wohPAPsIjmDBBhcEHoKyXoPgE6gv25MGzAT2JZu2gYcbWbv\nrnmfeYCRrP9wrb6TjZWXnF54mk8vWZVTIJtm427gs2Z2lpn9JzM7F1gG/N+aMTenYz5iZr8HfA34\nGfB1GFwwuh640czeb2ZzgK8AD4YQHknH7AS2ALea2Ulm9l5gDdARQtiVQa6msnr16qJLyIWXnF54\nmk8vWZVTIJsFokuB64F1wFuAF4AvpdsACCGsNrMZJNfEOBr4Z+BDIYS+mtdZRrKH5E5gOvAdYEnd\ne30UWEtyyGUgHXtl4yM1n02bNh18UAS85ATo7u4uuoTMeZpPL1mVUyCDZiOE8Brw39KPscatBFaO\n8fhekutmXDHGmD3AwsnUGbsZM2YUXUIuvOSEFtrbL+Dpp3cyc+bMoovJjJ/59JNVOQV0bxSRJjFA\nX9/rw1ari4g0AzUbIiIikik1G5GqP986Vl5yeuFpPr1kVU4BNRvRivm4fi0vOb3wNJ9esiqngJqN\naF1xxajraqPiJacXnubTS1blFFCzISIiIhlTsyEiIiKZUrMRqfqb9MTKS04vPM2nl6zKKaBmI1or\nVqwouoRcxJ7T23U1Yp/PWl6yKqeAmo1orV27tugSchFzznK5zFVX+foBFvN81vOSVTkF1GxEy8tp\nWDHnrFQq9Pf3HXxgRGKez3pesiqngJoNERERyZiaDREREcmUmo1IrVq1qugScuElpxee5tNLVuUU\nULMRrZ6enqJLyIWXnF54mk8vWZVTACyEUHQNuTGz2UBnZ2cns2fPLrockTF1dXUxZ86cIduq37u1\nj+n7WUTyUPNzZ04IoWsiz9WeDREREcmUmg0RERHJlJqNSHm58qSXnF54mk8vWZVTQM1GtBYvXlx0\nCbnwktMLT/PpJatyCqjZiNbKlSuLLiEXXnJ64Wk+vWRVTgE1G9HycnaCl5xeeJpPL1mVU0DNhoiI\niGRMzYaIiIhkSs1GpNavX190CbnwktMLT/PpJatyCqjZiFZX14Qu7ta0vOT0wtN8esmqnAJqNqK1\nbt26okvIhZecXniaTy9ZlVMADi+6ABEZqlwuF12CiEhDac+GyBRSLpdpa5tFW9ssuru7iy5HRKQh\ntGdDZAqpVCr09ia3qt6zZ0/B1YiINIb2bESqVCoVXUIuvOT0wtN8esmqnAJqNqK1dOnSokvIhZec\nXniaTy9ZlVNAzUa05s+fX3QJufCS0wtP8+klq3IKqNkQERGRjKnZEBERkUyp2YjU5s2biy4hF15y\neuFpPr1kVU4BNRvR6ujoKLqEXHjJ6YWn+fSSVTkF1GxE6/bbby+6hFx4yemFp/n0klU5BdRsiIiI\nSMbUbIiIiEim1GyIiIhIptRsRGrRokVFl5ALLzm98DSfXrIqp4CajWh5uZqdl5yHolwuN81t6z3N\np5esyimgZiNaCxYsKLqEXHjJOVm1t6xvhobD03x6yaqcAmo2RJpKd3f3hJqG6i3re3t7qFQqGVYm\nIjI6NRsiTaS9/QLa2mbR3d1ddCkiIuOmZiNSW7duLbqEXHjJWdXX9zq9vT3s2bOn6FIy4Wk+vWRV\nTgE1G9FavXp10SXkwkvOhBVdQOY8zaeXrMopoGYjWps2bSq6hFx4yZkIRReQOU/z6SWrcgqo2YjW\njBkzii4hF15yeuFpPr1kVU4BNRsiTUlnlohIM8mk2TCzt5nZP5hZxcx6zOxxM5tdN+Y6M3shffy7\nZnZc3ePTzWxd+hqvmtmdZvaWujFvMrONZvaymb1kZl82szdkkUlk6mjhqquuLroIEZFxa3izYWZH\nAw8Ce4EzgVnAfwdeqhlzNbAUuBQ4GXgN2GJm02pe6mbgw8B5wKnA24B/rHu729LXn5eOPRW4pdGZ\nmtHy5cuLLiEXXnIONUB//96ii8iEp/n0klU5BeDwDF7z00A5hPDxmm3P1o25Erg+hHAPgJl9DNgN\nnAPcYWZHAYuBi0MID6RjFgE7zOzkEMIjZjaLpJmZE0J4LB1zBfBNM7sqhLArg2xNY+bMmUWXkAsv\nOb3wNJ9esiqnQDaHUT4C/NDM7jCz3WbWZWaDjYeZvQNoBb5X3RZCeAV4GJibbjqRpBGqHfMkUK4Z\n8x7gpWqjkbqPZMn+KQ1P1WSuuOKKokvIhZecXniaTy9ZlVMgm2bjncDlwJPAfOBLwBfN7I/Tx1tJ\nGoLddc/bnT4GcCzQlzYho41pBX5e+2AIYT/wy5oxIiIiUrAsmo0WoDOE8LkQwuMhhFuBW4HLMniv\nSTnrrLMolUpDPubOncvmzZuHjLv33nsplUrDnr9kyRLWr18/ZFtXVxelUmnYWQLXXHMNq1atGrKt\nXC5TKpXYuXPnkO1r1qwZdtyvp6eHUqk07Op0HR0dI97S+KKLLlKOJs5RP3YiRssxEs2HciiHcoyV\no6OjY/B3Y2trK6VSiWXLlg17zriFEBr6AfwU+Pu6bZcBz6WfvwMYAH6/bswPgJvSz08H9gNHjfDa\nV6afLwJ+Uff4YcA+4OxRapsNhM7OzhC7HTt2FF1CLmLL2dnZGUj2/IUNGzYMfj7Wx1jfz7Wv1wzf\n97HN51i8ZFXOeNT8PJkdJtgbZLFn40GgrW5bG+ki0RDCM8AukjNIAEgXhJ4CPJRu6gT668a0ATOB\nbemmbcDRZvbumveZR3JN54cblKVprVixougScuEl51hefPHFoktoGE/z6SWrcgpkczbKTcCDZvYZ\n4A6SJuLjwCdqxtwMfNbMfkyyt+J64GfA1yFZMGpm64Ebzewl4FXgi8CDIYRH0jE7zWwLcKuZXQ5M\nA9YAHcH5mSgAa9euLbqEXHjJOZaYLvDlaT69ZFVOgQyajRDCD83sXOAG4HPAMySHPjbVjFltZjNI\nrolxNPDPwIdCCH01L7WM5FDKncB04DvAkrq3+yiwluQslIF07JWNztSMvJyGFXPOmJqI8Yp5Put5\nyaqcAtns2SCE8C3gWwcZsxJYOcbje4Er0o/RxuwBFk6qSJEpTVcJFZF46N4oIlNSvFcJFRF/1GxE\nqv50qVh5yemFp/n0klU5BdRsRKunp6foEnLhJacXnubTS1blFAALyfUnXEjvPNvZ2dnJ7NmzDzpe\nJG9dXV3MmTNnQs/ZsGEDl1xyyUFfT9/3InIoan6ezAkhdE3kudqzISIiIplSsyEiIiKZUrMRKS/X\naPCS0wtP8+klq3IKqNmI1uLFi4suIRdeco6lUqlQLpeLLqMhPM2nl6zKKaBmI1orV64suoRceMk5\nluXLr6atbVYUDYen+fSSVTkF1GxEy8tZB15yjmXfvr309vZEsRvX03x6yaqcAmo2RJqUFV2AiMi4\nqdkQaUp+ro8jIs1PzUak1q9fX3QJuYg7p7+9F3HP51BesiqngJqNaHV1Tejibk0r7pz+9l7EPZ9D\necmqnAJqNqK1bt26okvIhZecXniaTy9ZlVNAzYaIiIhkTM2GiBPd3d1FlyAiTqnZEHGhhfb2C6K4\n8JeINB81G5EqlUpFl5CL2HJmt/dhgL6+16f8hb9im8+xeMmqnAJqNqK1dOnSokvIRUw5y+Uy7e3n\nF11GoWKaz4PxklU5BdRsRGv+/PlFl5CLmHJWKhX6+nqLLqNQMc3nwXjJqpwCajZEREQkY2o2RERE\nJFNqNiK1efPmokvIhZecXniaTy9ZlVNAzUa0Ojo6ii4hF15yji6u+6d4mk8vWZVTQM1GtG6//fai\nS8iFl5yji+v+KZ7m00tW5RRQsyEiIiIZU7MhIiIimVKzISIiIplSsxGpRYsWFV1CLrzk9MLTfHrJ\nqpwCajai5eVqdl5yeuFpPr1kVU4BNRvRWrBgQdEl5MJLTi88zaeXrMopoGZDREREMqZmQ0RERDKl\nZiNSW7duLbqEXHjJ6YWn+fSSVTkF1GxEa/Xq1UWXkAsvOb3wNJ9esiqngJqNaG3atKnoEnLhJacX\nnubTS1blFFCzEa0ZM2YUXUIuvOT0wtN8esmqnAJqNkRERCRjajZEREQkU2o2IrV8+fKiS8iFl5xe\neJpPL1mVU0DNRrRmzpxZdAm58JLTC0/z6SWrcgqo2YjWFVdcUXQJufCS0wtP8+klq3IKqNkQERGR\njKnZEBERkUyp2YjUzp07iy4hF15yeuFpPr1kVU4BNRvRWrFiRdEl5MJLTi88zaeXrMopoGYjWmvX\nri26hFx4yemFp/n0klU5BdRsRMvLaVhecnrhaT69ZFVOATUbIiIikjE1GyIiIpKpzJsNM/u0mQ2Y\n2Y11268zsxfMrMfMvmtmx9U9Pt3M1plZxcxeNbM7zewtdWPeZGYbzexlM3vJzL5sZm/IOlMzWLVq\nVdEl5MJLTi88zaeXrMopkHGzYWYnAZcCj9dtvxpYmj52MvAasMXMptUMuxn4MHAecCrwNuAf697i\nNmAWMC8deypwS8ODNKGenp6iS8iFl5yT1d3dPWxbuVymXC4XUM3BeZpPL1mVUwAIIWTyAfwH4Eng\nDOB+4Maax14AltV8fRTwOnBhzdd7gXNrxrQBA8DJ6dez0q/fXTPmTKAfaB2lptlA6OzsDCJTTWdn\nZwAm/VH/ff3ss8+GadOOHDLmnnvuCUceOSMceeSM8OyzzxaUVESaUc3PqNlhgj1Blns21gF3hxC+\nX7vRzN4BtALfq24LIbwCPAzMTTedCBxeN+ZJoFwz5j3ASyGEx2pe/j6Sf4hTGppEpAlVKhX6+nqH\nbNuzZw+9vT309vZQqVQKqkxEvDk8ixc1s4uBPyBpGuq1kjQEu+u2704fAzgW6EubkNHGtAI/r30w\nhLDfzH5ZM0ZEREQK1vA9G2b26yTrLS4JIexr9Os3wllnnUWpVBryMXfuXDZv3jxk3L333kupVBr2\n/CVLlrB+/foh27q6uiiVSsP+WrzmmmuGLRwql8uUSqVhl7dds2YNy5cvH7Ktp6eHUqnE1q1bh2zv\n6Ohg0aJFw2q76KKL2Lx585A6mjlHrZFyVCqVKHIA3HDDDcO2TUR3d/eIOUazbNmyTHIcynxUKpUp\nMx9Zf1/VvkYz56g1Uo5KpRJFDhh7Pr72ta9FkaM6Hx0dHYO/G1tbWymVSixbtmzYc8ZtosddDvYB\nnA3sB/qAfenHQM22d6Zf/37d834A3JR+fno6/qi6MT8Frkw/XwT8ou7xw9L3O3uU2tys2fjIRz5S\ndAm5iCnnoa7ZuOeee4aswxjp9TZs2DDqGo+pIKb5PBgvWZUzHlNtzcZ9wO+RHEZ5V/rxQ2AD8K4Q\nwk+AXSRnkABgZkeRrLN4KN3USbLQs3ZMGzAT2JZu2gYcbWbvrnnveYCRrP9wbeXKlUWXkAsvOcej\nvf0C2tpmjXmmyVRfp+FpPr1kVU6BDNZshBBeA35Uu83MXiPZC7Ej3XQz8Fkz+zHJ3orrgZ8BX09f\n4xUzWw/caGYvAa8CXwQeDCE8ko7ZaWZbgFvN7HJgGrAG6Agh7Gp0rmYze/bsokvIhZec49HX9zqQ\nNBQjXzq5hauuujrfoibI03x6yaqcAhktEB1BGPJFCKvNbAbJNTGOBv4Z+FAIoa9m2DKSQyl3AtOB\n7wBL6l73o8Bakr0pA+nYK7MIINL8Bujv31t0ESLiUC7NRgjhjBG2rQRWjvGcvcAV6cdoY/YACw+9\nQhEREcmK7o0SqfrVzrHyktMLT/PpJatyCqjZiFZXV1fRJeTCS04vPM2nl6zKKaBmI1rr1q0ruoRc\neMnphaf59JJVOQXUbIiIiEjG1GyIiIhIptRsiIiISKbUbERqpGvyx8hLzoOzogtoCE/z6SWrcgqo\n2YjW0qVLiy4hF15yHlw4+JAm4Gk+vWRVTgE1G9GaP39+0SXkwktOLzzNp5esyimgZkPEmTgOt4hI\nc1GzIeJKHIdbRKS5qNmI1ObNm4suIRdecnrhaT69ZFVOATUb0ero6Ci6hFx4yTkZ3d3dRZcwYZ7m\n00tW5RQAC8HPblUzmw10dnZ2Mnv27KLLERmiq6uLOXPmHPLrdHZ2cswxx3D88W309fWOOU7/D0Rk\nvGp+Rs0JIUzoZjDasyESoe3bt4/ZaIiI5EnNhkhkuru7aW8/v+gyREQGqdkQicyePXu0V0NEphQ1\nG5FatGhR0SXkwkvOiahUKkWXMGme5tNLVuUUULMRLS9Xs/OScyKuuurqokuYNE/z6SWrcgrobBSR\nKaNRZ6OMl/4fiMhE6GwUERERmbLUbIiIiEim1GxEauvWrUWXkAsvOb3wNJ9esiqngJqNaK1evbro\nEnLhJacXnubTS1blFFCzEa1NmzYVXUIuvOT0wtN8esmqnAJqNqI1Y8aMokvIhZecXniaTy9ZlVNA\nzYaIiIhkTM2GiIiIZErNRqSWL19edAm58JLTC0/z6SWrcgqo2YjWzJkziy4hF15yeuFpPr1kVU4B\nXa5cZMoo6nLl5XIZ0A9LERmbLlcuIpNSLpdpa5tFW9uswaZDRKTR1GyIOFapVOjt7aG3t6epb00v\nIlObmo1I7dy5s+gScuElpxee5tNLVuUUULMRrRUrVhRdQi685PTC03x6yaqcAmo2orV27dqiS8hF\nLDnL5TLd3d25vmfe7zcesczneHjJqpwCajai5eXMghhyVhdptrefn+O7tnDuuefxxBNPDG7p7u4u\nfJFoDPM5Xl6yKqcAHF50ASLeVRdp5muAffv28YlPXDa4pb39AlpajCef3KEfnCLSUNqzIeLWAP39\newe/6ut7XWeliEgm1GxEatWqVUWXkAsvOb3wNJ9esiqngJqNaPX05L1bvhhecnrhaT69ZFVOAV2u\nXKRwBy4BbEDx/x/1/0NERqLLlYtEoRGNhh3yK0zFU2JFpLmp2RCJyqE2LC20t19Q+CmwIhIXNRuR\n8nJGgZec+Rmgr+/1wv5dPc2nl6zKKaBmI1qLFy8uuoRceMnphaf59JJVOQXUbERr5cqVRZeQCy85\nvfA0n16yKqeAmo1oeTmbwEtOLzzNp5esyimgZkNEREQypmZDxLWRT5WdCjdlE5F4qNmI1Pr164su\nIRdecmZn5FNl29svoK1tVu4Nh6f59JJVOQUyaDbM7DNm9oiZvWJmu83sLjM7YYRx15nZC2bWY2bf\nNbPj6h6fbmbrzKxiZq+a2Z1m9pa6MW8ys41m9rKZvWRmXzazNzQ6UzPq6prQxd2alpeceSvqpmye\n5tNLVuUUyOBy5Wb2LaAD+CHJLew/D/wuMCuE8Ho65mrgauBjwE+BvwR+Lx3Tl475EvAh4E+AV4B1\nwP4Qwvtq3uvbwLHApcA04KvAIyGEhaPUpsuVy5SzZcsWPvjBDxZdxoj0f0VEqg7lcuWHN7qYEMJZ\ntV+b2Z8CPwfmAFvTzVcC14cQ7knHfAzYDZwD3GFmRwGLgYtDCA+kYxYBO8zs5BDCI2Y2CziTJPRj\n6ZgrgG+TYCJxAAAUCElEQVSa2VUhhF2NziaSBV0MSERil8eajaNJDgz/EsDM3gG0At+rDgghvAI8\nDMxNN51I0gjVjnkSKNeMeQ/wUrXRSN2XvtcpWQQRideh31NFRGQ0mTYbZmbAzcDWEMKP0s2tJA3B\n7rrhu9PHIDk00pc2IaONaSXZYzIohLCfpKlpRUQmoPi7zYpIvLLes/G3wO8AF2f8PhNy1llnUSqV\nhnzMnTuXzZs3Dxl37733UiqVhj1/yZIlw1Yed3V1USqVhu0Sv+aaa1i1atWQbeVymVKpxM6dO4ds\nX7NmDcuXLx+yraenh1KpxNatW4ds7+joYNGiRcNqu+iii9i8efOQups5R62RcpRKpShyTFXVO8BO\nJMehzEepVJoy85H191XtezZzjloj5SiVSlHkgLHn46STTooiR3U+Ojo6Bn83tra2UiqVWLZs2bDn\njFsIIZMPYC3wLDCzbvs7gAHg9+u2/wC4Kf38dGA/cFTdmJ8CV6afLwJ+Uff4YcA+4OxRapoNhM7O\nzhC7LVu2FF1CLmLIuWHDhkCya2GKfbSEadN+JTz77LO5/VvEMJ/j5SWrcsajs7Oz+vNhdphgT5DJ\nng0zWwucDZweQhhyon4I4RlgFzCvZvxRJOssHko3dQL9dWPagJnAtnTTNuBoM3t3zcvPIzn4/HAj\n8zSj+fPnF11CLrzkLEb+d4D1NJ9esiqnQAZno5jZ3wILgBLwmpkdmz70cgihN/38ZuCzZvZjkr0V\n1wM/A74OyYJRM1sP3GhmLwGvAl8EHgwhPJKO2WlmW4BbzexyklNf1wAdQWeiiDRM9VCKiMhkZbFn\n4zLgKJLDIi/UfFxYHRBCWE3SGNxCshfiV4APhfQaG6llwD3AnTWvdV7de30U2ElyFso9wD8Bn2xw\nHhHHWmhvv0CXLheRQ9LwZiOE0BJCOGyEj6/VjVsZQnhbCGFGCOHMEMKP6x7fG0K4IoRwTAjhP4YQ\nLggh1J99sieEsDCE8MYQwptCCJ8IIfQ0OlMzql90FSsvOYuTHErZvn17Lg2Hp/n0klU5BXRvlGh1\ndHQUXUIuvOQs2rnnnp/LvVI8zaeXrMopkMHlyqcyXa5cpqKNGzeycOGIV9ifIozqdTj0f0fEr0O5\nXLn2bIjIQfj5g0REsqFmQ6RA5XJZ90YRkeg1/NRXERmfcrlMW9ss+vv7iy5FRCRT2rMRqZEuRRuj\nZs5ZqVTo7e2hv7/v4IOdaOb5nCgvWZVTQM1GtLxczc5LTi88zaeXrMopoLNRRApTs7K7aej/johf\nOhtFREREpiw1GyIiIpIpNRuR2rp1a9El5MJLTi88zaeXrMopoGYjWqtXry66hFw0a85yuay7qY6g\nWedzMrxkVU4BLRCNVk9PDzNmzCi6jMw1Y87q9TUGBgbo6+stupwJyfr/TjPO52R5yaqc8TiUBaK6\nqFekYv+mr2rGnNXra8hwzTifk+Ulq3IK6DCKiIiIZEzNhoiIiGRKzUakli9fXnQJufCS0wtP8+kl\nq3IKqNmI1syZM4suIRfNllNnoYyt2ebzUHjJqpwCOhtFJDfNfBZK1Ve+8hXmzZunH6wiDuly5SJN\noHoWSrM2GmAsXvxxTjjhtymXy0UXIyJNRM2GiIxTAAbYu/d1KpXKpF6hXC6rURFxSM1GpHbu3Fl0\nCbnwkjMG1cNIbW2zRm04PM2nl6zKKaBmI1orVqwouoRceMkZg+phpN7enlH3jHiaTy9ZlVNAzUa0\n1q5dW3QJuWiWnN7PQunu7uaWW2456Lhmmc9G8JJVOQV0ufJoeTlboBly1p6F4lV3dzd///d/f9Bx\nzTCfjeIlq3IKqNkQyZzuhSIi3ukwikhhrOgCJs3zISERmTg1G5FatWpV0SXkorlzNusF9Vpob78g\nk1NYm3s+J8ZLVuUUULMRrZ4eH7vtveScWgbo65v8tTbG4mk+vWRVTgFdrlyk4ap/8VcXjNVc4jcq\nE/l/VP9voP+DIs1HlysXmSLGc+GqZl6rUau7u3vch1KSNR5x5BaRiVOzIdJA47lwVfOu1Rjq3HPP\nP0hTdcCePXuIJbeITJyajUhlcTx9KvKSc+ox9u3rpbe3h+3btx909HjnydN8esmqnAJqNqK1ePHi\nokvIRXPljOkwQnUvRQvnnnse27ZtG3yk/mZr5XKZq64a36Wcm2s+D42XrMopoIt6RWvlypVFl5CL\n5soZ42GEAfbt28f73z+Pp59ObkTV1jYLgCef3MHMmTOpVCr09/eN69Waaz4PjZesyimgZiNaXlb6\nF5mz/qwTv5JTYR944AG2bt06eLXUSqUy4X8bL9+34Cercgqo2RCZlOpZJ3DgL3jfWli8+M/o799X\ndCEiMgVpzYbIBJXLZbZv337Qs066u7vZtm0bTzzxRM4VFmEgbTRiWpciIo2iZiNS69evL7qEXOSd\ns7pHo739/MFttfcJqf38nHPO4w//8L+waJGnhWMH1qVUr8MxkWbLy/ct+MmqnAJqNqLV1TWhi7s1\nrbxz7tixg97eHvr6etMtydkYd911F3fffTdnn31Out3o798LDBDnwtCDaeGcc9o57rgTWLToz4Y9\nOtqN3Lx834KfrMopoMuVi0zIxo0bWbhwYd1WA4zDDmth//7+IsqaooyRG60Wpk2bztNP79RaF5Em\nosuVi+Rk5PUZARhQozHMaH/IZHcjNxGZmtRsiIzTtm3buOqq5UWXISLSdHTq6yh0DQV/RprzcrnM\n888/D8D733+GTu0UEZkE7dlg5Msrj3TnzvpxU1mpVCq6hFw0KudIc75t2zaOP/4E/vAP38dpp51e\nsyhUsuLl+xb8ZFVOATUbI/6SGenOneO7dfjUsXTp0qJLyExt0zdWzu7ublauXDnszIdyucyjjz46\n+Fj9dTMeeOABli1bxmmnnUFf315gP/v27c0ykqRi/r6t5yWrcgroMMpgY1H9fLTDJiONq9/tPpUO\nvcyfP7/oEjJRf+XO+py1c9Dd3c21115LqVRi374Dhz/a2mYxMDBAX18vJ510EueffyEDAwPpoy0s\nXvyJ9LTVsYx2poUcili/b0fiJatyCqjZmLT6X3ow/AZUU6n5iMVYzeFIcwLJHo73vvd9ANx55x2D\nzwfYs2fPkK+TK2GOZy+GGo1D9fjjj7N3717e/va36/+ISOTUbExS/S89YNjXo907Q01INkaaExja\nUOzZs6eQ2qSesXjxnwHG9OnTeeqpnaM26Pr/ItL8XK7ZuP/++zN/j/p1H9X1A48++uiwtR/VNQjj\nWYA61pja17j11lsbnulg9WS5gLb62vXrLzZv3nzQ99X1HKaiQPX6JHv3vs727dvZtm0bv/Vbxw9b\npHvCCb/NCSe0cddddzXFeqnx2rx5c9El5EI5BZzu2Xjqqadyf8/q+oHjjz9+xD0gIQwAhpmNehhm\ntMME1bG1r9PX18uZZ5456l+IE/kLsvYHfPU00Le//e08//zznH76PMyM227bwIIFlwCBjo7bqleZ\nGzK+Xm0dIz1Wfe/Ozk4WLLiEEAYYXFpBshv+85//POXyz4DAX/7l9UP+vRMtQ66N8fTTT6efJWsu\nHn300SFfD/+cMbZJYxh/9EclDj/8cPr7++jv72P79u10dnZy4YUX09/fBxjt7eczbdoRbNrUQWtr\na9Mfflm1ahXnnHPOwQc2OeUUiKDZMLMlwFVAK/A4cEUI4dGxnzV11O76r90GBw7D/NM//YBvfvOb\nzJ07d3Ds9u3bOf/8CwkhcPnln+SDH/zgsNfZsWMHX/nKV/jwhz9Mf38/Z5zxAQC+//37OOOMDxDC\nAPff//3BxqH28WpzUN0ewgAhBPr6koWW06YdQQgMnqVx4YUL0rUOB34pAEPGm7WkzVALLS0HGpSB\ngf0MDARaWlowa8EscP/93wfgtNNOZ9++PpJf9EMbgmQ3/IGvly9fkX6e3K/kU5+6jGQNxoEx1157\nXfp58npf+MIXa75mhM/H2iaNkfzbJk1F4pxzzkuvaTJQMybQ19dHe/t5gA02HnPmzJlU01H04Zk3\nv/nNhbxv3pRToMmbDTO7CPgb4FLgEWAZsMXMTgghNPW+89om5KmnnuLaa69lw4YNg4/XrkO4+eab\nOfHEE0d8jWuvvZbPf341yQ/q5DoRP/nJT9LntnDaaafT0tKSNhLJ46eddgb79+/j8MOnDXlereSU\n0AMOLKqs/lLYO8r4Aw3DgQYlsX9/9bGkLqDulNOxGoLarwfYt6+PL3zhC3WPjfWcetqTka/af2sb\nY5Hugbns69tLe/v5HHHEEdx+e7K3Axjc4zFWM7Ft2zZOP30ekDS2c+fOHfJ40Y2ISGyautkgaS5u\nCSF8DcDMLgM+DCwGVhdZ2FTS1/f6KI8MjHj9iH37eg/yvENx4JfK8F8o1cdGrmuy71PM82XyJvJv\nnzSWyd4OqO7xuP766/jzP/8chx3WwqZNtw02IgC7du2qOTyTNLa1zcqOHTu49NLLhzy39jCgGhCR\niWvaZsPMjgDmAH9V3RZCCGZ2HzB31CeKSGSG7vHq6+vj6quvBpK9ZQcakZGeU9+sHHjswHONI444\nHLPDMEvWJNU2L1X165Kef/75YetKtMfkgOqtAJp97Y2MT9M2G8AxwGHA7rrtu4G2UZ5zJMBzzz1H\nR0cHAM8888zgg9/+9rd58sknx7XNzCb19UMPPTTquNG2V59T+9zaz0f6erRtY20XicPB1t+MNX6k\n7YF9+/YDyZ6Q9vYLSA6zDTV9+vTBQ5IAvb29HHnkkaxe/Vccc8wxVCoVPvOZzxFC4IYb/pIHH3xw\n8OdQzEbKWalUuPrqP+f114f+GzWzIufzV3/1V3NZM7Jjx+CJCUdO9LlW/Y/RbMzsrcDzwNwQwsM1\n21cBp4YQhu3dMLOPAhvzq1JERCQ6l4QQbpvIE5p5z0YF2A8cW7f9WGDXKM/ZAlwC/BTQXbVERETG\n70jgN0l+l05I0+7ZADCzfwEeDiFcmX5tQBn4YgjhrwstTkRERIDm3rMBcCPwVTPr5MCprzOArxZZ\nlIiIiBzQ1M1GCOEOMzsGuI7k8Mm/AmeGEF4stjIRERGpaurDKCIiIjL1ubwRm4iIiORHzYaIiIhk\nyk2zYWZLzOwZM3vdzP7FzE4quqZGM7P3mdk3zOx5Mxsws1LRNWXBzD5jZo+Y2StmttvM7jKzE4qu\nq9HM7DIze9zMXk4/HjKzDxZdV9bM7NPp9++NRdfSSGZ2TZqr9uNHRdeVBTN7m5n9g5lVzKwn/T6e\nXXRdjZT+PqmfzwEzW1N0bY1kZi1mdr2Z/SSdyx+b2Wcn+joumo2aG7ZdA7yb5O6wW9LFpTF5A8ki\n2U8R98093gesAU4BPgAcAdxrZr9SaFWN9xxwNTCb5NL83we+bmazCq0qQ+kfAZeS/B+N0b+RLGZv\nTT/+S7HlNJ6ZHQ08COwFzgRmAf8deKnIujJwIgfmsRX4ryQ/d+8osqgMfBr4JMnvld8GVgArzGzp\nRF7ExQLRUa7H8RzJ9TiivGGbmQ0A54QQvlF0LVlLm8afk1w5dmvR9WTJzH4BXBVC+F9F19JoZvYf\ngE7gcuBzwGMhhP9WbFWNY2bXAGeHEKL6C7+emd1AcmXn04quJU9mdjNwVgghqr2sZnY3sCuE8Ima\nbXcCPSGEj433daLfs1Fzw7bvVbeFpMPSDdvicTTJXxS/LLqQrKS7Mi8muY7MtqLrycg64O4QwveL\nLiRDx6eHOf+fmW0ws98ouqAMfAT4oZndkR7m7DKzjxddVJbS3zOXAOuLriUDDwHzzOx4ADN7F/Be\n4FsTeZGmvs7GOE3mhm3SJNK9VDcDW0MI0R3/NrPfJWkujgReBc4NIewstqrGSxupPyDZNR2rfwH+\nFHgSeCuwEvgnM/vdEMJrBdbVaO8k2Tv1N8D/AE4Gvmhme0MI/1BoZdk5F3gj8L+LLiQDNwBHATvN\nbD/JToq/CCFsmsiLeGg2JG5/C/wOSacdo53Au0h+kJ0PfM3MTo2p4TCzXydpGD8QQthXdD1ZCSHU\n3k/i38zsEeBZ4EIgpsNiLcAjIYTPpV8/njbNlwGxNhuLgW+HEEa7L1czuwj4KHAx8COSPwq+YGYv\nTKR59NBsTOaGbdIEzGwtcBbwvhBCd9H1ZCGE0A/8JP3yMTM7GbiS5C/HWMwB3gx0pXuqINkbeWq6\nCG16iHBxWQjhZTN7Cjiu6FoarBvYUbdtB9BeQC2ZM7OZJAvVzym6loysBj4fQvg/6df/bma/CXyG\nCTSP0a/ZSP9S6gTmVbelP9DmkRyLkiaUNhpnA6eHEMpF15OjFmB60UU02H3A75H8xfSu9OOHwAbg\nXTE2GjC4IPY4kl/OMXmQ4Yeo20j24sRoMclh+QmtYWgiM0j+YK81wAT7Bw97NsDJDdvM7A0kP7yq\nfx2+M13M88sQwnPFVdZYZva3wAKgBLxmZtW9Vi+HEHqLq6yxzOyvgG+T3Mn4P5IsQDsNmF9kXY2W\nrlcYst7GzF4DfhFCqP8LuWmZ2V8Dd5P80n07cC2wD+gosq4M3AQ8aGafITkN9BTg48AnxnxWE0r/\ncP1T4KshhIGCy8nK3cBnzexnwL+TnIq/DPjyRF7ERbPh6IZtJwL3k5yZEUgWaEGyaGlxUUVl4DKS\nfD+o274I+Fru1WTnLSRz91bgZeAJYH7kZ2tUxbg349eB24BfA14EtgLvCSH8otCqGiyE8EMzO5dk\nYeHngGeAKye6oLBJfAD4DeJac1NvKXA9ydlibwFeAL6Ubhs3F9fZEBERkeJEv2ZDREREiqVmQ0RE\nRDKlZkNEREQypWZDREREMqVmQ0RERDKlZkNEREQypWZDREREMqVmQ0RERDKlZkNEREQypWZDRERE\nMqVmQ0RERDL1/wGwBYYrGDuyWwAAAABJRU5ErkJggg==\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "segments['seg_length'].apply(np.log).hist(bins=500)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "We can see that although there are date/time fields in the dataset, they are not in any specialized format, such as `datetime`." ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "collapsed": false, "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "dtype('O')" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "segments['st_time'].dtype" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Our first order of business will be to convert these data to `datetime`. \n", "\n", "The `strptime` method parses a string representation of a date and/or time field, according to the expected format of this information." ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "datetime.datetime(2009, 2, 10, 16, 3)" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "datetime.strptime(segments['st_time'].ix[0], '%m/%d/%y %H:%M')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "As a convenience, Pandas has a `to_datetime` method that will parse and convert an entire Series of formatted strings into `datetime` objects." ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 2009-02-10 16:03:00\n", "1 2009-04-06 14:31:00\n", "2 2009-04-06 14:36:00\n", "3 2009-04-10 17:58:00\n", "4 2009-04-10 17:59:00\n", "5 2010-03-20 16:06:00\n", "6 2010-03-20 18:05:00\n", "7 2011-05-04 11:28:00\n", "8 2010-06-05 11:23:00\n", "9 2010-06-08 11:03:00\n", "Name: st_time, dtype: datetime64[ns]" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.to_datetime(segments['st_time'][:10])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Pandas also has a custom NA value for missing datetime objects, `NaT`." ] }, { "cell_type": "code", "execution_count": 49, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "DatetimeIndex(['NaT'], dtype='datetime64[ns]', freq=None)" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.to_datetime([None])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Finally, if `to_datetime()` has problems parsing any particular date/time format, you can pass the spec in using the `format=` argument." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Merging and joining `DataFrame`s" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "In Pandas, we can combine tables according to the value of one or more *keys* that are used to identify rows, much like an index." ] }, { "cell_type": "code", "execution_count": 50, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ageid
0200
1211
2202
3243
\n", "
" ], "text/plain": [ " age id\n", "0 20 0\n", "1 21 1\n", "2 20 2\n", "3 24 3" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1 = pd.DataFrame({'id': range(4), \n", " 'age': np.random.randint(18, 31, size=4)})\n", "df1" ] }, { "cell_type": "code", "execution_count": 51, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
idscore
000.025323
110.336570
220.866701
300.952417
410.402052
520.195401
\n", "
" ], "text/plain": [ " id score\n", "0 0 0.025323\n", "1 1 0.336570\n", "2 2 0.866701\n", "3 0 0.952417\n", "4 1 0.402052\n", "5 2 0.195401" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2 = pd.DataFrame({'id': list(range(3))*2, \n", " 'score': np.random.random(size=6)})\n", "df2" ] }, { "cell_type": "code", "execution_count": 52, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ageidscore
02000.025323
12000.952417
22110.336570
32110.402052
42020.866701
52020.195401
\n", "
" ], "text/plain": [ " age id score\n", "0 20 0 0.025323\n", "1 20 0 0.952417\n", "2 21 1 0.336570\n", "3 21 1 0.402052\n", "4 20 2 0.866701\n", "5 20 2 0.195401" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.merge(df1, df2)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Notice that without any information about which column to use as a key, Pandas did the right thing and used the `id` column in both tables. Unless specified otherwise, `merge` will used any common column names as keys for merging the tables. " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Notice also that `id=3` from `df1` was omitted from the merged table. This is because, by default, `merge` performs an **inner join** on the tables, meaning that the merged table represents an intersection of the two tables." ] }, { "cell_type": "code", "execution_count": 53, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ageidscore
02000.025323
12000.952417
22110.336570
32110.402052
42020.866701
52020.195401
6243NaN
\n", "
" ], "text/plain": [ " age id score\n", "0 20 0 0.025323\n", "1 20 0 0.952417\n", "2 21 1 0.336570\n", "3 21 1 0.402052\n", "4 20 2 0.866701\n", "5 20 2 0.195401\n", "6 24 3 NaN" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.merge(df1, df2, how='outer')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The **outer join** above yields the union of the two tables, so all rows are represented, with missing values inserted as appropriate. \n", "\n", "One can also perform **right** and **left** joins to include all rows of the right or left table (*i.e.* first or second argument to `merge`), but not necessarily the other." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Back to the example\n", "\n", "Now that we have the vessel transit information as we need it, we may want a little more information regarding the vessels themselves. \n", "\n", "In the `data/AIS` folder there is a second table that contains information about each of the ships that traveled the segments in the `segments` table." ] }, { "cell_type": "code", "execution_count": 54, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
num_namesnamessovflagflag_typenum_loasloamax_loanum_typestype
mmsi
18Bil Holman Dredge/Dredge Capt Frank/Emo/Offsho...YUnknownUnknown742.0/48.0/57.0/90.0/138.0/154.0/156.0156.04Dredging/MilOps/Reserved/Towing
93000000009/Raven/ShearwaterNUnknownUnknown250.0/62.062.02Pleasure/Tug
\n", "
" ], "text/plain": [ " num_names names sov \\\n", "mmsi \n", "1 8 Bil Holman Dredge/Dredge Capt Frank/Emo/Offsho... Y \n", "9 3 000000009/Raven/Shearwater N \n", "\n", " flag flag_type num_loas loa \\\n", "mmsi \n", "1 Unknown Unknown 7 42.0/48.0/57.0/90.0/138.0/154.0/156.0 \n", "9 Unknown Unknown 2 50.0/62.0 \n", "\n", " max_loa num_types type \n", "mmsi \n", "1 156.0 4 Dredging/MilOps/Reserved/Towing \n", "9 62.0 2 Pleasure/Tug " ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "vessels = pd.read_csv('data/AIS/vessel_information.csv', index_col='mmsi')\n", "vessels.head(2)" ] }, { "cell_type": "code", "execution_count": 55, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
mmsinametransitsegmentseg_lengthavg_sogmin_sogmax_sogpdgt10st_timeend_time
01Us Govt Ves115.113.29.214.596.52/10/09 16:032/10/09 16:27
11Dredge Capt Frank1113.518.610.420.6100.04/6/09 14:314/6/09 15:20
\n", "
" ], "text/plain": [ " mmsi name transit segment seg_length avg_sog min_sog \\\n", "0 1 Us Govt Ves 1 1 5.1 13.2 9.2 \n", "1 1 Dredge Capt Frank 1 1 13.5 18.6 10.4 \n", "\n", " max_sog pdgt10 st_time end_time \n", "0 14.5 96.5 2/10/09 16:03 2/10/09 16:27 \n", "1 20.6 100.0 4/6/09 14:31 4/6/09 15:20 " ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "segments.head(2)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We see that there is a `mmsi` value (a vessel identifier) in each table, but it is used as an index for the `vessels` table. In this case, we have to specify to join on the index for this table, and on the `mmsi` column for the other." ] }, { "cell_type": "code", "execution_count": 56, "metadata": { "collapsed": true }, "outputs": [], "source": [ "segments_merged = pd.merge(vessels, segments, left_index=True, right_on='mmsi')" ] }, { "cell_type": "code", "execution_count": 57, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
num_namesnamessovflagflag_typenum_loasloamax_loanum_typestype...nametransitsegmentseg_lengthavg_sogmin_sogmax_sogpdgt10st_timeend_time
08Bil Holman Dredge/Dredge Capt Frank/Emo/Offsho...YUnknownUnknown742.0/48.0/57.0/90.0/138.0/154.0/156.0156.04Dredging/MilOps/Reserved/Towing...Us Govt Ves115.113.29.214.596.52/10/09 16:032/10/09 16:27
18Bil Holman Dredge/Dredge Capt Frank/Emo/Offsho...YUnknownUnknown742.0/48.0/57.0/90.0/138.0/154.0/156.0156.04Dredging/MilOps/Reserved/Towing...Dredge Capt Frank1113.518.610.420.6100.04/6/09 14:314/6/09 15:20
28Bil Holman Dredge/Dredge Capt Frank/Emo/Offsho...YUnknownUnknown742.0/48.0/57.0/90.0/138.0/154.0/156.0156.04Dredging/MilOps/Reserved/Towing...Us Gov Vessel114.316.210.320.5100.04/6/09 14:364/6/09 14:55
38Bil Holman Dredge/Dredge Capt Frank/Emo/Offsho...YUnknownUnknown742.0/48.0/57.0/90.0/138.0/154.0/156.0156.04Dredging/MilOps/Reserved/Towing...Us Gov Vessel219.215.414.516.1100.04/10/09 17:584/10/09 18:34
48Bil Holman Dredge/Dredge Capt Frank/Emo/Offsho...YUnknownUnknown742.0/48.0/57.0/90.0/138.0/154.0/156.0156.04Dredging/MilOps/Reserved/Towing...Dredge Capt Frank219.215.414.616.2100.04/10/09 17:594/10/09 18:35
\n", "

5 rows × 21 columns

\n", "
" ], "text/plain": [ " num_names names sov flag \\\n", "0 8 Bil Holman Dredge/Dredge Capt Frank/Emo/Offsho... Y Unknown \n", "1 8 Bil Holman Dredge/Dredge Capt Frank/Emo/Offsho... Y Unknown \n", "2 8 Bil Holman Dredge/Dredge Capt Frank/Emo/Offsho... Y Unknown \n", "3 8 Bil Holman Dredge/Dredge Capt Frank/Emo/Offsho... Y Unknown \n", "4 8 Bil Holman Dredge/Dredge Capt Frank/Emo/Offsho... Y Unknown \n", "\n", " flag_type num_loas loa max_loa \\\n", "0 Unknown 7 42.0/48.0/57.0/90.0/138.0/154.0/156.0 156.0 \n", "1 Unknown 7 42.0/48.0/57.0/90.0/138.0/154.0/156.0 156.0 \n", "2 Unknown 7 42.0/48.0/57.0/90.0/138.0/154.0/156.0 156.0 \n", "3 Unknown 7 42.0/48.0/57.0/90.0/138.0/154.0/156.0 156.0 \n", "4 Unknown 7 42.0/48.0/57.0/90.0/138.0/154.0/156.0 156.0 \n", "\n", " num_types type ... \\\n", "0 4 Dredging/MilOps/Reserved/Towing ... \n", "1 4 Dredging/MilOps/Reserved/Towing ... \n", "2 4 Dredging/MilOps/Reserved/Towing ... \n", "3 4 Dredging/MilOps/Reserved/Towing ... \n", "4 4 Dredging/MilOps/Reserved/Towing ... \n", "\n", " name transit segment seg_length avg_sog min_sog max_sog \\\n", "0 Us Govt Ves 1 1 5.1 13.2 9.2 14.5 \n", "1 Dredge Capt Frank 1 1 13.5 18.6 10.4 20.6 \n", "2 Us Gov Vessel 1 1 4.3 16.2 10.3 20.5 \n", "3 Us Gov Vessel 2 1 9.2 15.4 14.5 16.1 \n", "4 Dredge Capt Frank 2 1 9.2 15.4 14.6 16.2 \n", "\n", " pdgt10 st_time end_time \n", "0 96.5 2/10/09 16:03 2/10/09 16:27 \n", "1 100.0 4/6/09 14:31 4/6/09 15:20 \n", "2 100.0 4/6/09 14:36 4/6/09 14:55 \n", "3 100.0 4/10/09 17:58 4/10/09 18:34 \n", "4 100.0 4/10/09 17:59 4/10/09 18:35 \n", "\n", "[5 rows x 21 columns]" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "segments_merged.head()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Notice that `mmsi` field that was an index on the `vessels` table is no longer an index on the merged table." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Each `DataFrame` also has a `.merge()` method that could have been used:" ] }, { "cell_type": "code", "execution_count": 58, "metadata": { "collapsed": false, "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
num_namesnamessovflagflag_typenum_loasloamax_loanum_typestype...nametransitsegmentseg_lengthavg_sogmin_sogmax_sogpdgt10st_timeend_time
08Bil Holman Dredge/Dredge Capt Frank/Emo/Offsho...YUnknownUnknown742.0/48.0/57.0/90.0/138.0/154.0/156.0156.04Dredging/MilOps/Reserved/Towing...Us Govt Ves115.113.29.214.596.52/10/09 16:032/10/09 16:27
18Bil Holman Dredge/Dredge Capt Frank/Emo/Offsho...YUnknownUnknown742.0/48.0/57.0/90.0/138.0/154.0/156.0156.04Dredging/MilOps/Reserved/Towing...Dredge Capt Frank1113.518.610.420.6100.04/6/09 14:314/6/09 15:20
28Bil Holman Dredge/Dredge Capt Frank/Emo/Offsho...YUnknownUnknown742.0/48.0/57.0/90.0/138.0/154.0/156.0156.04Dredging/MilOps/Reserved/Towing...Us Gov Vessel114.316.210.320.5100.04/6/09 14:364/6/09 14:55
38Bil Holman Dredge/Dredge Capt Frank/Emo/Offsho...YUnknownUnknown742.0/48.0/57.0/90.0/138.0/154.0/156.0156.04Dredging/MilOps/Reserved/Towing...Us Gov Vessel219.215.414.516.1100.04/10/09 17:584/10/09 18:34
48Bil Holman Dredge/Dredge Capt Frank/Emo/Offsho...YUnknownUnknown742.0/48.0/57.0/90.0/138.0/154.0/156.0156.04Dredging/MilOps/Reserved/Towing...Dredge Capt Frank219.215.414.616.2100.04/10/09 17:594/10/09 18:35
\n", "

5 rows × 21 columns

\n", "
" ], "text/plain": [ " num_names names sov flag \\\n", "0 8 Bil Holman Dredge/Dredge Capt Frank/Emo/Offsho... Y Unknown \n", "1 8 Bil Holman Dredge/Dredge Capt Frank/Emo/Offsho... Y Unknown \n", "2 8 Bil Holman Dredge/Dredge Capt Frank/Emo/Offsho... Y Unknown \n", "3 8 Bil Holman Dredge/Dredge Capt Frank/Emo/Offsho... Y Unknown \n", "4 8 Bil Holman Dredge/Dredge Capt Frank/Emo/Offsho... Y Unknown \n", "\n", " flag_type num_loas loa max_loa \\\n", "0 Unknown 7 42.0/48.0/57.0/90.0/138.0/154.0/156.0 156.0 \n", "1 Unknown 7 42.0/48.0/57.0/90.0/138.0/154.0/156.0 156.0 \n", "2 Unknown 7 42.0/48.0/57.0/90.0/138.0/154.0/156.0 156.0 \n", "3 Unknown 7 42.0/48.0/57.0/90.0/138.0/154.0/156.0 156.0 \n", "4 Unknown 7 42.0/48.0/57.0/90.0/138.0/154.0/156.0 156.0 \n", "\n", " num_types type ... \\\n", "0 4 Dredging/MilOps/Reserved/Towing ... \n", "1 4 Dredging/MilOps/Reserved/Towing ... \n", "2 4 Dredging/MilOps/Reserved/Towing ... \n", "3 4 Dredging/MilOps/Reserved/Towing ... \n", "4 4 Dredging/MilOps/Reserved/Towing ... \n", "\n", " name transit segment seg_length avg_sog min_sog max_sog \\\n", "0 Us Govt Ves 1 1 5.1 13.2 9.2 14.5 \n", "1 Dredge Capt Frank 1 1 13.5 18.6 10.4 20.6 \n", "2 Us Gov Vessel 1 1 4.3 16.2 10.3 20.5 \n", "3 Us Gov Vessel 2 1 9.2 15.4 14.5 16.1 \n", "4 Dredge Capt Frank 2 1 9.2 15.4 14.6 16.2 \n", "\n", " pdgt10 st_time end_time \n", "0 96.5 2/10/09 16:03 2/10/09 16:27 \n", "1 100.0 4/6/09 14:31 4/6/09 15:20 \n", "2 100.0 4/6/09 14:36 4/6/09 14:55 \n", "3 100.0 4/10/09 17:58 4/10/09 18:34 \n", "4 100.0 4/10/09 17:59 4/10/09 18:35 \n", "\n", "[5 rows x 21 columns]" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "vessels.merge(segments, left_index=True, right_on='mmsi').head()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Occasionally, there will be fields with the same in both tables that we do not wish to use to join the tables; they may contain different information, despite having the same name. \n", "\n", "In this case, Pandas will by default append suffixes `_x` and `_y` to the columns to uniquely identify them." ] }, { "cell_type": "code", "execution_count": 59, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
num_namesnamessovflagflag_typenum_loasloamax_loanum_typestype_x...transitsegmentseg_lengthavg_sogmin_sogmax_sogpdgt10st_timeend_timetype_y
08Bil Holman Dredge/Dredge Capt Frank/Emo/Offsho...YUnknownUnknown742.0/48.0/57.0/90.0/138.0/154.0/156.0156.04Dredging/MilOps/Reserved/Towing...115.113.29.214.596.52/10/09 16:032/10/09 16:27foo
18Bil Holman Dredge/Dredge Capt Frank/Emo/Offsho...YUnknownUnknown742.0/48.0/57.0/90.0/138.0/154.0/156.0156.04Dredging/MilOps/Reserved/Towing...1113.518.610.420.6100.04/6/09 14:314/6/09 15:20foo
28Bil Holman Dredge/Dredge Capt Frank/Emo/Offsho...YUnknownUnknown742.0/48.0/57.0/90.0/138.0/154.0/156.0156.04Dredging/MilOps/Reserved/Towing...114.316.210.320.5100.04/6/09 14:364/6/09 14:55foo
38Bil Holman Dredge/Dredge Capt Frank/Emo/Offsho...YUnknownUnknown742.0/48.0/57.0/90.0/138.0/154.0/156.0156.04Dredging/MilOps/Reserved/Towing...219.215.414.516.1100.04/10/09 17:584/10/09 18:34foo
48Bil Holman Dredge/Dredge Capt Frank/Emo/Offsho...YUnknownUnknown742.0/48.0/57.0/90.0/138.0/154.0/156.0156.04Dredging/MilOps/Reserved/Towing...219.215.414.616.2100.04/10/09 17:594/10/09 18:35foo
\n", "

5 rows × 22 columns

\n", "
" ], "text/plain": [ " num_names names sov flag \\\n", "0 8 Bil Holman Dredge/Dredge Capt Frank/Emo/Offsho... Y Unknown \n", "1 8 Bil Holman Dredge/Dredge Capt Frank/Emo/Offsho... Y Unknown \n", "2 8 Bil Holman Dredge/Dredge Capt Frank/Emo/Offsho... Y Unknown \n", "3 8 Bil Holman Dredge/Dredge Capt Frank/Emo/Offsho... Y Unknown \n", "4 8 Bil Holman Dredge/Dredge Capt Frank/Emo/Offsho... Y Unknown \n", "\n", " flag_type num_loas loa max_loa \\\n", "0 Unknown 7 42.0/48.0/57.0/90.0/138.0/154.0/156.0 156.0 \n", "1 Unknown 7 42.0/48.0/57.0/90.0/138.0/154.0/156.0 156.0 \n", "2 Unknown 7 42.0/48.0/57.0/90.0/138.0/154.0/156.0 156.0 \n", "3 Unknown 7 42.0/48.0/57.0/90.0/138.0/154.0/156.0 156.0 \n", "4 Unknown 7 42.0/48.0/57.0/90.0/138.0/154.0/156.0 156.0 \n", "\n", " num_types type_x ... transit segment \\\n", "0 4 Dredging/MilOps/Reserved/Towing ... 1 1 \n", "1 4 Dredging/MilOps/Reserved/Towing ... 1 1 \n", "2 4 Dredging/MilOps/Reserved/Towing ... 1 1 \n", "3 4 Dredging/MilOps/Reserved/Towing ... 2 1 \n", "4 4 Dredging/MilOps/Reserved/Towing ... 2 1 \n", "\n", " seg_length avg_sog min_sog max_sog pdgt10 st_time \\\n", "0 5.1 13.2 9.2 14.5 96.5 2/10/09 16:03 \n", "1 13.5 18.6 10.4 20.6 100.0 4/6/09 14:31 \n", "2 4.3 16.2 10.3 20.5 100.0 4/6/09 14:36 \n", "3 9.2 15.4 14.5 16.1 100.0 4/10/09 17:58 \n", "4 9.2 15.4 14.6 16.2 100.0 4/10/09 17:59 \n", "\n", " end_time type_y \n", "0 2/10/09 16:27 foo \n", "1 4/6/09 15:20 foo \n", "2 4/6/09 14:55 foo \n", "3 4/10/09 18:34 foo \n", "4 4/10/09 18:35 foo \n", "\n", "[5 rows x 22 columns]" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "segments['type'] = 'foo'\n", "pd.merge(vessels, segments, left_index=True, right_on='mmsi').head()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "This behavior can be overridden by specifying a `suffixes` argument, containing a list of the suffixes to be used for the columns of the left and right columns, respectively." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Reshaping `DataFrame`s" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "This dataset in from Table 6.9 of [Statistical Methods for the Analysis of Repeated Measurements](http://www.amazon.com/Statistical-Methods-Analysis-Repeated-Measurements/dp/0387953701) by Charles S. Davis, pp. 161-163 (Springer, 2002). These data are from a multicenter, randomized controlled trial of botulinum toxin type B (BotB) in patients with cervical dystonia from nine U.S. sites.\n", "\n", "* Randomized to placebo (N=36), 5000 units of BotB (N=36), 10,000 units of BotB (N=37)\n", "* Response variable: total score on Toronto Western Spasmodic Torticollis Rating Scale (TWSTRS), measuring severity, pain, and disability of cervical dystonia (high scores mean more impairment)\n", "* TWSTRS measured at baseline (week 0) and weeks 2, 4, 8, 12, 16 after treatment began" ] }, { "cell_type": "code", "execution_count": 60, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
patientobsweeksiteidtreatagesextwstrs
0110115000U65F32
1122115000U65F30
2134115000U65F24
3148115000U65F37
41512115000U65F39
\n", "
" ], "text/plain": [ " patient obs week site id treat age sex twstrs\n", "0 1 1 0 1 1 5000U 65 F 32\n", "1 1 2 2 1 1 5000U 65 F 30\n", "2 1 3 4 1 1 5000U 65 F 24\n", "3 1 4 8 1 1 5000U 65 F 37\n", "4 1 5 12 1 1 5000U 65 F 39" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cdystonia = pd.read_csv('data/cdystonia.csv', index_col=None)\n", "cdystonia.head()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "This dataset includes repeated measurements of the same individuals (longitudinal data). Its possible to present such information in (at least) two ways: showing each repeated measurement in their own row, or in multiple columns representing multiple measurements." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "`.stack()` rotates the data frame so that columns are represented in rows:" ] }, { "cell_type": "code", "execution_count": 61, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "0 patient 1\n", " obs 1\n", " week 0\n", " site 1\n", " id 1\n", " ... \n", "630 id 11\n", " treat 5000U\n", " age 57\n", " sex M\n", " twstrs 51\n", "dtype: object" ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stacked = cdystonia.stack()\n", "stacked" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "And there's a corresponding `.unstack()` which pivots back into columns:" ] }, { "cell_type": "code", "execution_count": 62, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
patientobsweeksiteidtreatagesextwstrs
0110115000U65F32
1122115000U65F30
2134115000U65F24
3148115000U65F37
41512115000U65F39
\n", "
" ], "text/plain": [ " patient obs week site id treat age sex twstrs\n", "0 1 1 0 1 1 5000U 65 F 32\n", "1 1 2 2 1 1 5000U 65 F 30\n", "2 1 3 4 1 1 5000U 65 F 24\n", "3 1 4 8 1 1 5000U 65 F 37\n", "4 1 5 12 1 1 5000U 65 F 39" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stacked.unstack().head()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "For this dataset, it makes sense to create a hierarchical index based on the patient and observation:" ] }, { "cell_type": "code", "execution_count": 63, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
weeksiteidtreatagesextwstrs
patientobs
110115000U65F32
22115000U65F30
34115000U65F24
48115000U65F37
512115000U65F39
\n", "
" ], "text/plain": [ " week site id treat age sex twstrs\n", "patient obs \n", "1 1 0 1 1 5000U 65 F 32\n", " 2 2 1 1 5000U 65 F 30\n", " 3 4 1 1 5000U 65 F 24\n", " 4 8 1 1 5000U 65 F 37\n", " 5 12 1 1 5000U 65 F 39" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cdystonia2 = cdystonia.set_index(['patient','obs'])\n", "cdystonia2.head()" ] }, { "cell_type": "code", "execution_count": 64, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Int64Index([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,\n", " ...\n", " 100, 101, 102, 103, 104, 105, 106, 107, 108, 109],\n", " dtype='int64', name='patient', length=109)\n", "Int64Index([1, 2, 3, 4, 5, 6], dtype='int64', name='obs')\n" ] } ], "source": [ "print(cdystonia2.index.levels[0])\n", "print(cdystonia2.index.levels[1])" ] }, { "cell_type": "code", "execution_count": 65, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "FrozenList([[0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 11, 11, 12, 12, 12, 12, 12, 12, 13, 13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 15, 15, 15, 15, 15, 15, 16, 16, 16, 16, 16, 16, ...], [0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, ...]])" ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cdystonia2.index.labels" ] }, { "cell_type": "code", "execution_count": 66, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "array([(1, 1), (1, 2), (1, 3), (1, 4), (1, 5)], dtype=object)" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cdystonia2.index.values[0:5]" ] }, { "cell_type": "code", "execution_count": 67, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "Int64Index([1, 2, 3, 4, 5, 6, 1, 2, 3, 4,\n", " ...\n", " 2, 3, 4, 5, 6, 1, 2, 4, 5, 6],\n", " dtype='int64', name='obs', length=631)" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cdystonia2.index.get_level_values('obs')" ] }, { "cell_type": "code", "execution_count": 68, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cdystonia2.index.is_unique" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "If we want to transform this data so that repeated measurements are in columns, we can `unstack` the `twstrs` measurements according to `obs`:" ] }, { "cell_type": "code", "execution_count": 69, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
weeksiteidtreatagesextwstrs
patientobs
110115000U65F32
22115000U65F30
34115000U65F24
48115000U65F37
512115000U65F39
\n", "
" ], "text/plain": [ " week site id treat age sex twstrs\n", "patient obs \n", "1 1 0 1 1 5000U 65 F 32\n", " 2 2 1 1 5000U 65 F 30\n", " 3 4 1 1 5000U 65 F 24\n", " 4 8 1 1 5000U 65 F 37\n", " 5 12 1 1 5000U 65 F 39" ] }, "execution_count": 69, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cdystonia2.head()" ] }, { "cell_type": "code", "execution_count": 70, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
obs123456
patient
132.030.024.037.039.036.0
260.026.027.041.065.067.0
344.020.023.026.035.035.0
453.061.064.062.0NaNNaN
553.035.048.049.041.051.0
\n", "
" ], "text/plain": [ "obs 1 2 3 4 5 6\n", "patient \n", "1 32.0 30.0 24.0 37.0 39.0 36.0\n", "2 60.0 26.0 27.0 41.0 65.0 67.0\n", "3 44.0 20.0 23.0 26.0 35.0 35.0\n", "4 53.0 61.0 64.0 62.0 NaN NaN\n", "5 53.0 35.0 48.0 49.0 41.0 51.0" ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "twstrs_wide = cdystonia2['twstrs'].unstack('obs')\n", "twstrs_wide.head()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "And if we want to keep the other variables:" ] }, { "cell_type": "code", "execution_count": 71, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
patientsiteidtreatagesex123456
01115000U65F32.030.024.037.039.036.0
621210000U70F60.026.027.041.065.067.0
123135000U64F44.020.023.026.035.035.0
18414Placebo59F53.061.064.062.0NaNNaN
2251510000U76F53.035.048.049.041.051.0
\n", "
" ], "text/plain": [ " patient site id treat age sex 1 2 3 4 5 6\n", "0 1 1 1 5000U 65 F 32.0 30.0 24.0 37.0 39.0 36.0\n", "6 2 1 2 10000U 70 F 60.0 26.0 27.0 41.0 65.0 67.0\n", "12 3 1 3 5000U 64 F 44.0 20.0 23.0 26.0 35.0 35.0\n", "18 4 1 4 Placebo 59 F 53.0 61.0 64.0 62.0 NaN NaN\n", "22 5 1 5 10000U 76 F 53.0 35.0 48.0 49.0 41.0 51.0" ] }, "execution_count": 71, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cdystonia_wide = (cdystonia[['patient','site','id','treat','age','sex']]\n", " .drop_duplicates()\n", " .merge(twstrs_wide, right_index=True, left_on='patient', how='inner')\n", " .head())\n", "cdystonia_wide" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Or to simplify things, we can set the patient-level information as an index before unstacking:" ] }, { "cell_type": "code", "execution_count": 72, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
week02481216
patientsiteidtreatagesex
1115000U65F32.030.024.037.039.036.0
21210000U70F60.026.027.041.065.067.0
3135000U64F44.020.023.026.035.035.0
414Placebo59F53.061.064.062.0NaNNaN
51510000U76F53.035.048.049.041.051.0
\n", "
" ], "text/plain": [ "week 0 2 4 8 12 16\n", "patient site id treat age sex \n", "1 1 1 5000U 65 F 32.0 30.0 24.0 37.0 39.0 36.0\n", "2 1 2 10000U 70 F 60.0 26.0 27.0 41.0 65.0 67.0\n", "3 1 3 5000U 64 F 44.0 20.0 23.0 26.0 35.0 35.0\n", "4 1 4 Placebo 59 F 53.0 61.0 64.0 62.0 NaN NaN\n", "5 1 5 10000U 76 F 53.0 35.0 48.0 49.0 41.0 51.0" ] }, "execution_count": 72, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(cdystonia.set_index(['patient','site','id','treat','age','sex','week'])['twstrs']\n", " .unstack('week').head())" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### [`.melt()`](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.melt.html)\n", "\n", "- To convert our \"wide\" format back to long, we can use the `melt` function. \n", "- This function is useful for `DataFrame`s where one or more columns are identifier variables (`id_vars`), with the remaining columns being measured variables (`value_vars`). \n", "- The measured variables are \"unpivoted\" to the row axis, leaving just two non-identifier columns, a *variable* and its corresponding *value*, which can both be renamed using optional arguments." ] }, { "cell_type": "code", "execution_count": 73, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
patientsiteidtreatagesexobstwsters
01115000U65F132.0
121210000U70F160.0
23135000U64F144.0
3414Placebo59F153.0
451510000U76F153.0
\n", "
" ], "text/plain": [ " patient site id treat age sex obs twsters\n", "0 1 1 1 5000U 65 F 1 32.0\n", "1 2 1 2 10000U 70 F 1 60.0\n", "2 3 1 3 5000U 64 F 1 44.0\n", "3 4 1 4 Placebo 59 F 1 53.0\n", "4 5 1 5 10000U 76 F 1 53.0" ] }, "execution_count": 73, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.melt(cdystonia_wide, id_vars=['patient','site','id','treat','age','sex'], \n", " var_name='obs', value_name='twsters').head()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Pivoting" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The `pivot` method allows a DataFrame to be transformed easily between long and wide formats in the same way as a pivot table is created in a spreadsheet. \n", "\n", "It takes three arguments: `index`, `columns` and `values`, corresponding to the DataFrame index (the row headers), columns and cell values, respectively." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "For example, we may want the `twstrs` variable (the response variable) in wide format according to patient, as we saw with the unstacking method above:" ] }, { "cell_type": "code", "execution_count": 74, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
obs123456
patient
132.030.024.037.039.036.0
260.026.027.041.065.067.0
344.020.023.026.035.035.0
453.061.064.062.0NaNNaN
553.035.048.049.041.051.0
\n", "
" ], "text/plain": [ "obs 1 2 3 4 5 6\n", "patient \n", "1 32.0 30.0 24.0 37.0 39.0 36.0\n", "2 60.0 26.0 27.0 41.0 65.0 67.0\n", "3 44.0 20.0 23.0 26.0 35.0 35.0\n", "4 53.0 61.0 64.0 62.0 NaN NaN\n", "5 53.0 35.0 48.0 49.0 41.0 51.0" ] }, "execution_count": 74, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cdystonia.pivot(index='patient', columns='obs', values='twstrs').head()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "If we omit the `values` argument, we get a `DataFrame` with hierarchical columns, just as when we applied `unstack` to the hierarchically-indexed table:" ] }, { "cell_type": "code", "execution_count": 75, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
weeksite...sextwstrs
obs1234561234...3456123456
patient
10.02.04.08.012.016.01.01.01.01.0...FFFF32.030.024.037.039.036.0
20.02.04.08.012.016.01.01.01.01.0...FFFF60.026.027.041.065.067.0
30.02.04.08.012.016.01.01.01.01.0...FFFF44.020.023.026.035.035.0
40.02.04.08.0NaNNaN1.01.01.01.0...FFNoneNone53.061.064.062.0NaNNaN
50.02.04.08.012.016.01.01.01.01.0...FFFF53.035.048.049.041.051.0
..................................................................
1050.02.04.08.012.016.09.09.09.09.0...FFFF45.046.033.044.046.048.0
1060.02.04.08.012.016.09.09.09.09.0...MMMM67.063.071.066.068.071.0
1070.0NaN4.08.0NaN16.09.0NaN9.09.0...MMNoneM57.0NaN36.023.0NaN52.0
1080.02.04.08.012.016.09.09.09.09.0...FFFF63.051.046.050.050.054.0
1090.02.0NaN8.012.016.09.09.0NaN9.0...NoneMMM53.038.0NaN33.036.051.0
\n", "

109 rows × 42 columns

\n", "
" ], "text/plain": [ " week site ... sex \\\n", "obs 1 2 3 4 5 6 1 2 3 4 ... 3 4 \n", "patient ... \n", "1 0.0 2.0 4.0 8.0 12.0 16.0 1.0 1.0 1.0 1.0 ... F F \n", "2 0.0 2.0 4.0 8.0 12.0 16.0 1.0 1.0 1.0 1.0 ... F F \n", "3 0.0 2.0 4.0 8.0 12.0 16.0 1.0 1.0 1.0 1.0 ... F F \n", "4 0.0 2.0 4.0 8.0 NaN NaN 1.0 1.0 1.0 1.0 ... F F \n", "5 0.0 2.0 4.0 8.0 12.0 16.0 1.0 1.0 1.0 1.0 ... F F \n", "... ... ... ... ... ... ... ... ... ... ... ... ... .. \n", "105 0.0 2.0 4.0 8.0 12.0 16.0 9.0 9.0 9.0 9.0 ... F F \n", "106 0.0 2.0 4.0 8.0 12.0 16.0 9.0 9.0 9.0 9.0 ... M M \n", "107 0.0 NaN 4.0 8.0 NaN 16.0 9.0 NaN 9.0 9.0 ... M M \n", "108 0.0 2.0 4.0 8.0 12.0 16.0 9.0 9.0 9.0 9.0 ... F F \n", "109 0.0 2.0 NaN 8.0 12.0 16.0 9.0 9.0 NaN 9.0 ... None M \n", "\n", " twstrs \n", "obs 5 6 1 2 3 4 5 6 \n", "patient \n", "1 F F 32.0 30.0 24.0 37.0 39.0 36.0 \n", "2 F F 60.0 26.0 27.0 41.0 65.0 67.0 \n", "3 F F 44.0 20.0 23.0 26.0 35.0 35.0 \n", "4 None None 53.0 61.0 64.0 62.0 NaN NaN \n", "5 F F 53.0 35.0 48.0 49.0 41.0 51.0 \n", "... ... ... ... ... ... ... ... ... \n", "105 F F 45.0 46.0 33.0 44.0 46.0 48.0 \n", "106 M M 67.0 63.0 71.0 66.0 68.0 71.0 \n", "107 None M 57.0 NaN 36.0 23.0 NaN 52.0 \n", "108 F F 63.0 51.0 46.0 50.0 50.0 54.0 \n", "109 M M 53.0 38.0 NaN 33.0 36.0 51.0 \n", "\n", "[109 rows x 42 columns]" ] }, "execution_count": 75, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cdystonia.pivot('patient', 'obs')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "A related method, `pivot_table`, creates a spreadsheet-like table with a hierarchical index, and allows the values of the table to be populated using an arbitrary aggregation function." ] }, { "cell_type": "code", "execution_count": 76, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
patientobsweeksiteidtreatagesextwstrs
0110115000U65F32
1122115000U65F30
2134115000U65F24
3148115000U65F37
41512115000U65F39
\n", "
" ], "text/plain": [ " patient obs week site id treat age sex twstrs\n", "0 1 1 0 1 1 5000U 65 F 32\n", "1 1 2 2 1 1 5000U 65 F 30\n", "2 1 3 4 1 1 5000U 65 F 24\n", "3 1 4 8 1 1 5000U 65 F 37\n", "4 1 5 12 1 1 5000U 65 F 39" ] }, "execution_count": 76, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cdystonia.head()" ] }, { "cell_type": "code", "execution_count": 77, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
week02481216
sitetreat
110000U604148496567
5000U443234434246
Placebo536164623238
210000U656060646766
5000U676465646264
........................
610000U555647535151
5000U595550565953
Placebo545351575757
710000U534745455053
5000U534552515253
\n", "

20 rows × 6 columns

\n", "
" ], "text/plain": [ "week 0 2 4 8 12 16\n", "site treat \n", "1 10000U 60 41 48 49 65 67\n", " 5000U 44 32 34 43 42 46\n", " Placebo 53 61 64 62 32 38\n", "2 10000U 65 60 60 64 67 66\n", " 5000U 67 64 65 64 62 64\n", "... .. .. .. .. .. ..\n", "6 10000U 55 56 47 53 51 51\n", " 5000U 59 55 50 56 59 53\n", " Placebo 54 53 51 57 57 57\n", "7 10000U 53 47 45 45 50 53\n", " 5000U 53 45 52 51 52 53\n", "\n", "[20 rows x 6 columns]" ] }, "execution_count": 77, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cdystonia.pivot_table(index=['site', 'treat'], columns='week', values='twstrs', \n", " aggfunc=max).head(20)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Crosstabs and Summaries" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "For a simple cross-tabulation of group frequencies, the `crosstab` function (not a method) aggregates counts of data according to factors in rows and columns. The factors may be hierarchical if desired." ] }, { "cell_type": "code", "execution_count": 78, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
site123456789
sex
F525342302254664828
M18293018113365833
\n", "
" ], "text/plain": [ "site 1 2 3 4 5 6 7 8 9\n", "sex \n", "F 52 53 42 30 22 54 66 48 28\n", "M 18 29 30 18 11 33 6 58 33" ] }, "execution_count": 78, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.crosstab(cdystonia['sex'], cdystonia['site'])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "And the `.describe()` method gives some useful summary information on the `DataFrame`:" ] }, { "cell_type": "code", "execution_count": 79, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
patientobsweeksiteidagetwstrs
count631.000000631.000000631.000000631.000000631.000000631.000000631.000000
mean54.4025363.4849456.9572115.1188597.02060255.61648241.478605
std31.3084951.7162195.6453162.6696674.28521512.12391012.716784
min1.0000001.0000000.0000001.0000001.00000026.0000006.000000
25%27.5000002.0000002.0000003.0000003.00000046.00000032.500000
50%54.0000003.0000004.0000006.0000007.00000056.00000043.000000
75%81.0000005.00000012.0000008.00000010.00000065.00000051.000000
max109.0000006.00000016.0000009.00000019.00000083.00000071.000000
\n", "
" ], "text/plain": [ " patient obs week site id age \\\n", "count 631.000000 631.000000 631.000000 631.000000 631.000000 631.000000 \n", "mean 54.402536 3.484945 6.957211 5.118859 7.020602 55.616482 \n", "std 31.308495 1.716219 5.645316 2.669667 4.285215 12.123910 \n", "min 1.000000 1.000000 0.000000 1.000000 1.000000 26.000000 \n", "25% 27.500000 2.000000 2.000000 3.000000 3.000000 46.000000 \n", "50% 54.000000 3.000000 4.000000 6.000000 7.000000 56.000000 \n", "75% 81.000000 5.000000 12.000000 8.000000 10.000000 65.000000 \n", "max 109.000000 6.000000 16.000000 9.000000 19.000000 83.000000 \n", "\n", " twstrs \n", "count 631.000000 \n", "mean 41.478605 \n", "std 12.716784 \n", "min 6.000000 \n", "25% 32.500000 \n", "50% 43.000000 \n", "75% 51.000000 \n", "max 71.000000 " ] }, "execution_count": 79, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cdystonia.describe()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Exercise 4\n", "\n", "Open up [Lecture 4/Exercise 4.ipynb](./Exercise 4.ipynb) in your Jupyter notebook server.\n", "\n", "Solutions are at [Lecture 4/Exercise 4 - Solutions.ipynb](./Exercise 4 - Solutions.ipynb)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## References\n", "\n", "Slide materials inspired by and adapted from [Chris Fonnesbeck](https://github.com/fonnesbeck/statistical-analysis-python-tutorial) and [Tom Augspurger](https://github.com/TomAugspurger/pydata-chi-h2t)" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python [conda root]", "language": "python", "name": "conda-root-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" }, "livereveal": { "height": 768, "scroll": true, "slideNumber": true, "start_slideshow_at": "selected", "theme": "league", "width": 1024 } }, "nbformat": 4, "nbformat_minor": 1 }