{ "cells": [ { "cell_type": "raw", "metadata": {}, "source": [ "Content under Creative Commons Attribution license CC-BY 4.0, code under BSD 3-Clause License © 2017 L.A. Barba, N.C. Clementi" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Cheers! Stats with Beers\n", "\n", "\n", "\n", "Welcome to the second module in _Engineering Computations_, our series in computational thinking for undergraduate science and engineering students. This module explores practical statistical analysis with Python.\n", "\n", "This first lesson explores how we can answer questions using data combined with practical methods from statistics.\n", "\n", "We'll need some fun data to work with. We found a neat data set of canned craft beers in the US, scraped from the web and cleaned up by Jean-Nicholas Hould ([@nickhould](https://github.com/nickhould) on GitHub), whom we want to thank for having a permissive license on his GitHub repository so we can reuse his [work](https://github.com/nickhould/craft-beers-dataset)!\n", "\n", "The data source doesn't say that the set includes *all* the canned beers brewed in the country. So we have to assume that the data is a sample and may contain biases.\n", "\n", "To manipulate the data, you'll start with **NumPy**, the array library for Python that you learned about in [Module 1, lesson 4](http://go.gwu.edu/engcomp1lesson4). But you'll also learn about a new Python library for data analysis:\n", "[`pandas`](http://pandas.pydata.org/). It is an open-source library providing high-performance, easy-to-use data structures and data-analysis tools. Even though `pandas` is great for data analysis, we won't exploit all its power in this lesson. 
But you'll learn more about it later on!\n", "\n", "With `pandas`, you will read the data file (in `csv` format, for comma-separated values), display it in a nice table, and extract the columns that we need, which we'll convert to `numpy` arrays to work with.\n", "\n", "Let's start by importing the two Python libraries that we need." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas\n", "import numpy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 1: Read the data file\n", "\n", "Below, we'll take a peek into the data file, `beers.csv`, using the system command `head` (which we can use with a bang, thanks to IPython).\n", "\n", "But first, we will download the data using a Python library for opening a URL on the Internet. We created a short URL for the data file in the public repository with our course materials.\n", "\n", "The cell below should download the data into your current working directory. The next cell shows you the first few lines of the data." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "('beers.csv', )" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from urllib.request import urlretrieve\n", "URL = 'http://go.gwu.edu/engcomp2data1'\n", "urlretrieve(URL, 'beers.csv')" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ ",abv,ibu,id,name,style,brewery_id,ounces\r\n", "0,0.05,,1436,Pub Beer,American Pale Lager,408,12.0\r\n", "1,0.066,,2265,Devil's Cup,American Pale Ale (APA),177,12.0\r\n", "2,0.071,,2264,Rise of the Phoenix,American IPA,177,12.0\r\n", "3,0.09,,2263,Sinister,American Double / Imperial IPA,177,12.0\r\n", "4,0.075,,2262,Sex and Candy,American IPA,177,12.0\r\n", "5,0.077,,2261,Black Exodus,Oatmeal Stout,177,12.0\r\n", "6,0.045,,2260,Lake Street Express,American Pale Ale (APA),177,12.0\r\n", "7,0.065,,2259,Foreman,American Porter,177,12.0\r\n", "8,0.055,,2258,Jade,American Pale Ale (APA),177,12.0\r\n" ] } ], "source": [ "!head \"beers.csv\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can use `pandas` to read the data from the `csv` file, and save it into a new variable called `beers`. Let's then check the type of this new variable; remember that we can use the function `type()` to do this." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "beers = pandas.read_csv(\"beers.csv\")" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "pandas.core.frame.DataFrame" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(beers)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is a new data type for us: a `pandas DataFrame`. 
From the `pandas` documentation: \"A `DataFrame` is a 2-dimensional labeled data structure with columns of potentially different types\" [4]. You can think of it as the contents of a spreadsheet, saved into one handy Python variable. If you print it out, you get a nicely laid-out table: " ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "2410 rows × 8 columns\n", "
" ], "text/plain": [ " Unnamed: 0 abv ibu id \\\n", "0 0 0.050 NaN 1436 \n", "1 1 0.066 NaN 2265 \n", "2 2 0.071 NaN 2264 \n", "3 3 0.090 NaN 2263 \n", "4 4 0.075 NaN 2262 \n", "5 5 0.077 NaN 2261 \n", "6 6 0.045 NaN 2260 \n", "7 7 0.065 NaN 2259 \n", "8 8 0.055 NaN 2258 \n", "9 9 0.086 NaN 2131 \n", "10 10 0.072 NaN 2099 \n", "11 11 0.073 NaN 2098 \n", "12 12 0.069 NaN 2097 \n", "13 13 0.085 NaN 1980 \n", "14 14 0.061 60.0 1979 \n", "15 15 0.060 NaN 2318 \n", "16 16 0.060 NaN 2170 \n", "17 17 0.060 NaN 2169 \n", "18 18 0.060 NaN 1502 \n", "19 19 0.082 NaN 1593 \n", "20 20 0.082 NaN 1592 \n", "21 21 0.099 92.0 1036 \n", "22 22 0.079 45.0 1024 \n", "23 23 0.079 NaN 976 \n", "24 24 0.044 42.0 876 \n", "25 25 0.049 17.0 802 \n", "26 26 0.049 17.0 801 \n", "27 27 0.049 17.0 800 \n", "28 28 0.070 70.0 799 \n", "29 29 0.070 70.0 797 \n", "... ... ... ... ... \n", "2380 2380 0.080 31.0 761 \n", "2381 2381 0.055 NaN 2149 \n", "2382 2382 0.071 60.0 2148 \n", "2383 2383 0.052 NaN 2147 \n", "2384 2384 0.048 38.0 2146 \n", "2385 2385 0.059 NaN 2047 \n", "2386 2386 0.062 61.0 1470 \n", "2387 2387 0.045 23.0 1469 \n", "2388 2388 0.058 72.0 2627 \n", "2389 2389 0.045 NaN 2626 \n", "2390 2390 0.059 135.0 1676 \n", "2391 2391 0.047 15.0 1468 \n", "2392 2392 0.050 NaN 822 \n", "2393 2393 0.065 82.0 2417 \n", "2394 2394 0.028 15.0 2306 \n", "2395 2395 0.065 69.0 1697 \n", "2396 2396 0.069 69.0 2194 \n", "2397 2397 0.045 25.0 1514 \n", "2398 2398 0.077 30.0 1513 \n", "2399 2399 0.069 69.0 1512 \n", "2400 2400 0.060 50.0 1511 \n", "2401 2401 0.042 NaN 1345 \n", "2402 2402 0.082 NaN 1316 \n", "2403 2403 0.055 NaN 1045 \n", "2404 2404 0.075 NaN 1035 \n", "2405 2405 0.067 45.0 928 \n", "2406 2406 0.052 NaN 807 \n", "2407 2407 0.055 NaN 620 \n", "2408 2408 0.055 40.0 145 \n", "2409 2409 0.052 NaN 84 \n", "\n", " name \\\n", "0 Pub Beer \n", "1 Devil's Cup \n", "2 Rise of the Phoenix \n", "3 Sinister \n", "4 Sex and Candy \n", "5 Black Exodus \n", "6 Lake Street Express \n", "7 Foreman \n", 
"8 Jade \n", "9 Cone Crusher \n", "10 Sophomoric Saison \n", "11 Regional Ring Of Fire \n", "12 Garce Selé \n", "13 Troll Destroyer \n", "14 Bitter Bitch \n", "15 Ginja Ninja \n", "16 Cherried Away \n", "17 Rhubarbarian \n", "18 BrightCider \n", "19 He Said Baltic-Style Porter \n", "20 He Said Belgian-Style Tripel \n", "21 Lower De Boom \n", "22 Fireside Chat \n", "23 Marooned On Hog Island \n", "24 Bitter American \n", "25 Hell or High Watermelon Wheat (2009) \n", "26 Hell or High Watermelon Wheat (2009) \n", "27 21st Amendment Watermelon Wheat Beer (2006) \n", "28 21st Amendment IPA (2006) \n", "29 Brew Free! or Die IPA (2008) \n", "... ... \n", "2380 P-51 Porter \n", "2381 #001 Golden Amber Lager \n", "2382 #002 American I.P.A. \n", "2383 #003 Brown & Robust Porter \n", "2384 #004 Session I.P.A. \n", "2385 Tarasque \n", "2386 Ananda India Pale Ale \n", "2387 Tiny Bomb \n", "2388 Train Hopper \n", "2389 Edward’s Portly Brown \n", "2390 Troopers Alley IPA \n", "2391 Wolverine Premium Lager \n", "2392 Woodchuck Amber Hard Cider \n", "2393 4000 Footer IPA \n", "2394 Summer Brew \n", "2395 Be Hoppy IPA \n", "2396 Worthy IPA \n", "2397 Easy Day Kolsch \n", "2398 Lights Out Vanilla Cream Extra Stout \n", "2399 Worthy IPA (2013) \n", "2400 Worthy Pale \n", "2401 Patty's Chile Beer \n", "2402 Colorojo Imperial Red Ale \n", "2403 Wynkoop Pumpkin Ale \n", "2404 Rocky Mountain Oyster Stout \n", "2405 Belgorado \n", "2406 Rail Yard Ale \n", "2407 B3K Black Lager \n", "2408 Silverback Pale Ale \n", "2409 Rail Yard Ale (2009) \n", "\n", " style brewery_id ounces \n", "0 American Pale Lager 408 12.0 \n", "1 American Pale Ale (APA) 177 12.0 \n", "2 American IPA 177 12.0 \n", "3 American Double / Imperial IPA 177 12.0 \n", "4 American IPA 177 12.0 \n", "5 Oatmeal Stout 177 12.0 \n", "6 American Pale Ale (APA) 177 12.0 \n", "7 American Porter 177 12.0 \n", "8 American Pale Ale (APA) 177 12.0 \n", "9 American Double / Imperial IPA 177 12.0 \n", "10 Saison / Farmhouse Ale 177 12.0 
\n", "11 Saison / Farmhouse Ale 177 12.0 \n", "12 Saison / Farmhouse Ale 177 12.0 \n", "13 Belgian IPA 177 12.0 \n", "14 American Pale Ale (APA) 177 12.0 \n", "15 Cider 154 12.0 \n", "16 Cider 154 12.0 \n", "17 Cider 154 12.0 \n", "18 Cider 154 12.0 \n", "19 Baltic Porter 368 12.0 \n", "20 Tripel 368 12.0 \n", "21 American Barleywine 368 8.4 \n", "22 Winter Warmer 368 12.0 \n", "23 American Stout 368 12.0 \n", "24 American Pale Ale (APA) 368 12.0 \n", "25 Fruit / Vegetable Beer 368 12.0 \n", "26 Fruit / Vegetable Beer 368 12.0 \n", "27 Fruit / Vegetable Beer 368 12.0 \n", "28 American IPA 368 12.0 \n", "29 American IPA 368 12.0 \n", "... ... ... ... \n", "2380 American Porter 509 16.0 \n", "2381 American Amber / Red Lager 211 12.0 \n", "2382 American IPA 211 12.0 \n", "2383 American Porter 211 12.0 \n", "2384 American IPA 211 12.0 \n", "2385 Saison / Farmhouse Ale 239 12.0 \n", "2386 American IPA 239 12.0 \n", "2387 American Pilsner 239 12.0 \n", "2388 American IPA 14 12.0 \n", "2389 American Brown Ale 14 12.0 \n", "2390 American IPA 344 12.0 \n", "2391 American Pale Lager 402 12.0 \n", "2392 Cider 501 12.0 \n", "2393 American IPA 109 12.0 \n", "2394 American Pilsner 109 12.0 \n", "2395 American IPA 339 16.0 \n", "2396 American IPA 199 12.0 \n", "2397 Kölsch 199 12.0 \n", "2398 American Double / Imperial IPA 199 12.0 \n", "2399 American IPA 199 12.0 \n", "2400 American Pale Ale (APA) 199 12.0 \n", "2401 Chile Beer 424 12.0 \n", "2402 American Strong Ale 424 12.0 \n", "2403 Pumpkin Ale 424 12.0 \n", "2404 American Stout 424 12.0 \n", "2405 Belgian IPA 424 12.0 \n", "2406 American Amber / Red Ale 424 12.0 \n", "2407 Schwarzbier 424 12.0 \n", "2408 American Pale Ale (APA) 424 12.0 \n", "2409 American Amber / Red Ale 424 12.0 \n", "\n", "[2410 rows x 8 columns]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "beers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Inspect the table above. 
The first column is a numbering scheme for the beers. The other columns contain the following data:\n", "\n", "- `abv`: Alcohol-by-volume of the beer.\n", "- `ibu`: International Bittering Units of the beer.\n", "- `id`: Unique identifier of the beer.\n", "- `name`: Name of the beer.\n", "- `style`: Style of the beer.\n", "- `brewery_id`: Unique identifier of the brewery.\n", "- `ounces`: Ounces of beer in the can." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 2: Explore the data\n", "\n", "In the field of statistics, [Exploratory Data Analysis](https://en.wikipedia.org/wiki/Exploratory_data_analysis) (EDA) has the goal of summarizing the main features of our data, and seeing what the data can tell us without formal modeling or hypothesis-testing. [2]\n", "\n", "Let's start by extracting the columns with the `abv` and `ibu` values, and converting them to NumPy arrays. One of the advantages of data frames in `pandas` is that we can access a column simply using its header, like this:\n", "\n", "```python\n", "data_frame['name_of_column']\n", "```\n", "
\n", "\n", "The output of this action is a `pandas Series`. From the documentation: \"a `Series` is a 1-dimensional labeled array capable of holding any data type.\" [4]\n", "\n", "Check the type of a column extracted by header:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "pandas.core.series.Series" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(beers['abv'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Of course, you can index and slice a data series like you know how to do with strings, lists and arrays. Here, we display the first ten elements of the `abv` series:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 0.050\n", "1 0.066\n", "2 0.071\n", "3 0.090\n", "4 0.075\n", "5 0.077\n", "6 0.045\n", "7 0.065\n", "8 0.055\n", "9 0.086\n", "Name: abv, dtype: float64" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "beers['abv'][:10]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Inspect the data in the table again: you'll notice that there are `NaN` (not-a-number) elements in both the `abv` and `ibu` columns. Those values mean that there was no data reported for that beer. A typical task when cleaning up data is to deal with these pesky `NaN`s.\n", "\n", "Let's extract the two series corresponding to the `abv` and `ibu` columns, clean the data by removing all `NaN` values, and then access the values of each series and assign them to a NumPy array. " ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "abv_series = beers['abv']" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2410" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(abv_series)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another advantage of `pandas` is that it has the ability to handle missing data. The method `dropna()`, available for both data frames and series, returns a new object with only the good values of the original: all the null values are thrown out. This is super useful!" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "abv_clean = abv_series.dropna()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check out the length of the cleaned-up `abv` data; you'll see that it's shorter than the original. `NaN`s gone!" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2348" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(abv_clean)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Remember that a `pandas` _Series_ consists of a column of values, and their labels. You can extract the values via the [`series.values`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.values.html) attribute, which returns a `numpy.ndarray` (multidimensional array). In the case of the `abv_clean` series, you get a one-dimensional array. We save it into the variable `abv`. " ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "abv = abv_clean.values" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0.05 0.066 0.071 ... 
0.055 0.055 0.052]\n" ] } ], "source": [ "print(abv)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "numpy.ndarray" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(abv)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we repeat the whole process for the `ibu` column: extract the column into a series, clean it up removing `NaN`s, extract the series values as an array, check how many values we lost." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2410" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ibu_series = beers['ibu']\n", "\n", "len(ibu_series)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1405" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ibu_clean = ibu_series.dropna()\n", "\n", "ibu = ibu_clean.values\n", "\n", "len(ibu)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "_Exercise:_\n", "\n", "Write a Python function that calculates the percentage of missing values for a certain data series. Use the function to calculate the percentage of missing values for the `abv` and `ibu` data sets. \n", "\n", "For the original series, before cleaning, remember that you can access the values with `series.values` (e.g., `abv_series.values`)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "_Important:_\n", "\n", "Notice that in the case of the variable `ibu` we are missing almost 42% of the values. This is important, because it will affect our analysis. When we do descriptive statistics, we will ignore these missing values, and having 42% missing will very likely cause bias." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 3: Ready, stats, go!\n", "\n", "\n", "Now that we have NumPy arrays with clean data, let's see how we can manipulate them to get some useful information. \n", "\n", "Focusing on the numerical variables `abv` and `ibu`, we'll walk through some \"descriptive statistics,\" below. In other words, we aim to generate statistics that summarize the data concisely." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Maximum and minimum \n", "\n", "The maximum and minimum values of a dataset are helpful as they tell us the _range_ of our sample: the range gives some indication of the _variability_ in the data.\n", "We can obtain them for our `abv` and `ibu` arrays with the `min()` and `max()` functions from NumPy." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**abv**" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "abv_min = numpy.min(abv)\n", "abv_max = numpy.max(abv)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The minimum value for abv is: 0.001\n", "The maximum value for abv is: 0.128\n" ] } ], "source": [ "print('The minimum value for abv is: ', abv_min)\n", "print('The maximum value for abv is: ', abv_max)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**ibu**" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "ibu_min = numpy.min(ibu)\n", "ibu_max = numpy.max(ibu)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The minimum value for ibu is: 4.0\n", "The maximum value for ibu is: 138.0\n" ] } ], "source": [ "print('The minimum value for ibu is: ', ibu_min)\n", "print('The maximum value for ibu is: ', ibu_max)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Mean value\n", "\n", "The **mean** value is one of the main measures to describe the central tendency of the data: an indication of where the \"center\" of the data is. If we have a sample of $N$ values, $x_i$, the mean, $\\bar{x}$, is calculated by:\n", "\n", "\\begin{equation*}\n", " \\bar{x} = \\frac{1}{N}\\sum_{i} x_i\n", "\\end{equation*}\n", "\n", "In words, that is the sum of the data values divided by the number of values, $N$. \n", "\n", "You've already learned how to write a function to compute the mean in [Module 1 Lesson 5](http://go.gwu.edu/engcomp1lesson5), but you also learned that NumPy has a built-in `mean()` function. We'll use this to get the mean of the `abv` and `ibu` values." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "abv_mean = numpy.mean(abv)\n", "ibu_mean = numpy.mean(ibu)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we'll print these two variables, but we'll use some fancy new way of printing with Python's string formatter, `str.format()`. There's a sweet site dedicated to Python's string formatter, called [PyFormat](https://pyformat.info), where you can learn lots of tricks!\n", "\n", "The basic trick is to use curly brackets `{}` as a placeholder for a variable value that you want to print in the middle of a string (say, a sentence that explains what you are printing), and to pass the variable as an argument to `.format()`, called on the string.\n", "\n", "Let's try something out…" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The mean value for abv is 0.059773424190800686 and for ibu 42.71316725978647\n" ] } ], "source": [ "print('The mean value for abv is {} and for ibu {}'.format(abv_mean, ibu_mean))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ugh! That doesn't look very good, does it? Here's where Python's string formatting gets fancy. 
We can print fewer decimal digits, so the sentence is more readable. For example, if we want to have four decimal digits, we specify it this way:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The mean value for abv is 0.0598 and for ibu 42.7132\n" ] } ], "source": [ "print('The mean value for abv is {:.4f} and for ibu {:.4f}'.format(abv_mean, ibu_mean))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Inside the curly brackets (the placeholders for the values we want to print), the `f` is for `float` and the `.4` is for four digits after the decimal point. The colon here marks the beginning of the format specification (as there are options that can be passed before). There are so many tricks to Python's string formatter that you'll usually look up just what you need.\n", "Another useful resource for string formatting is the [Python String Format Cookbook](https://mkaz.blog/code/python-string-format-cookbook/). Check it out!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Variance and standard deviation\n", "\n", "While the mean indicates where the center of your data is, the **variance** and **standard deviation** describe the *spread* or variability of the data. We already mentioned that the _range_ (difference between largest and smallest data values) is also an indication of variability. But the standard deviation is the most common measure of variability.\n", "\n", "We really like the way [Prof. Kristin Sainani](https://profiles.stanford.edu/kristin-sainani), of Stanford University, presents this in her online course on [Statistics in Medicine](https://www.youtube.com/@StatsSpring2013). 
In her lecture \"Describing Quantitative Data: What is the variability in the data?\", available [on YouTube](https://youtu.be/hlFeEQF5tDc), she asks: _What if someone were to ask you to devise a statistic that gives the average distance from the mean?_ Think about this a little bit.\n", "\n", "The distance from the mean, for any data value, is $x_i - \\bar{x}$. So what is the average of the distances from the mean? If we try to simply compute the average of all the values $x_i - \\bar{x}$, some of which are negative, we'll just get zero! It doesn't work.\n", "\n", "Since the problem is the negative distances from the mean, you might suggest using absolute values. But this is just mathematically inconvenient. Another way to get rid of negative values is to take the squares. And that's how we get to the expression for the _variance_: it is the average of the squares of the deviations from the mean. For a set of $N$ values,\n", "\n", "\\begin{equation*}\n", " \\text{var} = \\frac{1}{N}\\sum_{i} (x_i - \\bar{x})^2\n", "\\end{equation*}\n", "\n", "\n", "The variance itself is hard to interpret. The problem with it is that the units are strange (they are the square of the original units). The **standard deviation**, the square root of the variance, is more meaningful because it has the same units as the original variable. Often, the symbol $\\sigma$ is used for it:\n", "\n", "\\begin{equation*} \n", " \\sigma = \\sqrt{\\text{var}} = \\sqrt{\\frac{1}{N}\\sum_{i} (x_i - \\bar{x})^2}\n", "\\end{equation*}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Sample vs. population\n", "\n", "The above definitions are used when $N$ (the number of values) represents the entire population. But if we have a _sample_ of that population, the formulas have to be adjusted: instead of dividing by $N$ we divide by $N-1$. This is important, especially when we work with real data since usually we have samples of populations. 
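As a quick check of that adjustment, here is a minimal sketch on a small made-up sample (the array `sample` and its values are hypothetical, not taken from the beers data). It computes the variance both ways by hand and compares each result with `numpy.var()`, whose `ddof` argument sets the divisor to `N - ddof`:

```python
import numpy

# Hypothetical toy sample, NOT taken from the beers data set.
sample = numpy.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
N = len(sample)
mean = numpy.mean(sample)

# Sum of the squared deviations from the mean.
sum_sqr = numpy.sum((sample - mean)**2)

var_population = sum_sqr / N      # divide by N (whole population)
var_sample = sum_sqr / (N - 1)    # divide by N-1 (sample)

# numpy.var() divides by N - ddof, and ddof is 0 by default.
print('Dividing by N:   {:.4f} (numpy.var: {:.4f})'.format(
    var_population, numpy.var(sample)))
print('Dividing by N-1: {:.4f} (numpy.var, ddof=1: {:.4f})'.format(
    var_sample, numpy.var(sample, ddof=1)))
```

With only eight values, the two results differ noticeably (4.0 versus about 4.57); for a sample with thousands of values, the difference is much smaller.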
\n", "\n", "The **standard deviation** of a sample is denoted by $s$, and the formula is:\n", "\n", "\\begin{equation*} \n", " s = \\sqrt{\\frac{1}{N-1}\\sum_{i} (x_i - \\bar{x})^2}\n", "\\end{equation*}\n", "\n", "Why? This gets a little technical, but the reason is that if you have a _sample_ of the population, you don't know the _real_ value of the mean, and $\\bar{x}$ is actually an _estimate_ of the mean. That's why you'll often find the symbol $\\mu$ used to denote the population mean, to distinguish it from the sample mean, $\\bar{x}$. Using $\\bar{x}$ to compute the standard deviation introduces a small bias: $\\bar{x}$ is computed _from the sample values_, so the sample values are on average (slightly) closer to $\\bar{x}$ than they are to $\\mu$. Dividing by $N-1$ instead of $N$ corrects this bias!\n", "\n", "Prof. Sainani explains it by saying that we lost one degree of freedom when we estimated the mean using $\\bar{x}$. For example, say we have 100 people and I give you their mean age and the actual ages of 99 of them: you'll be able to calculate the age of that 100th person. Once we have calculated the mean, we only have 99 degrees of freedom left because that 100th person's age is fixed. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Let's code!\n", "\n", "Now that we have the math sorted out, we can program functions to compute the variance and the standard deviation. In our case, we are working with samples of the population of craft beers, so we need to use the formulas with $N-1$ in the denominator. " ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "def sample_var(array):\n", "    \"\"\" Calculates the variance of an array that contains values of a sample of a\n", "    population.\n", "\n", "    Arguments\n", "    ---------\n", "    array : array, contains sample of values.
\n", "\n", "    Returns\n", "    -------\n", "    var : float, variance of the array.\n", "    \"\"\"\n", "\n", "    sum_sqr = 0\n", "    mean = numpy.mean(array)\n", "\n", "    # accumulate the squared deviations from the mean\n", "    for element in array:\n", "        sum_sqr += (element - mean)**2\n", "\n", "    # sample formula: divide by N - 1, not N\n", "    N = len(array)\n", "    var = sum_sqr / (N - 1)\n", "\n", "    return var\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that we used `numpy.mean()` in our function: do you think we can make this function even more Pythonic? \n", "\n", "*Hint:* Yes, we totally can!\n", "\n", "_Exercise:_\n", "\n", "Re-write the function `sample_var()` using `numpy.sum()` to replace the `for`-loop. Name the function `var_pythonic`.\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have the sample variance, so we take its square root to get the standard deviation. We can make it a function, even though it's just one line of Python, to make our code more readable:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "def sample_std(array):\n", "    \"\"\" Computes the standard deviation of an array that contains values\n", "    of a sample of a population.\n", "\n", "    Arguments\n", "    ---------\n", "    array : array, contains sample of values.\n", "\n", "    Returns\n", "    -------\n", "    std : float, standard deviation of the array.\n", "    \"\"\"\n", "\n", "    std = numpy.sqrt(sample_var(array))\n", "\n", "    return std" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's call our brand new functions and assign the output values to new variables:" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "abv_std = sample_std(abv)\n", "ibu_std = sample_std(ibu)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we print these values using the string formatter, only printing 4 decimal digits, we can display our descriptive statistics in a pleasant, human-readable way."
 ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The standard deviation for abv is 0.0135 and for ibu 25.9541\n" ] } ], "source": [ "print('The standard deviation for abv is {:.4f} and for ibu {:.4f}'.format(abv_std, ibu_std))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These numbers tell us that the `abv` values are quite concentrated around the mean value (the standard deviation is about 23% of the mean), while the `ibu` values are quite spread out from their mean (about 61% of the mean). How could we check these descriptions of the data? A good way of doing so is using graphics: various types of plots can tell us things about the data. \n", "\n", "We'll learn about _histograms_ in this lesson, and in the following lesson we'll explore _box plots_. " ] }, { "cell_type": "markdown", "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "source": [ "## Step 4: Distribution plots \n", "\n", "Every time that we work with data, visualizing it is very useful. Visualizations give us a better idea of how our data behaves. One way of visualizing data is with a frequency-distribution plot known as a **histogram**: a graphical representation of how the data is distributed. To make a histogram, first we need to \"bin\" the range of values (divide the range into intervals) and then we count how many data values fall into each interval. The intervals are consecutive and non-overlapping, and usually (though not always) of equal size. \n", "\n", "Thanks to Python and Matplotlib, making histograms is easy. We recommend that you always read the documentation, in this case about [histograms](https://matplotlib.org/devdocs/api/_as_gen/matplotlib.pyplot.hist.html). Here we'll show you an example using the `hist()` function from `pyplot`, but this is just a starting point. 
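
The binning step described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how Matplotlib implements `hist()`; the list `values` and the choice of four bins are made up for the example.

```python
# Hypothetical data and bin count, made up for illustration.
values = [1.0, 2.5, 3.0, 3.5, 4.0, 6.5, 7.0, 9.0]
num_bins = 4

low, high = min(values), max(values)
width = (high - low) / num_bins   # equal-size intervals

counts = [0] * num_bins
for v in values:
    # Index of the interval that contains v.
    k = int((v - low) / width)
    # The largest value sits on the right edge; count it in the last bin.
    k = min(k, num_bins - 1)
    counts[k] += 1

# Print the left and right edge of each bin with its count.
for k in range(num_bins):
    left = low + k * width
    print('{:.1f} to {:.1f}: {}'.format(left, left + width, counts[k]))
```

Each bin includes its left edge and excludes its right edge, except the last one, which includes both. `numpy.histogram()` and `pyplot.hist()` do this counting for you and follow the same convention for the last bin.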
\n", "\n", "Let's import the libraries that we need for plotting, as you learned in [Module 1 Lesson 5](http://go.gwu.edu/engcomp1lesson5), then study the plotting commands used below. Try changing some of the plot options and seeing the effect." ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "from matplotlib import pyplot\n", "%matplotlib inline\n", "\n", "#set font styles\n", "pyplot.rc('font', family='serif', size=16)" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAmcAAAFaCAYAAABbvvr/AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvIxREBQAAHU9JREFUeJzt3X+4ZmVd7/H3Z9o4ojBdEENgOAwykFBE5ExKJslRwwQ1sl+alIkO5S9QEcnIyEMIiJVHM4ewY4WntDgnCQ4HRVGIM+EMKnJCwxFQNAMkcNAOAwPf/lhr5+M4wzwze6+975nn/bqufe217met9Xyfmz3sz77v9SNVhSRJktqwYL4LkCRJ0rcZziRJkhpiOJMkSWqI4UySJKkhhjNJkqSGGM4kSZIaYjiTNLGSrExyW5L3zXctkjRtar4LkKT5UlUXJHkcsHS+a5GkaY6cSZIkNcRwJmmnl+R3kqxO8rEkl/ajZaN2TfLn/TbXJDkgyZIkNye5J8lb++Ocl+TuJKfPw8eQNCEMZ5Imwb3AT1TVfwH+Fjh3k9efAZxeVUcClwPvr6ovAy+g+//kW/rtzgMuqapz5qZsSZPIc84kTYLbgauSLAAWAY/a5PVrq+pr/fJfAr+fZElV3ZjkVuBngb8Cfhn467kqWtJkcuRM0k4tyUHAB4E3VNVRwCnAYzbZ7J6R5bv77/v23/8C+NV++ZnAlQOVKkmA4UzSzu8IYH1VrenXd9nMNnuOLO/Vf58eSXs/cHSSo4Gbq+qhYcqUpI7hTNLObh2wR5KD+/Vnb2abo5JMj5T9KrC6P+eMqroD+BhwEd2UpyQNKlU13zVI0qCS/FfgBOAG4F+BX6Ob6vwH4E3Ap4EHgP2BB4Ffq6pbRvb/JboLBo6Y49IlTSDDmSRJUkOc1pQkSWqI4UySJKkhhjNJkqSGGM4kSZIaYjiTJElqiOFMkiSpIYYzSZKkhhjOJEmSGmI4kyRJaojhTJIkqSGGM0mSpIYYziRJkhpiOJMkSWqI4UySJKkhhjNJkqSGGM4kSZIaYjiTJElqiOFMkiSpIYYzSZKkhhjOJEmSGjI13wVsr7322quWLl0632VIkiRt1fXXX//1qlo8zrY7bDhbunQpa9eune8yJEmStirJl8bd1mlNSZKkhhjOJEmSGmI4kyRJaojhTJIkqSGGM0mSpIYYziRJkhpiOJMkSWqI4UySJKkhhjNJkqSGGM4kSZIaYjiTJElqiOFM0qzYsPHhneI9JGm+7bAPPpfUloVTC1ixat2g77HmpGWDHl+SWuDImSRJUkMMZ5IkSQ0xnEmSJDXEcCZJktQQw5kkSVJDDGeSJEkNMZxJkiQ1xHAmSZLUEMOZJElSQwxnkiRJDTGcSZIkNcRwJkmS1BDDmSRJUkMMZ
5IkSQ0xnEmSJDXEcCZJktQQw5kkSVJDDGeSJEkNMZxJkiQ1xHAmSZLUEMOZJElSQwxnkiRJDZka6sBJ/hG4v199qKqekWRP4BzgFuAg4E1VdUe//RuARcAewIer6pKhapMkSWrVYOEM+D9VdeYmbWcDV1bVB5M8FzgfOCHJk4Gjq+o5SXYBbkpydVXdO2B9kiRJzRlyWvOwJG9McmaSY/u2Y4HV/fK1/TrAcdPtVfUg8DngqAFrkyRJatKQI2fnVtUnk3wPcHWS+4C9gfv619cDeySZ6ts/N7Lv+r7tOyRZCawEWLJkyYClS5IkzY/BRs6q6pP994eAa4CjgTuB3ftNFgH3VNXGTdqnX7tzM8e8oKqWV9XyxYsXD1W6JEnSvBkknCV5YpITR5oOAtYBlwFH9m1P7dcBLp1u70fSDgWuHqI2adJs2PjwfJcgSdoGQ01rrgeOS/I4ulGw24G/Ai4Hzk1yMHAgcCpAVV2X5KokZ9Ndrfk6LwaQZsfCqQWsWLVu8PdZc9Kywd9DkibBIOGsqv4FOH4zL/0b8PIt7PO2IWqRJEnakXgTWkmSpIYYziRJkhpiOJMkSWqI4UySJKkhhjNJkqSGGM4kSZIaYjiTJElqiOFMkiSpIYYzSZKkhhjOJEmSGmI4kyRJaojhTJIkqSGGM0mSpIYYziRJkhpiOJMkSWqI4UySJKkhhjNJkqSGGM4kSZIaYjiTJElqiOFMkiSpIYYzSZKkhhjOJEmSGmI4kyRJaojhTJIkqSGGM0mSpIYYziRJkhpiOJMkSWqI4UySJKkhhjNJkqSGGM4kSZIaYjiTJElqiOFMkiSpIYYzSZKkhhjOJEmSGmI4kyRJaojhTJIkqSGGM0mSpIYYziRJkhoyNeTBk+wKXAd8uKpOTfJo4Hzgq8BBwDlVdXO/7YuBI4CHgC9W1aoha5MkSWrRoOEMOAv49Mj6KcCXq+q8JIcB7wWelmQ/4FTgiKqqJGuSfKyqvjBwfZIkSU0ZbFozyQnAtcCtI83HAqsBqupG4PAki4BjgOurqvrtVgM/M1RtkiRJrRoknCU5FDikqv7nJi/tDdw3sr6+b9tS+6bHXZlkbZK1d9111yxXLUmSNP+GGjk7Hrg/yenATwI/nuQU4E5g95HtFvVtW2r/DlV1QVUtr6rlixcvHqh0SZKk+TPIOWdV9fvTy/1FALtV1R/1y0cC1/TnnN1QVeuTXAG8Okn6qc0jgXcOUZskSVLLhr5a8wXAUcCjkrwQeAdwfpIzgGXAiQBV9ZUk5wN/mOQh4EIvBpAkSZNo0HBWVRcDF2/S/MotbHsRcNGQ9UiSJLXOm9BKkiQ1xHAmSZLUEMOZJElSQwxnkiRJDTGcSZIkNcRwJkmS1BDDmSRJUkMMZ5IkSQ0xnEmSJDXEcCZJktQQw5kkSVJDDGeSJEkNMZxJkiQ1xHAmSZLUEMOZJElSQwxnkiRJDTGcSZIkNcRwJkmS1BDDmSRJUkPGCmdJjhq6EEmSJI0/cvb2JKcl+b5Bq5EkSZpwU2Nu9xrgq8DJSXYF/rKqPjtcWZI0PzZsfJiFU8Of8TFX7yNpxzNuOLu9qr6S5BPAG4Fn98sfqKprhitPkubWwqkFrFi1bvD3WXPSssHfQ9KOadxwdlGSPYDbgHOr6qMASc4DDGeSJEmzZNxwtgE4vqpumW5I8ijgBwapSpIkaUKNe8LDbwMFkOTgJN9TVQ9U1a8MV5okSdLkGTec/Rbw+H55X+DsYcqRJEmabOOGs3+sqqsBquoTwD3DlSRJkjS5xg1nS5NMAfTflwxXkiRJ0uQa94KAK4Bbk9wN7Am8criSJEmSJtdY4ayqLklyNbAMWFdV9w5bliRJ0mTalttTB7gTWJTkzGHKkSRJmmxjjZwleS/wZLpwFrpzzs4crixJkqTJNO45Z7tV1Q9PryR5+jDlSJIkTbZxpzXXJNltZH2PIYqRJEmadOOGs1cDdya5NcmtwIUD1iRJkjSxxg1nf1VVj6mqA
6rqAOC0IYuSJEmaVGOFs6o6PcmCJHslSVW9d+jCJEmSJtFY4SzJTwO3AH8GvCjJSYNWJUmSNKHGvVrzucATgZOr6v1JznukjZMsAP4euA54FHAg8FJgV+AcuqB3EPCmqrqj3+cNwCK6iw0+XFWXbPvHkSRJ2rGNG86+UlX3J6l+fZwnBKyuqrMAknwI+DngacCVVfXBJM8FzgdOSPJk4Oiqek6SXYCbklztkwgkSdKkGfeCgIOTnA4cmuRVwH6PtHFVPTwSzKb67f8ZOBZY3W92bb8OcNx0e1U9CHwOOGobPockSdJOYdxwdgrdlONewD6MebVmkmOAS4FLq2otsDdwX//yemCPPryNtk+/tvdmjrcyydoka++6664xS5ckSdpxjHu15n1V9aaqOq6qzmArI2cj+11RVc8GDkjyCrrHP+3ev7wIuKeqNm7SPv3anZs53gVVtbyqli9evHicEiTtRDZsfHi+S5CkwY37bM03b9J0FPDMR9j+UOCAqrqsb7oVeAJwGXAkcDvw1H4dutG13+33nQIOBa4e7yNImhQLpxawYtW6Qd9jzUnLBj2+JG3NuBcEHAH8Xb+8BPj8VrbfAJyY5AhgF+AQ4DXAA8C5SQ6mu4LzVICqui7JVUnOprta83VeDCBJkibRuOFsZVX950le/UUBW1RVX6S7OnNzXr6Ffd42Zi2SJEk7rXHD2SFJDumXFwHPAt41TEmSJEmTa9xw9g7g00Dorqo8a7CKJEmSJti44ew3quq6QSuRJEnS2OHsF5L8C93I2ahXVNXps1yTJEnSxBo3nD0b+HngS8BS4Ot0N4pdAhjOJEmSZsm4Twj4AN19y34KOAD466o6Gnj1YJVJkiRNoHHD2fdWVUH33Ezgcf3y/x6qMEmSpEk07rTm9yf5Y+ALwMF0t9OQJEnSLBt35OxE4J+Ag/rvJw5WkSRJ0gQba+Ssqh5I8kFgH+C2qtowbFmSJEmTaayRsyQvBq4B3gw8I8lvD1qVJEnShBp3WvPwqjoEuL6qPgQ8ZsCaJEmSJta44ewb/ffqvzutKUmSNIBtuVrzPcC+Sc7n2yFNkiRJs2jckbNTgE8BtwP/DLxxsIokSZIm2LgjZ9fSPUfzgiGLkSRJmnTjjpx9tqo+Nb2S5PsGqkeSJGmijRvObk/y7CT7J1mC05qSJEmDGHda8xXA50fWlwCnzX45kiRJk+0Rw1mSPwCuBH6rqt430v7MgeuSJEmaSFub1twAfBQ4PMkfJFkGUFVXDl6ZJEnSBNpaOLu/f47mqcCCqlo3BzVJkiRNrK2FswKoqoeAh6cbk7xgyKIkSZIm1dYuCDgmyW798tOSnNcvPwW4eLiyJEmSJtPWwtkDwLf65UtH2h8cphxJkqTJtrVwdlpVrdm0McmTBqpHkiRpoj3iOWebC2Z9+/XDlCNJkjTZxn1CgCRJkuaA4UySJKkhhjNJkqSGGM4kSZIaYjiTJElqiOFMkiSpIYYzSZKkhhjOJEmSGmI4kyRJaojhTJIkqSGGM0mSpIZs7cHn2yXJgcBZwKeA/YC7q+otSfYEzgFuAQ4C3lRVd/T7vAFYBOwBfLiqLhmiNkmSpJYNEs6APYG/rqoPASS5KcllwMuBK6vqg0meC5wPnJDkycDRVfWcJLsANyW5uqruHag+SZKkJg0yrVlVa6aD2cj7fAs4Fljdt13brwMcN91eVQ8CnwOOGqI2qSUbNj483yVIkhoz1MjZf0pyPHBFVX0+yd7Aff1L64E9kkwBe9MFMkZe23szx1oJrARYsmTJoHVLc2Hh1AJWrFo36HusOWnZoMeXJM2uQS8ISHI0cDTw2r7pTmD3fnkRcE9Vbdykffq1Ozc9XlVdUFXLq2r54sWLhytckiRpngwWzpIcCxwDnAzsk+RI4DLgyH6Tp/brAJdOt/cjaYcCVw9VmyRJUquGulrzScAHgLXAVcBjgT8G3gScm+Rg4EDgVICqui7JVUnOprta83VeDCBJkibRIOGsq
q4HdtvCyy/fwj5vG6IWSZKkHYk3oZUkSWqI4UySJKkhhjNJkqSGGM4kSZIaYjiTJElqiOFMkiSpIYYzSZKkhhjOJEmSGmI4kyRJaojhTJIkqSGGM0mSpIYYziRJkhpiOJMkSWqI4UySJKkhhjNJkqSGGM4kSZIaYjiTJElqiOFMkiSpIYYzSZKkhhjOJEmSGmI4kyRJaojhTJIkqSGGM0maBxs2PrxTvIek2Tc13wVI0iRaOLWAFavWDfoea05aNujxJQ3DkTNJkqSGGM4kSZIaYjiTJElqiOFMkiSpIYYzSZKkhhjOJEmSGmI4kyRJaojhTJIkqSGGM0mSpIYYziRJkhpiOJMkSWqI4UySJKkhhjNJkqSGGM4kSZIaMjXEQZPsA5wFHF5VK/q2RwPnA18FDgLOqaqb+9deDBwBPAR8sapWDVGXJE2SDRsfZuHU8H+Dz9X7SJNikHAG/CTwIeBHR9pOAb5cVeclOQx4L/C0JPsBpwJHVFUlWZPkY1X1hYFqk6SJsHBqAStWrRv8fdactGzw95AmySB/6lTV3wL3bdJ8LLC6f/1G4PAki4BjgOurqvrtVgM/M0RdkiRJrZvLcei9+c7Atr5v21L7d0myMsnaJGvvuuuuwQqVJEmaL3MZzu4Edh9ZX9S3ban9u1TVBVW1vKqWL168eLBCJUmS5stchrPLgCMB+nPObqiq9cAVwJOSpN/uSODyOaxLkiSpGYOEsyQ/BZwA7JvkjCS7Au8A9k9yBvB64ESAqvoK3VWcf5jk7cCFXgwgSZIm1SBXa1bVJ4BPbOalV25h+4uAi4aoRZIkaUfijWkkSZIaYjiTJElqiOFMkiSpIYYzSZKkhhjOJEmSGmI4kyRJaojhTJIkqSGGM0mSpIYYziRJkhpiOJMkSWqI4UySJKkhhjNJkqSGGM4kSZIaYjiTJElqiOFMktS8DRsf3ineQxrH1HwXIEnS1iycWsCKVesGfY81Jy0b9PjSuBw5kyRJaojhTJIkqSGGM0nSjHiuljS7POdMkjQjng8mzS5HziRJkhpiOJMkibmbnnUaWFvjtKYkSczN9Cw4Rautc+RMkiSpIYYzSZKkhhjOJEmSGmI4kyRJaojhTNoCr6iSJM0Hr9aUtsAba0qS5oMjZ5IkSQ0xnEmSJDXEcCZJktQQw5kkSVJDDGeSJEkNMZxJkiQ1xHAmSZLUEMOZJElSQwxnkiRJDTGcSZIkNaSpxzcleSbwc8CdQFXV781zSWrQho0Ps3DKvyskSTunZsJZkscA7wF+qKo2JLk4yTOq6qPzXZvaMhfPvASfeylJmh8tDT8cCXypqjb069cCx85jPUA3SrMzvIckqQ07y++Vufrddf9O9FnGlaqa7xoASPJC4Jeq6mf79ZcBT6+qF49ssxJY2a/+IPDPA5SyF/D1AY47Key/mbH/tp99NzP238zYfzMzCf23f1UtHmfDZqY16c4z231kfVHf9p+q6gLggiGLSLK2qpYP+R47M/tvZuy/7WffzYz9NzP238zYf9+ppWnN1cD+SRb2608FLpvHeiRJkuZcMyNnVfXvSX4T+G9J7gI+68UAkiRp0jQTzgCq6iPAR+a5jEGnTSeA/Tcz9t/2s+9mxv6bGftvZuy/Ec1cECBJkqS2zjmTJEmaeE1Naw5pa08fSPJo4Hzgq8BBwDlVdXP/2ouBI4CHgC9W1aq5rL0F29t/SVYApwCfprv9ySer6k/ntPgGzOTnr399b7o+fGtVvWvOCm/EDP/9PgV4FvAwcDTw61V1+xyWP+9m2H9/ADxI98f8Y4BXV1VbN4Ua2DhPr0nyi8BbgZOr6tJt2Xdnt739l+RA4CzgU8B+wN1V9ZY5K3w+VdVO/0X3P5R1wMJ+/WLgGZtsczpwWr98GHBNv7wf8Bm+PQW8Bjhovj/TDtR/zwN+vF/eBbgH2Gu+P9OO0n/9+gLgT4EPAa+a78+zI/Uf3S15Lh7Z7gnAY+f7M+1A/fdk4IaR7W4Anjrfn
6nB/juALvh/HDhuW/bd2b9m2H8rgOePrN8EPGm+P9NcfE3KtOY4Tx84lu52HlTVjcDhSRYBxwDXV/+T0W/zM8OX3JTt7r+quqSqPjmy3Ua6v8InyUx+/gDeCFxIF2wn0Uz67znAN5O8LsmbgR+rqm/NUd2tmEn/3Q3slmQqyRRQwK1zU3Yzttp/VXVrVV21PftOgO3uv6paU1UfGmlaAEzEv99JmdbcG7hvZH193zbONuPsu7ObSf+tH2l7FXB2VX1jiCIbtt39l+RJwL9X1XX9rWYm0Ux+/vanG/15Gd1pCVcluXsLv0h3Vtvdf1W1LskFwN/QTQtfCdw1YK0tmsnvAH9/zFIfJDkeuKKqPj9bhbVsUkbOtvr0gUfYZpx9d3Yz6T8AkryIbjrpD4cqsmEz6b/nA7smOZ1uuulZSX59wFpbNJP+Ww98uqoerO48qdXATw1Ya4u2u/+SPA84uqqOr6oX0E0/vXzIYhs0k98B/v6YhT5IcjTdtOdrZ7Gupk1KONvs0weS7DkydXQZ3fArSQ6jO89iPXAF8KQk6bc7Erh87kpvwkz6b/o5qXtX1VlJDkty8BzXP9+2u/+q6pSqOqeqzgFuBD5SVf99rj/APJvJz99VwNKRY+0P3MxkmUn/PR7415FjfQ149NyU3Yxx+m+b9h2ozlbNpP9Icizd6UUnA/skOXK4UtsxMfc5S/Is4OfphuQfrKrfS3Ie8G9VdU6SXemuVvoasIxu+m30as3ldNMiN9dkXq25Xf2X5PnAX9BdaQjwfXRXe318zj/EPJrJz1+//0vppoW/Cry7qibqD4QZ/vt9BV1AexDYFTi1Ju9qw+399/tYYBXwJbr//x0A/Maknbc3Rv8F+G3gROAfgIuq6oot7TsvH2IebW//9ad1fAJY2x/qscAfV9X75vxDzLGJCWeSJEk7gkmZ1pQkSdohGM4kSZIaYjiTJElqiOFMkiSpIYYzSWpcku+Z5eP5/36pYf4DlfSIkixK8s3+snaS/GSSTyV5+nYc69gktyZZOub2j09ycZIzN/PaeUk+vh017Jfk75Os6x+KPt1+TpLLt1TbttY+W/r7BD77EV7fpv8efTB7e5L9ZqlESbPMcCZpa36F7qHrKwGq6h+Az27PgarqMrp7Zo27/e3A32/h5XdvZw1fAV4BfD8w+iiYe4FXVtVtW9hvm2qfDUmOAp7Sv/dmbet/j/4eb2+ju/+gpAZNyrM1JW2/g+gem3JTktdX1TdHX0zyVOAldHfeXwGcX1WfTPI7wC50fwQ+UFVvGdntF5M8ATgEeG5VrU9yCPD6/jhPBN5WVZ/bSm2LkpwK7EsXtk4E3gq8ADgBWAd8AHh/Vb1neqequj3JJ/pt3tnfBPOAqrql/0wrgYPpAtti4HVV9dDIZ34K8B7gFOAm4ALgM1V1ZpJzgRfShcenAZ/pj7MC+GZVvbQ/xvF0I2K30D254HVVdf8mn28l8Lcj7/undDci3g34WlW9fWTb5/R3T39K/953A/8DuLaqfj3JrwEnAS+qqtuSLEzyI1W1XUFb0oCqyi+//PJrs190Dw0/vl++CFjZL78PeDoQ4CvA9/ftP9zvcwxw6chxLgd+ul/+OPCsfvldwAv65dXA8pH3/b/98kuAMzdT21Lgy8CCfv1PgFfShcGbgcf19b17C5/tecCN/fJPA7/QLx8CfHZkuz8Z+dwfB5aO9sHmagT+P12AWkAXkg7s22+ge0rGHnR349+1bz8TOHkzNX4aWDGy/vyR5c8Au4/U8rJ+eR+6Ry6F7oHv7+rbXwQcNbL/B4EXzvfPmF9++fXdX05rSnokvwT8WH/O1wa6kZdRewGLquoOgKr6f1V1HfAjdCNC09YBh2+yDvB1vv1Q5NF9Nt1+S26rbz+KaR3wQ/36Krqpy+ew5WfhXgbskeQn6B4w/3d9+w8Dtz1C7eO4o6q+2ddyX1V9sW+/h+7zLgMKODndQ+33pBtl3NRCYOPI+r5Jzu73WUQX9
KbdAlBV/0r3mJvFwPuB45LsThfMrh7ZfvpxVpIa47SmpM3qH0p8b41MRyb54vSFAb2vA99IsndV3ZnkR+hGjG6gG1mbdhBwycj65p4bdwNwIPBv/fafGaPM/ZMs6EPQwXQPhwd4L/ApuunOl29ux6p6KMmfAWcA11fVg/1LN9I9Q3K09us3c4j76AISwJIxah21Drifbgp4Y5ID6Ub6NnU7XXAjyeHAaVX1hH79eZts+wTgY0n2Bf4duKuqKsnfAO8Ertlk+z3pRh4lNcZwJum7JFlId07VAyNtT6QLTm8DfpTuQdhrgF8G3prkC3QB44zqziF7SpK30k2vra6qj/QPQN4feGmS9wFHAYcluYzufLHT+uP8IPCyJI8Hnks3wnVoVd00UuYrgG8Bb06yJ92I1IUAVXVvksuBm+uRH3J+IfBbwGumG6rq80nemeSPgG/0ffBnSY7ta/8N4HTgL4Hf6a/e3Ad4YpJDgZ8Avrc/p4x++SV0FxPsTzf9eEaS1wLvSHIHsB/w5s3U97/oziH7KN1U7eeSXEh3IcMP9P34YbpRx/v7KzBXAC+pqukA/G7gOuA3pw+aZKqvZXQkTVIjfPC5pJ1KkoVVtaE/Mf/3q2r9fNe0vfoQ9efA71bVuq1tvw3HfTPdBQyXbHVjSXPOkTNJO5vT+5G/23fkYAbQT3n+Kt3Vq7Oiv6HtxVX1T7N1TEmzy5EzSZKkhni1piRJUkMMZ5IkSQ0xnEmSJDXEcCZJktQQw5kkSVJDDGeSJEkN+Q9LoA/vrEeZ/wAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "#You can set the size of the figure by doing:\n", "pyplot.figure(figsize=(10,5))\n", "\n", "#Plotting\n", "pyplot.hist(abv, bins=20, color='#3498db', histtype='bar', edgecolor='white') \n", "#The \\n is to leave a blank line between the title and the plot\n", "pyplot.title('abv \\n')\n", "pyplot.xlabel('Alcohol by Volume (abv) ')\n", "pyplot.ylabel('Frequency');" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAmcAAAFaCAYAAABbvvr/AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvIxREBQAAHfdJREFUeJzt3Xm4XXV97/H3BxKDClGQILEIUQaFSpESauPAUK2o6FVqW2uFW265praWaikoapVgHSs42yqPeLm39CoOvcUWFWewlirBKlZEy2RRsURFg0NCpN/7x1oHd47ncPaR7LN/Ofv9ep79ZK3fWnut7/6d7JxPfmtKVSFJkqQ27DDuAiRJkvRThjNJkqSGGM4kSZIaYjiTJElqiOFMkiSpIYYzSZKkhhjOJE2EJB9IclQ//XdJNk3NS1JLloy7AElaIL8D3ApQVb+R5IbxliNJMzOcSZoIVbVx3DVI0jA8rClp0UtyWpJvJVk3bdGaJB9M8sUkr0uyY5KDknx+amQtyZokVyf55ELXLWkyGc4kLXpV9RrgQzMsegjwBOBw4NeAk6rqKuC5A++9DHjVQtQpSWA4kzTZ3ludTcB76c5Lk6SxMpxJmmS3DEx/B1g5rkIkaYrhTNIk221genfgpn76NmDZwLJ7L1hFkiae4UzSJHtaOjsBvwW8q2//GrB7kj2S7AAcM7YKJU0cb6UhadFLchrwOGBTkhuBY4E9ga8CH6Y7nPlR4FyAqvpGkrOBfwKuBNYDJyd5U1WdPIaPIGmCpKrGXYMkSZJ6HtaUJElqiOFMkiSpIYYzSZKkhhjOJEmSGmI4kyRJaojhTJIkqSGGM0mSpIYYziRJkhpiOJMkSWqI4UySJKkhhjNJkqSGGM4kSZIaYjiTJElqiOFMkiSpIYYzSZKkhhjOJEmSGmI4kyRJaojhTJIkqSGGM0mSpIYYziRJkhqyZNwF/Lx23333WrVq1bjLkCRJmtMVV1zx7apaMcy62204W7VqFevXrx93GZIkSXNK8rVh1/WwpiRJUkMMZ5IkSQ0ZyWHNJPsCLwM+B+wFfKeqXppkHXDUwKovr6qP9O85DVgO7Ap8uKreP4raJEmSWjaqc852A95VVRcCJLkqyUUAVXXU9JWTPAw4uqqekGQpcFWSS6vqeyOqT5IkqUkjCWdVdfm0ph2AHwIkeRGwGdgReFNV/Qh4InBZ/94tSb4MHAE4eiZJkibKyM85S3IccHFVXQ28B3h9VZ0F3Aq8qV9tj35+ysa+TZIkaaKMNJwlORo4GvhTgKr6UlX9sF/8ceDX+umbgV0G3rq8b5u+vbVJ1idZv2HDhtEVLkmSNCYjC2dJjgWOAZ4D7JlkTZLXDKyyP3BNP/2PwJr+fUuAg4BLp2+zqs6pqtVVtXrFiqHu4yZJkrRd
GdXVmocBFwDrgU8A9wTeAvwkyRvoRsUOBp4NUFWfSfKJJK+gu1rzFC8GkCRJk2hUFwRcAew8z/e8Zu61JEmSFjdvQitJktQQw9kEqS2bF8U+JElazLbbB59r/rJ0GTedse9I97HyzGtHun1JkhY7R84kSZIaYjiTJElqiOFMkiSpIYYzSZKkhhjOJEmSGmI4kyRJaojhTJIkqSGGM0mSpIYYziRJkhpiOJMkSWqI4UySJKkhhjNJkqSGGM4kSZIaYjiTJElqiOFMkiSpIYYzSZKkhhjOJEmSGmI4kyRJaojhTJIkqSGGM0mSpIYYziRJkhpiOJMkSWqI4UySJKkhhjNJkqSGGM4kSZIaYjiTJElqiOFMkiSpIYYzSZKkhhjOJEmSGmI4kyRJaojhTJIkqSGGM0mSpIYYziRJkhpiOJMkSWqI4UySJKkhhjNJkqSGGM4kSZIaYjiTJElqiOFMkiSpIYYzSZKkhhjOJEmSGmI4kyRJasiSUWw0yb7Ay4DPAXsB36mqlybZDXgVcB2wP/DCqvrP/j2nAcuBXYEPV9X7R1GbJElSy0YSzoDdgHdV1YUASa5KchHwTOCjVfXuJE8CzgJOSPIw4OiqekKSpcBVSS6tqu+NqD5JkqQmjeSwZlVdPhXMBvbzQ+BY4LK+7dP9PMATp9qragvwZeCIUdQmSZLUspGfc5bkOODiqroa2AO4tV+0Edg1yZJp7VPL9phhW2uTrE+yfsOGDSOuXJIkaeGNNJwlORo4GvjTvulmYJd+ejlwS1X9ZFr71LKbp2+vqs6pqtVVtXrFihWjK1ySJGlMRhbOkhwLHAM8B9gzyRrgImBNv8oj+nmAf5xq70fSDgIuHVVtkiRJrRrV1ZqHARcA64FPAPcE3gK8EHh1kgOAfYFTAarqM0k+keQVdFdrnuLFAJIkaRKNJJxV1RXAzrMsfuYs73nNKGqRJEnanngTWkmSpIYYziRJkhpiOJMkSWqI4UySJKkhhjNJkqSGGM4kSZIaYjiTJElqiOFMkiSpIYYzSZKkhhjOJEmSGmI4kyRJaojhTJIkqSGGM0mSpIYYziRJkhpiOJMkSWqI4UySJKkhhjNJkqSGGM4kSZIaYjiTJElqiOFMkiSpIYYzSZKkhhjOJEmSGmI4kyRJaojhTJIkqSGGM0mSpIYYziRJkhpiOJMkSWqI4UySJKkhhjNtU7Vl86LYhyRJ47Jk3AVoccnSZdx0xr4j3cfKM68d6fYlSRonR84kSZIaYjiTJElqiOFMkiSpIYYzSZKkhhjOJEmSGmI4kyRJaojhTJIkqSGGM0mSpIYYziRJkhpiOJMkSWqI4UySJKkhhjNJkqSGGM4kSZIaYjiTJElqiOFMkiSpIUuGWSnJEVV16bAbTbIn8DLgkKo6vG87EXgWsKlf7dyq+pt+2fHAocDtwLVV9bahP4EkSdIiMlQ4A85O8h66QPWdIdZ/JHAh8NBp7b9TVTcMNiTZCzgVOLSqKsnlST5eVf8+ZG2SJEmLxrDh7E+AbwDPSXJ34G+q6srZVq6q9yY5aoZFf5zkW8A9gDdX1XeBY4Arqqr6dS4DHg8YziRJ0sQZ9pyzG6vqP4BLgEOAv03y5iSPmse+LgFeXVVnAeuB9/TtewC3Dqy3sW/7GUnWJlmfZP2GDRvmsWtJkqTtw7Dh7PwkX6AbQXt1VR1cVX8MPGnYHVXV9VU1lag+DhyZZEfgZmCXgVWX920zbeOcqlpdVatXrFgx7K4lSZK2G8OGs83AcVX15Kr6GECSuwG/MOyOkrwyydRh1P2B66vqduBi4LAk6ZetAT447HYlSZIWk2HPOXsRUABJDqC7ovI24BkzrZzkSOAEYGWSPwfOBr4F/HWS64GD++VU1deTnAW8LsntwNu9GECSJE2qYcPZC4A3ANcDK4GTgOfPtnJVXUJ3jtmgN9zJ+ucD5w9ZiyRJ0qI17GHNf5m6z1kfvG4ZXUmSJEmTa9hwtmrqfLH+z71HV5IkSdLk
Gvaw5sXA9Um+A+wGPHt0JUmSJE2uocJZVb0/yaXAfsA1VfW90ZYlSZI0mebz4PPQ3X9seZJ1oylHkiRpsg374PNzgYfRhbPQnXO2bnRlTZbaspksXTbuMiRJUgOGPeds56p6yNTMLM/N1M8pS5dx0xn7jnw/K8+8duT7kCRJd82whzUvT7LzwPyuoyhGkiRp0g0bzk4Gbk5yfX+H/7ePsCZJkqSJNWw4e2dV3aOqHlBVDwCeN8qiJEmSJtVQ4ayqTk+yQ5Ldk6Sqzh11YdJsasvmRbUfSZIGDXu15mOBc4ArgQuS7FxVbxtpZdIsvIBCkrSYDXtY80nAg4FPV9XfAqP/zShJkjSBhg1nX6+qTUD18z4hQNKCW4hDzR7OljRuw97n7IAkpwMPTvLHwF4jrEmSZrQQh7Q9nC1p3IYdOXsusBzYHdgTr9aUJEkaiWEffH4r8MKp+SQPBq4eVVGSJEmTatirNV8yrekI4DHbvhxJkqTJNuxhzUOBr/WvwlEzSZKkkRj2goC1VbVhaqa/KECSJEnb2LDh7MAkB/bTy4FfB948mpIkSZIm17Dh7A3AvwIBbgVeNrKKJEmSJtiw4exZVfWZkVYiSZKkocPZbyX5Jt3I2aA/qqrTt3FNkiRJE2vYcPY44DfprtZcBXwb2AjsDRjOJEmStpFhb6VxAfCAqjoSeADwrqo6Gjh5ZJVJkiRNoGHD2b2qqgCq6r+A+/XTHxhVYZIkSZNo2MOa903yFuDfgQPobqchSZKkbWzYkbOTgC8B+/d/njSyiiRJkibYsA8+vy3Ju4E9gRuqavNoy5IkSZpMQ42cJTke+BTwEuDRSV400qokSZIm1LCHNQ+pqgOBK6rqQuAeI6xJkiRpYg0bzr7f/1n9nx7WlCRJGoH5XK35VmBlkrP4aUiTJEnSNjTsyNlzgc8BNwJfAZ4/sookSZIm2LAjZ5+me47mOaMsRpIkadINO3J2ZVV9bmomyX1GVI8kSdJEGzac3ZjkcUn2SbI3HtaUJEkaiWEPa/4RcPXA/N7A87Z9OZIkSZPtTsNZktcCHwVeUFXnDbQ/ZsR1SZIkTaS5DmtuBj4GHJLktUn2A6iqj468MkmSpAk0Vzjb1D9H81Rgh6q6ZgFqkiRJmlhzhbMCqKrbgf+aakzy1FEWJUmSNKnmuiDgmCQ799OPSvKX/fSvAu8bXVmSJEmTaa5wdhvww376Hwfat4ymHEmSpMk2Vzh7XlVdPr0xyWF39qYkewIvAw6pqsP7tp2As4BvAPsDr6qqr/bLjgcOBW4Hrq2qt833g0iSJC0GdxrOZgpmffsVc2z3kcCFwEMH2p4L/EdV/WWSg4Fz6Q6V7kV3wcGhVVVJLk/y8ar696E/hSRJ0iIx7BMC5qWq3gvcOq35WOCyfvkX6W7PsRw4Briiqqpf7zLg8aOoS5IkqXUjCWez2IOtA9vGvm229p+RZG2S9UnWb9iwYWSFSpIkjctChrObgV0G5pf3bbO1/4yqOqeqVlfV6hUrVoysUEmSpHFZyHB2EbAGoD/n7AtVtRG4GDgsSfr11gAfXMC6pBnVls2LYh+SpO3LsA8+n5ckRwInACuT/DlwNvAG4Kx+fj/gJICq+nqSs4DXJbkdeLsXA6gFWbqMm87Yd6T7WHnmtSPdviRp+zOScFZVlwCXzLDo2bOsfz5w/ihqkSRJ2p4s5GFNSZIkzcFwJkmS1BDDmSRJUkMMZ5IkSQ0xnEmSJDXEcCZJktQQw5k0Rt7oVpI03UjucyZpON7oVpI0nSNnkiRJDTGcSZIkNcRwJkmS1BDDmSRJUkMMZ5IkSQ0xnEmSJDXEcCZJktQQw5kkSVJDDGeSJEkNMZxJkiQ1xHAmSZLUEMOZJElSQwxnkiRJDTGcSZIkNcRwJkmS1BDDmSRJUkMMZ5IkSQ0xnEmSJDXEcCZJktQQw5kkSVJDDGeSJEkNMZxJ2iZqy+ZxlyBJi8KScRcgaXHI0mXcdMa+I93HyjOvHen2JakF
jpxJkiQ1xHAmSZLUEMOZJElSQwxnkiRJDTGcSZIkNcRwJkmS1BDDmSRJUkMMZ5IkSQ0xnM3Bu55LkqSF5BMC5uBdzyVJ0kJy5EySJKkhhjNJGrBQpzJ4yoSk2XhYU5IGLMSpDODpDJJm58iZtMg5QiNJ25exjJwl+RdgUz97e1U9OsluwKuA64D9gRdW1X+Ooz5pMXEkaHLVls1k6bJFsx9pUozrsOaHqmrdtLZXAB+tqncneRJwFnDCglcmSYuEwVzaPo3rsObBSZ6fZF2SY/u2Y4HL+ulP9/OSJEkTZVwjZ6+uqs8m2RG4NMmtwB7Arf3yjcCuSZZU1U+m3pRkLbAWYO+9917omiVJkkZuLCNnVfXZ/s/bgU8BRwM3A7v0qywHbhkMZv3651TV6qpavWLFioUsWZIkaUEseDhL8uAkJw007Q9cA1wErOnbHtHPS5IkTZRxHNbcCDwxyf3oRshuBN4JfBB4dZIDgH2BU8dQmyRJ0lgteDirqm8Cx82w6LvAMxe4HEmSpKZ4E1pJkqSGGM4kSZIaYjiTJElqiOFMkiSpIYYzSZKkhhjOJEmSGmI4kyRJaojhTJIkqSGGM0mSpIYYziRJzastmxfFPqRhjOPZmpIkzUuWLuOmM/Yd6T5WnnntSLcvDcuRM0mSpIYYziRJkhpiOJMkSWqI4UySJKkhhjNJGgOvDJQ0G6/WlKQx8OpDSbNx5EySJKkhhjNJkqSGGM4kSZIaYjiTJElqiOFMkiSpIYYzSZKkhhjOJEmSGmI4kyRJaojhTJIkqSGGM0mSpIYYziRJkhpiOJMkSWqI4UySJKkhhjNJkqSGGM4kSXdJbdk87hKkRWXJuAuQJG3fsnQZN52x70j3sfLMa0e6fakljpxJkiQ1xHAmSZLUEMOZJElSQwxnkiRJDTGcSZIkNcRwJkmS1BDDmSRJUkMMZ5IkSQ0xnEmSxMI96aC2bFqAffjUhu2ZTwiQJImFedIBdE878IkKujOOnEmStMgsxMiZo3Oj48iZJEmLjM873b41Fc6SPAb4DeBmoKrqzDGXJEmStKCaCWdJ7gG8FfjFqtqc5H1JHl1VHxt3bZIkaWu1ZTNZumwB9rOJLN1pxPtYmM8yrGbCGbAG+FpVTR3E/jRwLGA4kySpMV5AMTqpqnHXAECSpwNPq6qn9PP/Eziqqo4fWGctsLaffRDwlSE3vzvw7W1Y7mJmX82P/TU/9tf82F/Ds6/mx/6an23RX/tU1YphVmxp5OxmYJeB+eV92x2q6hzgnPluOMn6qlp918qbDPbV/Nhf82N/zY/9NTz7an7sr/lZ6P5q6VYalwH7JJk66PsI4KIx1iNJkrTgmhk5q6ofJflD4I1JNgBXejGAJEmaNM2EM4Cq+gjwkRFset6HQieYfTU/9tf82F/zY38Nz76aH/trfha0v5q5IECSJEltnXMmSZI08Zo6rLmt+cSBO5dkX+BlwOeAvYDvVNVLk+wGvAq4DtgfeGFV/ef4Km1HkrsDnwE+XFWnJtkJOAv4Bl1fvaqqvjrOGluR5EHA04EfA0cC6+i+iy8GrgFWAX9WVT8YU4lNSXIaXZ98m+7v0knA3fG7CECSPen+vTqkqg7v22b9/iU5HjgUuB24tqreNpbCx2SW/no+sCfwLeAw4CVVdXW/bGL7a6a+Glj2DOB8YJepf6sWJFtU1aJ8Afeg+wWwrJ9/H/DocdfV0gs4HHjywPxVdF/YtwK/3bc9Cfibcdfaygs4G/jfwFn9/OnA8/rpg4FPjbvGFl7AjnRXW+/Qz68EVgAfAn6lbzsZ+Itx19rCi+4X5ncH+utC4Bl+F7fqo9/s+2D9QNuM3z+6/2x+np+eunM5sP+4P0MD/fUXA33yNOAf7K+Z+6pvPxB4OVDAzn3bgmSLxXxYc7YnDqhXVZdX1YUDTTsAP6Trp8v6Nvutl+QEuv64fqD5jr6qqi8ChyRZPobyWnM4EODkJC+g+4fve8DRdP/wg3+3Bv0IuI3u
/o4AOwNfwu/iHarqvcCt05pn+/4dA1xR/W/Pfp3HL1StLZipv6rqxQN9sgMwNWo90f01U1/1j5R8HjB9VGxBssViPqy5B1t39sa+TTNIchxwcVVdnWSw7zYCuyZZUlU/GV+F45XkIODAqnphkl8aWDTb37ONC1lfg/ah+0fs6VX1/STnA/cBfjzwC8DvZK+qNvaHNS9IchPwdbr/nftdvHOzff/89/9OJLkb8HvAs/sm++tnvZxuZP+2JIPtC9JXizmczfnEAXWSHE03ovHcvmmq775H12+3+MuA44BNSU4HHgncLclz8e/ZbDYCV1fV9/v5fwIeBdw9SfqAZl/1kjwUOA345ar6SZKzgZfgd3Eus33/bgb2m9Z+zQLW1aw+mP018KKqmnqgpP01IMn9gV2B3x4IZqck+QAL9G/+Yg5ndzxxoB9+fATwV2OuqTlJjqX7pfkcYGWSfejOFVoD3IhPagCgql4+Nd2fhLxzVb2+n14DfCrJwcAXqmrSR82gu2jiPkl2rKrb6UbSvkQ3enY48Fn8uzXoF4DvDgSvm4C98bs4l6n+2er7l+RiukPqU/8RWAO8aZyFtqC/oOmv6M6Z/VKSp1bV+wD7a0BV3QicODWf5JXAa6vqB/3hzpFni0V9n7Mkv053ot8GYEt5teZWkhwGXAKs75vuCbwFeD/wauBrwL7A6TWhV4hNl+SpdIcC7kbXV39Pd7XYTXT/83xFebUmcMeh8l+j+/7tTXcBwH3pRoSu69tOKa/WJMmOwBuBTXSjZA+hG8nejN9FAJIcCfx34HF0Iz9n94tm/P71Vx+uprv68Ks1QVcfwqz99bd0f7e+2a92z/rplZwT218z9VVV/TjJCuAP6C6k+AvgbVX1jYXIFos6nEmSJG1vFvPVmpIkSdsdw5kkSVJDDGeSJEkNMZxJkiQ1xHAmSdpKettgOztui3qkSWM4k7ZTSX4lySeT/HOSI+ZY96FJjlqg0uhv0Ds1vTzJJdt4+7skOTfJeTMse2ySzyf5QJIzk/yfJG8ZWP6/khzaT2/VL3eln5K8NMl/+3neO8O2Tk9SSdYNtJ3af67HzvHe309ycj997yQnznPf9wVeDyxN8qAk7+rbn5Hklnl+lIOTvHie75EmnrfSkLZj/S/vnavq1DnWOxFYVVXrFqAsktxQVasG5qdubrkt93EUcGJVnTjDsvPoHmL85n7+OuApVXXlYC3T++Wu9NO2/oxJCthl8D5wSc6b6fPOVkuSVcB5VXXUPPZ7MfCsqrp+cFv99FY/1yG3dwqwsarePp/3SZNsMT8hQJooSf4H8ErgdcADgQOBJwI7AU8B7t2HubfS3an/+cAXgQfTPUfuFuCddA8s/zzdDRnPB54M3EB3o8+H093o86IkDwReC/wzcDDdjRs/n2TtwL7+he65c28E7t3XuRY4gO5mqyuAU4An9HW/m+5xKL8M/G5V3ZDkKX0NX+n384fzeQpDf1f0nYBb0j0X9Y19ePvAtH55x13op7OB30jy+apal+QCYBXwIbobe66vqjP6ek4HfhG4mu7u4rcBf1JV/zGPzzTbzxq6vobuDudrgVX95/kQcD/gMXQ/z9VV9dvTtrsfcN+BYPYndD+fVQPrrKV7osGhdE8WWdb31XnA/wXOARgIkf8AXAAYzqRhVZUvX7620xewju5RLFPznwSO6affAjy1nz4RWDew3mXAw/vpo4D/NzD92X56T7pfwicC7+zbDgf+oZ/ei+5ZkNCFqfcMbP+GaXXe0P95IHDlQPtfA2v76fOAP+inTwP+rJ8+ErhXP30K8OyBWs+bpV/OAy4EXkAXwl7DT48UrKMbcZupX+5qP63r21bRPXJpx/71jb79IcBVA9s/f6qWGT5D0Y2KbvW5hvhZ39EvfR2fHHjP3wO/108/fIZ9Phm4aKaf3dQ0sF8//bSpn/m0Pt3q5wLcA9g07u+KL1/b08uRM2nxmXp81Aa2fkDvoF8CHtufq3Z3YPARSl8G
qKpvAfTnhc+0zS3A7yR5PN1o14ohansI3S/4KdcAh8xS+6p++gfAS5J8my4EfmmI/QB8pLrDmq9M8ja6xyG9bsj3TplvPw26rrrnipJkS992EFs/UPq6O9n3D+mCzQ/6bdxz2v5huJ/1oFOAF/TnpH0gyWVVNXgodhkw14PVp2q+hm4EcC5b6M5fW1I+tF0aiuFMWnxmOu/pdrrTh3YDdga+APxddedgLQOOm+P9M7WdDtxSVS9PcgDwsIFl/0W3w0Or6l8H2r8IPGBgfn/gijn283bgOVV1aX9I7X4zrDOXm2Z53/R+uav9xBzLv0z3mac8kNkD2pXAo4D39fNH0PXfXPsYdDvd4VeSPBS4f1U9M8lSuufq/j3wuYH1bwR2m2ObD6QLZgcAV/Vtt9IFdOiemTpoN+Amg5k0PMOZtJ1KspruF/bdkjySbmRnH+D3+3OqjqC7Wu4i4LPA8XS/ON8InAT8WZLrgfsD5/fh4wTgl5I8tare11+59yRg1/58pOOBfZI8mi40vLJ/392m2qvqY8AVSV4J/DjJQcC9kjyrqt6a5E1JXg98n+58q3ck+RW6UaoTknxl2j7PBV6c5BPAYX37wQO1Pryq/nmgXx7db+s+Se5Fd77Zg4DnJHnIQL98ZIZ++Xn76f4DNR8E/O5APy3vP//vV9U7kpyf5J104WsZswesZwNnp3uA/A/oAu+f9p/x12f5WX9ssF/6z7MpyWvpgtQDk/wq8CPg3/rXoM8CK5LsVFWbkvxRX/vT6a7uvxfw1CT3phvxPLl/398Bf5VkCV0YG/y5rOGnAVPSELxaU5IW0GCYTPIO4B1V9U9jLusO6W7VcXRVvWAbbGtXuosFnlVV870NhzSxDGeStID6+4b9G92FAkur6s/HXNLPSLI/cGNVbbqL29kP+GZV/WjbVCZNBsOZJElSQ3xCgCRJUkMMZ5IkSQ0xnEmSJDXEcCZJktQQw5kkSVJDDGeSJEkN+f9Zxqm1ZZJlrAAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "#You can set the size of the figure by doing:\n", "pyplot.figure(figsize=(10,5))\n", "\n", "#Plotting\n", "pyplot.hist(ibu, bins=20, color='#e67e22', histtype='bar', edgecolor='white') \n", "#The \\n is to leave a blanck line between the title and the plot\n", "pyplot.title('ibu \\n')\n", "pyplot.xlabel('International Bittering Units (ibu)')\n", "pyplot.ylabel('Frequency');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "_Exploratory exercise:_\n", "\n", "Play around with the plots, change the values of the bins, colors, etc." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Comparing with a normal distribution\n", "\n", "A **normal** (or Gaussian) distribution is a special type of distribution that behaves as shown in the figure: 68% of the values are within one standard deviation $\\sigma$ from the mean; 95% lie within $2\\sigma$; and at a distance of $\\pm3\\sigma$ from the mean, we cover 99.7% of the values. This fact is known as the $3$-$\\sigma$ rule, or 68-95-99.7 (empirical) rule.\n", "\n", " \n", "\n", "_Standard deviation and coverage in a normal distribution. Modified figure based on original from [Wikimedia Commons](https://commons.wikimedia.org/wiki/File:Standard_deviation_diagram.svg), the free media repository._\n", "\n", "\n", "Notice that our histograms don't follow the shape of a normal distribution, known as *Bell Curve*. Our histograms are not centered in the mean value, and they are not symetric with respect to it. They are what we call **skewed** to the right (yes, to the _right_). A right (or positive) skewed distribution looks like it's been pushed to the left: the right tail is longer and most of the values are concentrated on the left of the figure. Imagine that \"right-skewed\" means that a force from the right pushes on the curve." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "_Discuss with your neighbor:_\n", "\n", "* How do you think skewness will affect the percentages of coverage by standard deviation, compared to the bell curve?\n", "\n", "* Can we calculate those percentages?\n", "\n", "_Spoiler alert! (and Exercise):_\n", "\n", "Yes, we can, and guess what: we can do it in a few lines of Python. But before doing that, we want you to explain in your own words how the following piece of code works.\n", "\n", "*Hints:*\n", "\n", "1. Check what the logical operation `numpy.logical_and(1" ] },
{ "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Execute this cell to load the notebook's style sheet, then ignore it\n", "from IPython.core.display import HTML\n", "css_file = '../../../styles/custom.css'\n", "HTML(open(css_file, \"r\").read())" ] } ],
"metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.6" }, "widgets": { "state": {}, "version": "1.1.2" } }, "nbformat": 4, "nbformat_minor": 4 }