{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "1D Data Analysis, Histograms, Boxplots, and Violin Plots\n", "===\n", "\n", "## Unit 7, Lecture 2\n", "\n", "*Numerical Methods and Statistics*\n", "\n", "----\n", "\n", "#### Prof. Andrew White, 2/27/2020" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Goals\n", "\n", "1. Be able to histogram 1D data\n", "2. Understand the difference between 1D and categorical 1D data\n", "3. Know how to make violin plots" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "%matplotlib inline\n", "import random\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from math import sqrt, pi\n", "import scipy\n", "import scipy.stats\n", "\n", "plt.style.use('seaborn-whitegrid')" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Requirement already satisfied: pydataset in /home/whitead/.local/lib/python3.7/site-packages (0.2.0)\r\n", "Requirement already satisfied: pandas in /home/whitead/miniconda3/lib/python3.7/site-packages (from pydataset) (0.24.2)\r\n", "Requirement already satisfied: pytz>=2011k in /home/whitead/miniconda3/lib/python3.7/site-packages (from pandas->pydataset) (2019.1)\r\n", "Requirement already satisfied: numpy>=1.12.0 in /home/whitead/miniconda3/lib/python3.7/site-packages (from pandas->pydataset) (1.16.3)\r\n", "Requirement already satisfied: python-dateutil>=2.5.0 in /home/whitead/miniconda3/lib/python3.7/site-packages (from pandas->pydataset) (2.8.0)\r\n", "Requirement already satisfied: six>=1.5 in /home/whitead/miniconda3/lib/python3.7/site-packages (from python-dateutil>=2.5.0->pandas->pydataset) (1.12.0)\r\n" ] } ], "source": [ "!pip install --user pydataset" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { 
"slide_type": "slide" } }, "source": [ "Getting Data from PyDataset\n", "===\n", "\n", "One way to get data is using the `pydataset` package. You can find the list of datasets [here](http://vincentarelbundock.github.io/Rdatasets/datasets.html) and you can use one like so:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "scrolled": true, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "initiated datasets repo at: /home/whitead/.pydataset/\n" ] }, { "data": { "text/plain": [ "(71, 2)" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pydataset\n", "\n", "data = pydataset.data('chickwts').values\n", "data.shape" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We have loaded a dataset with 71 data points and each data point has 2 pieces of information. Let's look at one." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[179 'horsebean']\n" ] } ], "source": [ "print(data[0, :])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The first slice index says grab row `0` and the second slice says grab all columns. Each data point contains the mass of a chicken and the type of food it was fed." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Analyzing 1D Data\n", "====\n", "\n", "Let's see an example with data from Lake Huron. Our first tool to understand 1D numerical data is to look at the sample mean and sample standard deviation. " ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(98, 2)\n", "[1875. 
580.38]\n" ] } ], "source": [ "#load data\n", "huron = pydataset.data('LakeHuron').values\n", "\n", "#see the dimensions of the data\n", "print(huron.shape)\n", "\n", "#look at the first row\n", "print(huron[0,:])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "This data has 98 rows and 2 columns. The columns contain the year and the level of Lake Huron in feet. We cannot simply take the mean of all the data because that would be the mean of all years *and* levels. Instead, we can slice out only one of the columns:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "579.0040816326531\n" ] } ], "source": [ "huron_mean = np.mean(huron[:, 1])\n", "print(huron_mean)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "We will now follow the significant-figures convention in our calculations. Each data point in the dataset has 5 digits of precision, so our mean should as well. 
Thus we will print like so:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The mean is 579.0 ft\n" ] } ], "source": [ "huron_mean = np.mean(huron[:, 1])\n", "print('The mean is {:.5} ft'.format(huron_mean))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "We can similarly calculate the sample standard deviation:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The sample standard deviation is 1.3183 ft\n" ] } ], "source": [ "huron_std = np.std(huron[:, 1], ddof=1)\n", "print('The sample standard deviation is {:.5} ft'.format(huron_std))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "We had to specify manually that we want the $N - 1$ term in the denominator. `numpy` uses a convention where you specify what is subtracted from $N$ in the denominator through the `ddof` argument, which stands for delta degrees of freedom. Thus `ddof = 1` means we want $N - 1$ in the denominator instead of the default $N$. " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Histogramming\n", "====\n", "Histogramming is the process of sorting data into bins and counting how many data points fall in each bin. 
Let's see a basic example.\n" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "There were 2 samples between 0 and 10\n", "There were 4 samples between 10 and 20\n" ] } ], "source": [ "#create some data\n", "x = [1, 2, 13, 15, 11, 12]\n", "#compute histogram\n", "counts, bin_edges = np.histogram(x, bins=[0, 10, 20])\n", "for i in range(len(counts)):\n", "    print('There were {} samples between {} and {}'.format(counts[i], bin_edges[i], bin_edges[i + 1]))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "We made two bins, one from 0 to 10 and another from 10 to 20. Those were specified with the `bins=[0, 10, 20]` argument. Each bin includes its left edge but not its right edge, except the last bin, which includes both. We were then given the counts of data points within those bins. We can plot using our output from `np.histogram`, or we can do both the histogram and plot using `plt.hist`." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAW4AAAD1CAYAAABwdB+7AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAEmpJREFUeJzt3X+s3XV9x/FnhZGrONApUAGzBmLeu6REZzWxMmkVUqirIa4gmZ0/NhYTBEOmxEA0DFeHE4YMcFlDpiIaWRUoMLgBFBYrIpmeoJF5fGMsXSrVtK6xWuhFKN0f53vL4fTce7/33PO9937I8/GP5/v5fL73vPM93774nM/3e/wu2r9/P5KkcrxkvguQJM2MwS1JhTG4JakwBrckFcbglqTCGNySVJhD5+JNWq2W9xxK0gCWLVu2qLdtToK7evOB9mu324yOjg65mtmzrpmxrpmxrpl5sdbVarX6trtUIkmFMbglqTAGtyQVxuCWpMIY3JJUmFp3lUTES4FHgfWZeWNX++nAFcA+YCwz1zdRpCTpeXVn3J8EdvVpvw5YC5wCrIqIk4ZVmCSpv2mDOyL+BDgJuLun/QRgV2Zuy8zngDHgtEaqlCQdUGep5GrgQuADPe2LgZ1d2zuAEyf7I+12e8bFAYyPjw+8b5Osa2YWal2rv7wF2DLfZUzCumZm4dW16dxjGznvpwzuiHg/8L3MfDwipvtbB/0ss9ugvx56sf4iqinWNVML7x+7XjxGRkYa+eXkdDPuPwdOiIg1wPHA0xHxi8z8FrCdzqx7wnFVmySpQVMGd2aeO/E6Ii4HtlahTWZujYgjImIJ8AtgDbCuuVIlSTDA/8lURHwQ2J2Zm4DzgZurro2Z+dgQa5Mk9VE7uDPz8j5tm4HlwyxIkjQ1fzkpSYUxuCWpMAa3JBXG4JakwhjcklQYg1uSCmNwS1JhDG5JKozBLUmFMbglqTAGtyQVxuCWpMIY3JJUGINbkgpjcEtSYQxuSSrMtA9SiIiXATcCxwAjwPrMvKurfyuwDdhXNa3LzCeGXagkqaPOE3DeBfwgM6+MiD8Gvgnc1TNmdWbuGXp1kqSDTBvcmbmxa/O1dB4MLEmaJ7WfORkRDwHH03mae68N1dPeHwQuzcz9wylPktRrJg8LfmtEvAH4akS8viucLwPuAXYBtwNrgVt692+32wMVOD4+PvC+TbKumVmodUlNauq8r3NxchmwIzO3ZeYPI+JQ4ChgB0Bm3tQ1dgw4mT7BPTo6OlCB7XZ74H2bZF0zs1Drgi3zXYBexEZGRmZ13rdarb7tdW4HPBX4GEBEHAO8HPh1tX1kRNwbEYdVY1cAjw5cpSRpWnWCewNwdER8B7gbuAB4f0S8OzN3A2PAwxHxXWAnfWbbkqThqXNXyV7gvVP0XwtcO8yiJEmT85eTklQYg1uSCmNwS1JhDG5JKozBLUmFMbglqTAGtyQVxuCWpMIY3JJUGINbkgpjcEtSYQxuSSqMwS1JhTG4JakwBrckFcbglqTCGNySVJg6Dwt+GXAjcAwwAqzPzLu6+k8HrgD2AWOZub6ZUiVJUG/G/S7gB5m5AngP8Lme/uuAtcApwKqIOGm4JUqSutV55uTGrs3XAr+Y2IiIE4Bdmbmt2h4DTgN+MuQ6JUmVaYN7QkQ8BBwPrOlqXkznye4TdgAnDqc0SVI/tYM7M98aEW8AvhoRr8/M/X2GLZps/3a7PUh9jI+PD7xvk6xrZhZqXVKTmjrv61ycXAbsyMxtmfnDiDgUOIrO7Ho7nVn3hOOqtoOMjo4OVGC73R543yZZ18ws1Lpgy3wXoBexkZGRWZ33rVarb3udi5OnAh8DiIhjgJcDvwbIzK3AERGxpAr0NcB9A1cpSZpWneDeABwdEd8B7gYuAN4fEe+u+s8Hbga+A2zMzMcaqVSSBNS7q2Qv8N4p+jcDy4dZlCRpcv5yUpIKY3BLUmEMbkkqjMEtSYUxuCWpMAa
3JBXG4JakwhjcklQYg1uSCmNwS1JhDG5JKozBLUmFMbglqTAGtyQVxuCWpMIY3JJUGINbkgpT6ynvEXEl8LZq/Gcy87auvq3ANmBf1bQuM58YbpmSpAl1nvL+dmBpZi6PiFcBjwC39QxbnZl7mihQkvRCdZZKNgPnVK9/AxweEYc0V5IkaSp1Hha8D3iy2jwPGKvaum2IiCXAg8Clmbl/qFVKkg6otcYNEBFn0QnuVT1dlwH3ALuA24G1wC29+7fb7YEKHB8fH3jfJlnXzCzUuqQmNXXe1704eQbwCeDMzNzd3ZeZN3WNGwNOpk9wj46ODlRgu90eeN8mWdfMLNS6YMt8F6AXsZGRkVmd961Wq2/7tGvcEXEkcBWwJjN39fZFxL0RcVjVtAJ4dOAqJUnTqjPjPhd4NfD1iJhoewD4cWZuqmbZD0fEXjp3nBw025YkDU+di5M3ADdM0X8tcO0wi5IkTc5fTkpSYQxuSSqMwS1JhTG4JakwBrckFcbglqTCGNySVBiDW5IKY3BLUmEMbkkqjMEtSYUxuCWpMAa3JBXG4JakwhjcklQYg1uSCmNwS1Jh6j4s+ErgbdX4z2TmbV19pwNXAPuAscxc30ShkqSOOg8LfjuwNDOXA2cC/9Iz5DpgLXAKsCoiThp6lZKkA+oslWwGzqle/wY4PCIOAYiIE4BdmbktM58DxoDTGqlUkgTUe1jwPuDJavM8Ossh+6rtxcDOruE7gBOHWqEk6QVqrXEDRMRZdIJ71RTDFk3W0W63Z1DW81Z/eQuwZaB9m2ddM7NQ65KaMT4+PnD2TaXuxckzgE8AZ2bm7q6u7XRm3ROOq9oOMjo6OmCJ/mOXVKaRkZFZZB+0Wq2+7XUuTh4JXAWsycxd3X2ZuRU4IiKWRMShwBrgvoGrlCRNq86M+1zg1cDXI2Ki7QHgx5m5CTgfuLlq35iZjw29SknSAXUuTt4A3DBF/2Zg+TCLkiRNzl9OSlJhDG5JKozBLUmFMbglqTAGtyQVxuCWpMIY3JJUGINbkgpjcEtSYQxuSSqMwS1JhTG4JakwBrckFcbglqTCGNySVBiDW5IKY3BLUmHqPix4KXAHcE1mfr6nbyuwDdhXNa3LzCeGWKMkqcu0wR0RhwPXA/dPMWx1Zu4ZWlWSpEnVWSp5GngnsL3hWiRJNdR5WPCzwLNdT3jvZ0NELAEeBC7NzP3DKU+S1KvWGvc0LgPuAXYBtwNrgVt6B7Xb7SG8lSSVY3x8vJHsm3VwZ+ZNE68jYgw4mT7BPTo6OuA7bBlwP0maXyMjI7PIPmi1Wn3bZ3U7YEQcGRH3RsRhVdMK4NHZ/E1J0tTq3FWyDLgaWAI8ExFnA3cCj2fmpmqW/XBE7AUeoc9sW5I0PHUuTraAlVP0XwtcO8SaJElT8JeTklQYg1uSCmNwS1JhDG5JKozBLUmFMbglqTAGtyQVxuCWpMIY3JJUGINbkgpjcEtSYQxuSSqMwS1JhTG4JakwBrckFcbglqTCGNySVJhaDwuOiKXAHcA1mfn5nr7TgSuAfcBYZq4fepWSpAOmnXFHxOHA9cD9kwy5DlgLnAKsioiThleeJKlXnaWSp4F3Att7OyLiBGBXZm7LzOeAMeC04ZYoSeo2bXBn5rOZuXeS7sXAzq7tHcBrhlGYJKm/WmvcM7Boso52uz3kt5KkhW18fLyR7JttcG+nM+uecBx9llQARkdHB3yLLQPuJ0nza2RkZBbZB61Wq2/7rG4HzMytwBERsSQiDgXWAPfN5m9KkqY27Yw7IpYBVwNLgGci4mzgTuDxzNwEnA/cXA3fmJmPNVSrJIkawZ2ZLWDlFP2bgeVDrEmSNAV/OSlJhTG4JakwBrckFcbglqTCGNySVBiDW5IKY3BLUmEMbkkqjMEtSYUxuCWpMAa3JBXG4JakwhjcklQYg1uSCmNwS1JhDG5JKozBLUmFqfWw4Ii4BngLsB+4KDO/39W3FdgG7Kua1mXmE8MtU5I0oc4zJ1c
Ar8vM5RExCnyRgx9Vtjoz9zRRoCTpheoslZwG3A6QmW3glRFxRKNVSZImVWepZDHQ6treWbX9tqttQ0QsAR4ELs3M/b1/pN1uz6JMSSrP+Ph4I9lXa427x6Ke7cuAe4BddGbma4FbencaHR0d4K0Atgy4nyTNr5GRkVlkH7Rarb7tdYJ7O50Z9oRjgV9ObGTmTROvI2IMOJk+wS1JGo46a9z3AWcDRMQbge2Z+btq+8iIuDciDqvGrgAebaRSSRJQY8admQ9FRCsiHgKeAy6IiA8CuzNzUzXLfjgi9gKP4GxbkhpVa407My/pafpRV9+1wLXDLEqSNDl/OSlJhTG4JakwBrckFcbglqTCGNySVBiDW5IKY3BLUmEMbkkqjMEtSYUxuCWpMAa3JBXG4JakwhjcklQYg1uSCmNwS1JhDG5JKozBLUmFqfUEnIi4BngLsB+4KDO/39V3OnAFsA8Yy8z1TRQqSeqYdsYdESuA12XmcuA84LqeIdcBa4FTgFURcdLQq5QkHVBnqeQ04HaAzGwDr4yIIwAi4gRgV2Zuy8zngLFqvCSpIXWWShYDra7tnVXbb6v/3dnVtwM4sd8fabVa/Zqndes5iwfaT5Lm21NPPTVw9k2l1hp3j0Uz7Vu2bNlU+0iSZqDOUsl2OjPrCccCv5yk77iqTZLUkDrBfR9wNkBEvBHYnpm/A8jMrcAREbEkIg4F1lTjJUkNWbR///5pB0XEPwGnAs8BFwB/CuzOzE0RcSrw2WrorZn5z4MUspBvOYyIK4G30Vla+kxm3tbVtxXYVtUGsC4zn2i4npXAN4D/qZp+nJkf6eqft+MVEecB7+tqelNmvryr/xngu139p2XmPhoSEUuBO4BrMvPzEfFa4CvAIXS+Ob4vM5/u2WfSc7Hhur4E/AHwDPBXmfmrrvErmeIzb7CuG4FlwP9VQ67KzLt79pmP4/UN4Kiq+4+AhzPzQ13jPwisB35eNX0zM/+xgbpekA3A95mD86vWGndmXtLT9KOuvs3A8pm+cbfuWw4jYhT4Ys/fvA44A3gC+HZE3JqZP5nNe86gtrcDS6vaXgU8AtzWM2x1Zu6Zi3q6fDszz56kb96OV2Z+AfgCHPhc39MzZHdmrpyLWiLicOB64P6u5n8A/jUzvxERVwB/A/xb1z7TnYtN1fVp4IbM/HpEXAB8FPh4z65TfeZN1QVwaWbeNck+83K8MvOcrv4vAv/eZ9eNmXnxMGvpqatfNtzPHJxfC+WXkwv5lsPNwMRJ8hvg8Ig4ZA7ff0YWwPHqdhmdWc98eRp4Jy+87rISuLN6/Z/A6T37THouNlzXh4Fbq9c7gVcN+T3r6FfXdObreAEQEQG8IjP/e8jvWcdB2cAcnV+D3FXShKHcctiE6mv8k9XmeXSWHnq/2m+IiCXAg3RmJ9OvP83eSRFxJ52viZ/KzG9W7fN6vCZExJuBbd1f9ysjEfE14I/pLK19rqkaMvNZ4NnOv+0DDu/66roDeE3PblOdi43VlZlPAlSTggvofDPoNdln3lhdlQsj4qN0jteFmfnrrr55OV5dLqIzG+9nRUTcQ2f56eLMfGRYNVV1HZQNwBlzcX4tlBl3rxnfcti0iDiLzodzYU/XZXS+1q4EltL5FWnTfgZ8CjgL+ADwhYg4bJKx83Ur5t8CN/Zpvxj4ELAKWBcRb5rLonrUOTZzdvyq0P4K8EBm9i5XzOQzH6avAJdk5juAHwKXTzN+Lo/XYcCfZeZ/9el+GLg8M88EPgnc1GAdk2VDY+fXQplxL+hbDiPiDOATwJmZubu7LzNv6ho3BpwM3NJkPdXFz43V5s8j4ld0jsvjLIDjVVkJHHTxLDM3TLyOiPvpHK8fzF1Z7ImIl2bmXvofm6nOxaZ9CfhZZn6qt2Oaz7wxPf8BuZOu9drKfB6vFUDfJZLM/Cnw0+r19yLiqIg4ZNgXwnuzISLm5PxaKDPuBXvLYUQcCVwFrMnMXb19EXFv18xnBfDoHNS0LiI
url4vBo6hcyFy3o9XVdOxwJ7M/H1Pe0TE1yJiUVXbKTx/l8Rc+RbPfytaC9zT0z/pudikiFgH/D4z/36y/sk+84brurW6bgKd/xj3nt/zcrwqb6brRoluEfHxiPjL6vVSYGcDod0vG+bk/Kp1O+BcmItbDges60N0vh4+1tX8AJ3bsTZFxEV0vrrupXNV+SNNr3FHxB8CXwNeARxG5yv00SyA41XVtwz4dGaurrYvoXNHxPci4rPAO+h8znc2cYtWTx1XA0vo3GL3BLCOzhLOCPC/wF9n5jMR8R/V672952Jm9g2HIdd1NDDO82udP8nMD0/URefb8Qs+88wcm4O6rgcuAZ4C9tA5RjsWwPH6Czrn/YOZubFr7B2ZeVZEHE9nmecldI7d3w37AuYk2fABOne4NHp+LZjgliTVs1CWSiRJNRncklQYg1uSCmNwS1JhDG5JKozBLUmFMbglqTAGtyQV5v8BJS2zQOWmZkAAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.hist(x, bins=[0, 10, 20])\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "There are a few problems we can see. The first is that the x-axis has ticks in weird locations. The second problem is that the bars are right on top of one another, so it's hard to tell what's going on. Let's adjust the options to fix this." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/whitead/miniconda3/lib/python3.7/site-packages/matplotlib/axes/_axes.py:6521: MatplotlibDeprecationWarning: \n", "The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.\n", " alternative=\"'density'\", removal=\"3.1\")\n" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAWsAAAD1CAYAAACWXdT/AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAABxtJREFUeJzt20+opXUdx/HP5CxGBpKgRX9BhPh1wlV3kwsrSdJKcGHiInIqoU1CW8EW0sZQROnPRloE7VpEf1AjbOOmhZxFq8uvP/SHMigQrYQRtdtijnFHHe8598ydOx/m9Vo957nP85zv4uHNc3/nnBN7e3sB4PL2juMeAICDiTVAAbEGKCDWAAXEGqCAWAMUOLntBZbLpe/+ARzCzs7OiXWP3TrWqze8GJeBi253dzeLxeK4x4A3WS6XGx1vGQSggFgDFBBrgAJiDVBArAEKiDVAAbEGKCDWAAXEGqCAWAMUEGuAAmINUECsAQqINUABsQYoINYABcQaoIBYAxQQa4ACYg1QQKwBCog1QAGxBigg1gAFxBqggFgDFBBrgAJiDVBArAEKiDVAAbEGKCDWAAXEGqCAWAMUEGuAAmINUECsAQqINUCBtWI9xrh6jPGHMcaXjngeAN7Cuk/W30jy/FEOAsCFHRjrMcaHk3wkyRNHPw4Ab+XkGsc8kuTeJGcudMDu7u5FGwguprNnz+ba+zxncHl56sx1G5/ztrEeY9yd5Ndzzj+OMS543GKx2PiN4VLwIMHlaLFYZLlcbnTOQU/Wn0ty3RjjtiQfSPLyGOOvc86nDzkjAIfwtrGec971+vYY44EkfxJqgEvP96wBCqzzAWOSZM75wBHOAcDb8GQNUECsAQqINUABsQYoINYABcQaoIBYAxQQa4ACYg1QQKwBCog1QAGxBigg1gAFxBqggFgDFBBrgAJiDVBArAEKiDVAAbEGKCDWAAXEGqCAWAMUEGuAAmINUECsAQqINUABsQYoINYABcQaoIBYA
xQQa4ACYg1QQKwBCog1QAGxBigg1gAFxBqggFgDFBBrgAJiDVBArAEKiDVAAbEGKCDWAAXEGqCAWAMUEGuAAmINUECsAQqINUABsQYoINYABcQaoIBYAxQQa4ACYg1QQKwBCog1QAGxBigg1gAFxBqggFgDFBBrgAJiDVBArAEKiDVAAbEGKCDWAAXEGqCAWAMUEGuAAmINUECsAQqINUABsQYoINYABcQaoMDJdQ4aYzyU5MbV8Q/OOX98pFMBcJ4Dn6zHGDcluX7OeUOSW5M8duRTAXCedZZBnkly52r7hSSnxxhXHd1IALzRgcsgc87Xkry0enlPkidX+/7v2vueOILRYDtPnbkuZ8+ePe4x4E12d3c3PmftDxjHGLfnXKzv3fhd4BgsFoucOnXquMeAN1ksFhufs+4HjLckuT/JrXPOFzd+FwC2cmCsxxjXJHk4yc1zzuePfiQA3midJ+u7krw7yY/GGK/vu3vO+ZcjmwqA86zzAePjSR6/BLMAcAF+wQhQQKwBCog1QAGxBigg1gAFxBqggFgDFBBrgAJiDVBArAEKiDVAAbEGKCDWAAXEGqCAWAMUEGuAAmINUECsAQqINUABsQYoINYABcQaoIBYAxQQa4ACYg1QQKwBCog1QAGxBigg1gAFxBqggFgDFBBrgAJiDVBArAEKiDVAAbEGKCDWAAXEGqCAWAMUEGuAAmINUECsAQqINUABsQYoINYABcQaoIBYAxQQa4ACYg1QQKwBCog1QAGxBigg1gAFxBqggFgDFBBrgAJiDVBArAEKiDVAAbEGKCDWAAXEGqCAWAMUEGuAAmINUECsAQqINUABsQYoINYABcQaoIBYAxQQa4ACYg1QQKwBCog1QAGxBigg1gAFxBqggFgDFBBrgAJiDVDgxN7e3lYXWC6X210A4Aq1s7NzYt1jt441AEfPMghAAbEGKHBym5PHGI8m+ViSvSRfn3M+e1GmgkMaY1yf5KdJHp1zfneM8cEkP0xyVZK/J/ninPPl45yRK9MY46EkN+Zcdx9M8mw2uDcP/WQ9xvhEkg/NOW9Ick+Sbx/2WnAxjDFOJ/lOkl/t2/3NJN+bc96Y5PdJvnIcs3FlG2PclOT6VS9vTfJYNrw3t1kG+VSSnyTJnHM3ybvGGO/c4nqwrZeTfDbJc/v2fTLJz1bbP09y8yWeCZLkmSR3rrZfSHI6G96b2yyDvCfJct/rf672/WuLa8KhzTlfTfLqGGP/7tP7/rX8R5L3XvLBuOLNOV9L8tLq5T1Jnkxyyyb35lZr1m+w9vcF4Zi4RzlWY4zbcy7Wn07yu31/OvDe3GYZ5Lmce5J+3ftybpEcLif/GWNcvdp+f85fIoFLZoxxS5L7k3xmzvliNrw3t4n1L5N8fjXER5M8N+f89xbXg6PwdJI7Vtt3JPnFMc7CFWqMcU2Sh5PcNud8frV7o3tzq18wjjG+leTjSf6b5Gtzzt8c+mKwpTHGTpJHklyb5JUkf0vyhSQ/SHIqyZ+TfHnO+coxjcgVaozx1SQPJPntvt1nknw/a96bfm4OUMAvGAEKiDVAAbEGKCDWAAXEGqCAWAMUEGuAAmINUOB/Cw8D2+Yug/YAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "#rwidth controls how close bars are\n", "plt.hist(x, bins=[0, 10, 20], rwidth = 0.99, normed=False)\n", "#set exactly where the ticks should be\n", "plt.xticks([0, 10, 20])\n", "plt.yticks([2, 4])\n", "plt.xlim(0, 20)\n", "plt.ylim(0, 5)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Now let's take a look at the Lake Huron level. " ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXYAAAD1CAYAAABEDd6nAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAEDRJREFUeJzt3H2wXHV9x/F3ID7EZBoDTGtMLdhqv17L1JneWtQkGjQUUBxGgrUjUjBMpWodlanWqpWHPujoKFjMOGR8ANGObW3VZIBAg0+pViZz/5Bx3PlqpGrHhAcHSROGGwimf5xzyTa9G3bP7t7s+fl+/cPu2XN++7kn93zu3t/5XRYdOnQISVI5jjvWASRJo2WxS1JhLHZJKozFLkmFsdglqTAWuyQVZvFCv+HMzIzrKyVpQNPT04v63XfBix1genp6qOM7nQ5TU1MjSrMw2pa5bXmhfZnblhfal7mUvDMzMwON01exR8SpwJeBqzPzYxHxDOBG4HhgD3BhZh6IiAuAtwG/ADZn5icHSiNJGtrjzrFHxFLgWuD2rs1XAZsycy2wC9hY7/c+YD2wDnh7RJww8sSSpKPq5+bpAeDlwO6ubeuALfXjrVRlfhqwMzP3ZuZDwDeB1aOLKknqx+NOxWTmQeBgRHRvXpqZB+rH9wIrgacB93XtM7ddkrSARnHztNed2p53cDudzlBvODs7O/QYC61tmduWF9qXuW15oX2Zf1nzNi32/RGxpJ5yWUU1TbOb6lP7nFXAt+c7eNi71G270w3ty9y2vNC+zG3LC+3LXEreQVfFNP0Dpe3AhvrxBmAbcAfw/Ih4akQso5pf39FwfElSQ4/7iT0ipoEPA6cAj0TE+cAFwPURcSnwY+CGzHwkIt4F3AocAq7MzL1jSy5Jmlc/N09nqFbBHOmMefb9AvCF4WONyBXLGxzjzyLpl80p77pp4GN+9IFXjCHJaPj/ipGkwljsklQYi12SCmOxS1JhLHZJKozFLkmFsdglqTAWuyQVxmKXpMJY7JJUGItdkgpjsUtSYSx2SSqMxS5JhbHYJakwFrskFcZil6TCWOySVBiLXZIKY7FLUmEsdkkqjMUuSYWx2CWpMBa7JBXGYpekwljsklQYi12SCmOxS1JhLHZJKozFLkmFsdglqTAWuyQVxmKXpMJY7JJUmMVNDoqIZcBngBXAk4ArgbuBjwOHgDsz842jCilJ6l/TT+wXA5mZpwPnAx8FrgHempmrgeURcfZoIkqSBtG02H8GnFg/XgHcDzwzM3fW27YC64fMJklqoNFUTGZ+PiIujohdVMX+SmBT1y73Ait7Hd/
pdJq87WNmZ2f7GmOqwdjDZuul38yTom15oX2Z25YX2pd5nHnHMe6o8jadY38d8JPMPCsingd8Edjbtcuiox0/NdWkcg/rdDpDj9HLuMYdZ+ZxaFteaF/mtuWF9mXuP+9dA489jvPQK+/MzMxA4zSdilkN3AqQmd8BlgAndb2+CtjdcGxJ0hCaFvsu4DSAiDgZ2Ad0ImJN/fp5wLbh40mSBtVoKga4DvhURHy9HuPPqJY7XhcRxwF3ZOb2EWWUJA2g6c3T/cAfzfPS2uHiSJKG5V+eSlJhLHZJKozFLkmFsdglqTAWuyQVxmKXpMJY7JJUGItdkgpjsUtSYSx2SSqMxS5JhbHYJakwFrskFcZil6TCWOySVBiLXZIKY7FLUmEsdkkqjMUuSYWx2CWpMBa7JBXGYpekwljsklQYi12SCmOxS1JhLHZJKozFLkmFsdglqTAWuyQVxmKXpMJY7JJUGItdkgpjsUtSYRY3PTAiLgDeCRwE3gfcCdwIHA/sAS7MzAOjCClJ6l+jT+wRcSJwObAGOAc4F7gK2JSZa4FdwMZRhZQk9a/pVMx6YHtm7svMPZn5BmAdsKV+fWu9jyRpgTWdijkFeEpEbAFWAFcAS7umXu4FVg6dTpI0sKbFvgg4EXgVcDLw1Xpb9+s9dTqdhm9bmZ2d7WuMqQZjD5utl34zT4q25YX2ZW5bXmhf5nHmHce4o8rbtNjvAb6VmQeBH0bEPuBgRCzJzIeAVcDuXgdPTTWp3MM6nc7QY/QyrnHHmXkc2pYX2pe5bXmhfZn7z3vXwGOP4zz0yjszMzPQOE3n2G8DXhoRx9U3UpcB24EN9esbgG0Nx5YkDaFRsWfmT4EvAN8GbgHeQrVK5qKI2AGcANwwqpCSpP41XseemdcB1x2x+Yzh4kiShuVfnkpSYSx2SSqMxS5JhbHYJakwFrskFcZil6TCWOySVBiLXZIKY7FLUmEsdkkqjMUuSYWx2CWpMBa7JBXGYpekwljsklQYi12SCmOxS1JhLHZJKozFLkmFsdglqTAWuyQVxmKXpMJY7JJUGItdkgpjsUtSYSx2SSqMxS5JhbHYJakwFrskFcZil6TCWOySVBiLXZIKY7FLUmEWD3NwRCwBvgv8DXA7cCNwPLAHuDAzDwydUJI0kGE/sb8XuL9+fBWwKTPXAruAjUOOLUlqoHGxR8RzgOcCN9Wb1gFb6sdbgfVDJZMkNTLMJ/YPA5d1PV/aNfVyL7ByiLElSQ01mmOPiD8B/jMz/ysi5ttl0dGO73Q6Td72MbOzs32NMdVg7GGz9dJv5knRtrzQvsxtywvtyzzOvOMYd1R5m948fQXwmxFxDvDrwAFgf0QsycyHgFXA7l4HT001qdzDOp3O0GP0Mq5xx5l5HNqWF9qXuW15oX2Z+89718Bjj+M89Mo7MzMz0DiNij0zXzP3OCKuAH4EvAjYAHy2/u+2JmNLkoYzynXslwMXRcQO4ATghhGOLUnq01Dr2AEy84qup2cMO54kaTj+5akkFcZil6TCWOySVBiLXZIKY7FLUmEsdkkqjMUuSYWx2CWpMBa7JBXGYpekwljsklQYi12SCmOxS1JhLHZJKozFLkmFsdglqTAWuyQVxmKXpMJY7JJUGItdkgpjsUtSYSx2SSqMxS5JhbHYJakwFrskFcZil6TCWOySVBiLXZIKY7FLUmEsdkkqjMUuSYWx2CWpMBa7JBVmcdMDI+KDwNp6jPcDO4EbgeOBPcCFmXlgFCElSf1r9Ik9Ik4HTs3MFwJnAdcAVwGbMnMtsAvYOLKUkqS+NZ2K+Qbw6vrxA8BSYB2wpd62FVg/VDJJUiONpmIy81HgwfrpJcDNwJldUy/3AiuHjydJGlTjOXaAiDiXqtj/EPhB10uLjnZcp9MZ5m2ZnZ3ta4ypBmMPm62XfjNPirblhfZlblteaF/mceYdx7ijyjvMzdMzgfcAZ2Xm3ojYHxFLMvMhYBWwu9exU1NNKvewTqcz9Bi9jGv
ccWYeh7blhfZlblteaF/m/vPeNfDY4zgPvfLOzMwMNE7Tm6fLgQ8B52Tm/fXm7cCG+vEGYFuTsSVJw2n6if01wEnAP0fE3LaLgE9ExKXAj4Ebho8nSRpU05unm4HN87x0xnBxJEnD8i9PJakwFrskFcZil6TCWOySVBiLXZIKY7FLUmEsdkkqjMUuSYWx2CWpMBa7JBXGYpekwljsklQYi12SCmOxS1JhLHZJKozFLkmFsdglqTAWuyQVxmKXpMJY7JJUGItdkgpjsUtSYSx2SSqMxS5JhbHYJakwFrskFcZil6TCWOySVBiLXZIKY7FLUmEsdkkqjMUuSYWx2CWpMBa7JBVm8agHjIirgRcAh4C3ZubOUb+HJKm3kX5ij4iXAM/OzBcClwD/MMrxJUmPb9RTMS8DvgSQmR1gRUT8yojfQ5J0FIsOHTo0ssEiYjNwU2Z+uX6+A7gkM78/t8/MzMzo3lCSfklMT08v6nffkc+xH+H/BRkknCRpcKOeitkNPK3r+dOBPSN+D0nSUYy62G8DzgeIiN8DdmfmvhG/hyTpKEY6xw4QER8AXgz8AnhzZn6n4TgXAO8EDgLvA+4EbgSOp/ot4MLMPFDv97b6/TZn5ieH/yoa5V0GfAZYATwJuBK4G/g41dLPOzPzjfW+7wBeXW+/MjNvXuCspwJfBq7OzI9FxDPo89xGxBOA64GTgUeB12fmXccg76eBJwCPAK/LzLsnNW/X9jOBbZm5qH4+EXnny1znuAF4FrAPOD8zfz7hmV8M/D3V98SDVN/HP5/veouI5cA/AsuB/cBrM/P+Mef9ILCWagr8/cBOxnTdjfwPlDLzXZn5osxcM0SpnwhcDqwBzgHOBa4CNmXmWmAXsDEillKV/npgHfD2iDhhBF9GExcDmZmnU/3W8lHgGqq1/KuB5RFxdkQ8E/hjDn9tH4mI4xcqZH3OrgVu79o8yLl9LfBAZq4B/o7qG3Sh8/4t1Tf8S4AvApdNeF4i4snAX1FPTU5K3qNk/lPgvsz8A+CfgLUtyPwRqsUapwPfAi49yvX2NuBrdeZ/A/5yzHlPB06tl4KfRdUNY7vuJvUvT9cD2zNzX2buycw3UH2RW+rXt9b7nAbszMy9mfkQ8E1g9bEIDPwMOLF+vAK4H3hm1x9ozWU+HbglMx/OzPuAHwPPXcCcB4CXU90PmbOO/s/ty6jKFGA74z/f8+V9E/Cv9eP7qM77JOcFeDewCXi4fj4peWH+zK8EPgeQmZszc0sLMh95Df6M3tdbd+a57/lx+gbVbw0ADwBLGeN1N6nFfgrwlIjYEhE7IuJlwNLMPFC/fi+wkupG7X1dx81tX3CZ+XngNyJiF9U/4l8AP58n2zHNnJkH62+YboOc28e2Z+YvgEMR8cSFzJuZD2bmo/UnrzdT/Uo9sXkj4reB52Xmv3Rtnoi8vTJTXYNnR8TXIuLz9afGSc/8duBLEZFUUx7X95OZBbgGM/PRzHywfnoJcDNjvO4mtdgXUf3kPY9qiuPT/N+lk72WTB6zpZQR8TrgJ5n5LOClwGeP2GXiMvcwaM5jkr8u9RuBr2Tm7fPsMkl5rwYue5x9Jinv3PtmZq4Dvks1jTTfPr2OPRauBV6VmQH8B9VvdkeaL9uC5Y2Ic6mK/c/7zNDoHE9qsd8DfKv+qfxDqps3+yJiSf36KqpfwY5cXjm3/VhYDdwKUN9bWAKc1PX6JGaes3+Ac/vY9vqGzqLMfJiF92ngB5l5Zf18IvNGxCrgOcDnIuLbwMqI+Pqk5u1yD/D1+vGtwO8w+Zl/NzO/WT/+d+D36SMzC3QN1jfP3wOcnZl7GeN1N6nFfhvw0og4rr6RuoxqXmlD/foGYBtwB/D8iHhqvSplNbDjWASmuvlxGkBEnEz1w6gTEWvq18+jyvwV4BUR8cSIeDrVP9z3jkHeboOc29s4PFf4SuCrC5x1bjXJw5l5edfmicybmT/NzN/KzBdk5guAPfVN34nM2+UWqpt8ANN
AMvmZ746IuftVzwd+QO/rrTvz3Pf82NSrcD4EnNO1+mZs193IlzuOSkRcSvUrC1SrIHZSLSd8MtUNkNdn5iMRcT7wDqqlTNdm5ueOUd5lwKeAX6NazvTXVMsdr6P6AXpHZl5W7/sW4II683t7TCWMK+c08GGqOdRHgJ/WWa6nj3NbT4F8Ang21Q2sizPzvxc4768Cs8D/1Lt9LzPfNMF5z5u7mCPiR5l5Sv34mOc9SubXUq3sWkm1HPCizLxnwjO/m6o8H6FavLAxMx+Y73qrr9fPUk35PkC1ZHbvGPO+AbgC+H7X5ouoztvIr7uJLXZJUjOTOhUjSWrIYpekwljsklQYi12SCmOxS1JhLHZJKozFLkmFsdglqTD/C84BsbH0ZbDEAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.hist(huron)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "What went wrong?" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.hist(huron[:, 1])\n", "plt.xlabel('Lake Huron Level in Feet')\n", "plt.ylabel('Number of Times Observed')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The ticks aren't great and I personally don't like the bars touching. Let's work a little on improving the plot." ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "scrolled": true, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plt.style.use('seaborn')\n", "plt.hist(huron[:, 1], rwidth=0.9)\n", "plt.xlabel('Lake Huron Level in Feet')\n", "plt.ylabel('Number of Times Observed')\n", "plt.yticks([0, 5, 10, 15, 20])\n", "plt.ylim(0,20)\n", "plt.axvline(x=np.mean(huron[:,1]),color='red', label='mean')\n", "plt.legend()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "We can see a lot from this figure. We can see the lowest and highest levels. This representation may remind you of how probability distributions look, and indeed it is how you can reconstruct probability mass functions. To see this, let's look at another example." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "We'll look at a larger dataset of car speeds." ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "8437\n" ] } ], "source": [ "car_speed = pydataset.data('amis').values\n", "speeds = car_speed[:, 0]\n", "print(len(speeds))" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/whitead/miniconda3/lib/python3.7/site-packages/matplotlib/axes/_axes.py:6521: MatplotlibDeprecationWarning: \n", "The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.\n", " alternative=\"'density'\", removal=\"3.1\")\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "#now we'll normalize the histogram with density=True\n", "plt.hist(speeds, density=True)\n", "plt.xlabel('Speed in mph')\n", "plt.ylabel('Proportion per mph')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Now the y-axis is normalized so that the bar areas sum to 1: each bar's height is the proportion of observations in that bin divided by the bin width. Thanks to the Law of Large Numbers, and the fact that we have over 8,400 samples, we know that these proportions will approach the probabilities of these intervals. For example, the bar covering speeds of roughly 25 to 30 mph has a height of $\\approx 0.012$ per mph, and multiplying that height by the bin width estimates the probability of observing a speed in that interval. If we make our bins small enough, we'll eventually be able to assign a probability to any value and thus we'll have recreated the probability mass function!" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Kernel Density Estimation\n", "====\n", "\n", "Kernel density estimation is a more sophisticated method for estimating the underlying probability distribution from data: instead of counting samples in bins, it produces a smooth curve. It can help you see what *type* of distribution your data might follow (e.g., normal, exponential). Let's see an example." ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/whitead/miniconda3/lib/python3.7/site-packages/matplotlib/axes/_axes.py:6521: MatplotlibDeprecationWarning: \n", "The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.\n", " alternative=\"'density'\", removal=\"3.1\")\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import seaborn as sns\n", "\n", "sns.distplot(speeds, bins=range(15, 65))\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The new solid line shows us that a normal distribution would be a good fit, although the right tail is a little long. This line is generated by estimating what the histogram would look like if the bins were infinitely small." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Categorical Data Boxplots\n", "====\n", "\n", "Sometimes we'll have measured some quantity, like the mass of a chicken, under multiple conditions. This is not exactly 2D, because the conditions are usually categorical data. For example, my conditions are the kind of food I've fed to my chickens. We can analyze this using a boxplot, which shows the category and quartiles in one plot." ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['casein' 'horsebean' 'linseed' 'meatmeal' 'soybean' 'sunflower']\n" ] } ], "source": [ "data = pydataset.data('chickwts').values\n", "categories = np.unique(data[:,1])\n", "print(categories)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "This first step finds all the unique labels, which are our possible categories. Now we'll use that to separate our data into a list of arrays, one for each category, instead of one large array." 
] }, { "cell_type": "code", "execution_count": 19, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "data_as_arrays = []\n", "#loop over categories\n", "for c in categories:\n", "    #get a True/False array showing which rows had which category\n", "    rows_with_category = data[:,1] == c\n", "    #now slice out the rows with the category and grab column 0 (the chicken mass)\n", "    data_slice = data[rows_with_category, 0]\n", "    #now we need to make the data into floats, because the mixed array was loaded with the generic object type\n", "    data_as_arrays.append(data_slice.astype(float))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Whew! That was a lot of work. We used a few tricks. One was that you can slice using True and False values in numpy. Let's see a smaller example:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "array([ 4, 20])" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = np.array([4, 10, 20])\n", "my_slice = [True, False, True]\n", "x[my_slice]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The other thing we did is convert the array into floating point numbers. Recall that in `numpy` each array can only be one data type. The original chicken dataset had strings in it, like 'linseed', so the whole array was loaded with the generic `object` data type rather than as numbers. We thus had to convert to floats to do calculations on the chicken weights. We used the `astype()` method on the array. " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "So we found which rows had their second column (the category column) equal to category `c` and then sliced out those rows. Now we can make the boxplot." 
] }, { "cell_type": "code", "execution_count": 21, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAfIAAAFYCAYAAACoFn5YAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAIABJREFUeJzt3XlcVOX+B/DPzMDIIgQYaFoumYoJoST6U7PElUpzY1GQNvO2KEoZqai5XlEpK0ULNXGtqyIlWW5ZdluUq6EE5la3q4gL2ygwMzAw8/z+IKZQYWB0Zjjweb9evRrPzDnP9xzOzGeeOec8RyaEECAiIiJJktu6ACIiIjIfg5yIiEjCGOREREQSxiAnIiKSMAY5ERGRhDHIiYiIJMzO1gWYIy+v2NYlEBERWY2np0uNz7FHTkREJGEMciIiIgljkBMREUkYg5yIiEjCGOREREQSxiAnIiKSMAY5ERGRhDHIiYiIJIxBTkRE9KczZ37FmTO/2rqMepHkyG5ERESWsHv3LgCAt/fDNq6k7tgjJyIiQmVv/OzZ0zh79rSkeuUMciIiIvzVG7/5cUPHICciIpIwBjkRERGAkSPH3vZxQ8eT3YiIiFB5gluXLl2Nj6WCQU5ERPQnKfXEq8iEEMLWRdRXXl6xrUsgIiKyGk9Plxqf4zFyIiIiCWOQExERSRiDnIiISMIY5ERERBLGICciIpIwBjkREZGEMciJiIgkjEFOREQkYQxyIiIiCWOQExERSRiDnIiISMJ405Q/7dixDceOpZk1r1qtBgA4OzubNX9AQG+EhkaYNS8RETVtFu2Rl5aWYvDgwUhJScGVK1cQGRmJ8PBwTJs2DTqdDgCQmpqKsWPHIiQkBDt37rRkORaj05VBpyuzdRlERNQEWfTuZ++99x5++OEHRERE4NixY3j88cfx5JNPYsWKFWjVqhVGjRqF0aNHIzk5Gfb29ggODsbWrVvh5uZW63Ib2t3PYmKmAgDi41fauBIiImqMbHL3s99//x2//fYbBgwYAABIS0vDoEGDAACBgYE4cuQIMjIy4OvrCxcXFzg4OMDf3x/p6emWKomIiKjRsdgx8mXLlmHu3Ln4/PPPAQBarRZKpRIA0KJFC+Tl5SE/Px8eHh7GeTw8PJCXl2dy2e7uTrCzU1imcDMoFJXfh2r7xkRERGQJFgnyzz//HN27d8cDDzxw2+dr+jW/rr/yq1Qas2uzBL3eAKDh/eRPRESNQ20dRYsE+eHDh5GdnY3Dhw/j6tWrUCqVcHJyQmlpKRwcHHDt2jV4eXnBy8sL+fn5xvlyc3PRvXt3S5RERETUKFkkyN9//33j41WrVqFNmzY4ceIE9u/fj5EjR+LAgQPo378//Pz8MGfOHBQVFUGhUCA9PR2xsbGWKImIiMikM2d+BQB4ez9s40rqzmrXkUdFRWHGjBnYvn07WrdujVGjRsHe3h7Tp0/HxIkTIZPJMHnyZLi48DgzERHZxu7duwAwyKuJiooyPk5KSrrl+aCgIAQFBVm6DCIiolqdOfMrzp49bXwslTDnEK1ERET4qzd+8+OGjkFOREQkYQxyIiIiACNHjr3t44aON00hIiJC5QluXbp0NT6WCgY5ERHRn6TUE6/CICciIvqTlHriVXiMnIiISMIY5ERERBLGICciIpIwBjkREZGEMciJiIgkjGetE5HN7dixDceOpdV7PrVaDQBwdnY2q92AgN4IDY0wa16ihoI9ciKSLJ2uDDpdma3LILIp9siJyOZCQyPM6hnHxEwFAMTHr7zbJRFJBnvkREREEsYgJyIikjAGORERkYQxyImIiCSs0Z3stmTJfKh
UhVZts6q9qhNvrMnd3QOxsfOt3i4RETUMjS7IVapCFBQUQGbvaLU2xZ8/bBQWaazWJgCIcq1V2yMiooan0QU5AMjsHdH8oWdsXYbFlfyWausSiIjIxniMnIiISMIY5ERERBLGICciIpIwBjkREZGEMciJiIgkjEFOREQkYQxyIiIiCWOQExERSRiDnIiISMIY5ERERBLGICciIpIwBjkREZGEMciJiIgkzGJ3P9NqtZg5cyYKCgpQVlaG1157Dfv378epU6fg5uYGAJg4cSIGDBiA1NRUbNq0CXK5HKGhoQgJCbFUWURERI2KxYL822+/hY+PDyZNmoScnBy8+OKL6NGjB9544w0EBgYaX6fRaLB69WokJyfD3t4ewcHBGDJkiDHsiYiIqGYWC/KnnnrK+PjKlSto2bLlbV+XkZEBX19fuLi4AAD8/f2Rnp6OgQMHWqo0IiKiRsNiQV5l3LhxuHr1Kj766CNs3LgRW7duRVJSElq0aIG5c+ciPz8fHh4extd7eHggLy+v1mW6uzvBzk5x2+cUiqZ12F+hkMPT08XWZRDZRNX7ne8BasosHuT/+te/cPr0acTExCA2NhZubm7o2rUr1q5di4SEBPTo0aPa64UQJpepUmlqfE6vN9xxzVKi1xuQl1ds6zKIbKLq/c73ADV2tX1ZtVj3NSsrC1euXAEAdO3aFXq9Hp07d0bXrl0BAAMHDsS5c+fg5eWF/Px843y5ubnw8vKyVFlERESNisWC/Pjx49iwYQMAID8/HxqNBm+//Tays7MBAGlpaejUqRP8/PyQmZmJoqIiqNVqpKeno2fPnpYqi4iIqFGx2E/r48aNw+zZsxEeHo7S0lK8/fbbcHJyQnR0NBwdHeHk5IS4uDg4ODhg+vTpmDhxImQyGSZPnmw88Y2IiIhqZ7Egd3BwwLvvvnvL9F27dt0yLSgoCEFBQZYqhYiIqNGy+Mlu1qZWqyHKS1HyW6qtS7E4Ua6FWm365EAiImq8Gl2QExFR07ZjxzYcO5Zm1rxqtRoA4OzsbNb8AQG9ERoaYda85mp0Qe7s7IwyvQzNH3rG1qVYXMlvqXB2drJ1GUREFrFkyXyoVIX1nk+tVkOnKzOrTYOh8pJGc+c/fPgbs79EuLt7IDZ2fr3na3RBTkREjYNKVYiCwgLYOdfzBGg7e8jt7M1qU5SVAgDkzRzMmr8CwI0yXf3nU5s/FgKDnIiIGiw7Zxd0CP2HrcuwuD92rDV73qY1nikREVEjwyAnIiKSMAY5ERGRhDHIiYiIJIxBTkREJGEmg7y8vBxXr14FAJw5cwaff/45tFqtxQsjIiIi00wG+cyZM3Hy5Elcu3YNUVFROHfuHGbOnGmN2oiIiMgEk9eRX7t2DUFBQUhKSkJ4eDheeOEFPP/881YojYiI/q6pDT2qVqtRUVZ6R9dYS0WFuhjqCvMGoTHZI9fpdBBC4ODBgxgwYAAAQKPRmNUYERHZhk5XZvawo9SwmeyR9+rVC48++ij69++PDh06YOPGjejQoYM1aiMior8JDY0wu1ccEzMVABAfv/JulmRRzs7OqLCzbzIjuzk3U5o1r8kgf/PNN/GPf/wDrq6uAIDBgwdjwoQJZjVGREREd5fJID979ixSUlJQXFwMIf6693VcXJxFCyMiIiLTTAb5tGnTMHz4cDz00EPWqIeIiIjqwWSQt2nTBlOmTLFGLURERFRPJoN85MiRWLNmDXr06AE7u79eHhAQYNHCiEhaliyZD5Wq0KptVrVXdSKXNbm7eyA2dr7V2yW6mckgT01NxR9//IEffvjBOE0mk2Hbtm0WLYyIpEWlKkRBQQGa2TtZrU0ZFACAkiLrjjZZVs5LcKnhMBnkhYWFOHTokDVqISKJa2bvBP+uY21dhsWln95l6xKIjEwOCBMQEICLFy9aoxYiIiKqJ5M98h9//BHbtm2Dm5sb7OzsIISATCbD4cOHrVAeERER1cZkkCcmJt4yraioyCLFEBERUf2Y/Gm
9TZs20Gq1uHz5Mi5fvoz//e9/eOONN6xRGxEREZlgske+ePFi/Pjjj8jPz0fbtm2RnZ2NF1980Rq1ERFRE1ehLrbq3c/0ZaUAAEUz8+5EZq4KdTHQrIVZ85oM8szMTOzduxeRkZHYsmULsrKycPDgQbMaIyIiqit3dw+rt6nSlAAA7jHzBiZma9bC7PU1GeRKZeXKlJeXQwgBHx8fLFu2zKzGiIiI6soWA+5I8S5xJoO8Q4cO2LZtG3r27IkXXngBHTp0QHFxsTVqI5K0HTu24dixNLPmVavVACpv42iOgIDeZt/ukoikxWSQL1iwADdu3ICrqyu+/PJLFBQU4OWXX7ZGbURNlk5XBsD8ICeipsNkkC9ZsgSzZ88GAIwYMcLiBRE1FqGhEWb3iqX48x4R2YbJy88UCgWOHDmCsrIyGAwG439ERERkeyZ75Dt37sSmTZsghDBOk8lkOH36tEULIyIiItNMBvnPP/9s1oK1Wi1mzpyJgoIClJWV4bXXXoO3tzfeeust6PV6eHp6Ij4+HkqlEqmpqdi0aRPkcjlCQ0MREhJiVptERERNTa1BXlpaCgeHyovi9+7dC7VaDUdHRzz99NMmF/ztt9/Cx8cHkyZNQk5ODl588UX4+/sjPDwcTz75JFasWIHk5GSMGjUKq1evRnJyMuzt7REcHIwhQ4bAzc3t7qwhERFRI1bjMfKsrCwMGzYMFRUVAIA1a9bg559/xvr165GSkmJywU899RQmTZoEALhy5QpatmyJtLQ0DBo0CAAQGBiII0eOICMjA76+vnBxcYGDgwP8/f2Rnp5+N9aNiIio0asxyN99913MmzcPdnaVnXY3NzfExcXhww8/RHJycp0bGDduHN58803ExsZCq9UaB5hp0aIF8vLykJ+fDw+Pv0az8fDwQF5enrnrQ0RE1KTU+NO6Wq3GwIEDjf+u+qm7VatWkMlkdW7gX//6F06fPo2YmJhqJ8z9/fHf1TT979zdnWBnp7jtcwqFyRPxGxWFQg5PTxdbl0F3WdV+LKW/Ld97DZsU9ylbkOJ2MnmyW5VVq1YZH+t0OpOvz8rKQosWLXDfffeha9eu0Ov1cHZ2Nh53v3btGry8vODl5YX8/HzjfLm5uejevXuty1apNDU+p9c3rUvj9HoD8vI40l5jU7UfS+lvy/dewybFfcoWGup2qu2LRY1foZs1a4YLFy7cMv3MmTNwcTH9TeX48ePYsGEDACA/Px8ajQZ9+/bF/v37AQAHDhxA//794efnh8zMTBQVFUGtViM9PR09e/Y0uXwiIiKqpUf+8ssv46WXXsI//vEP+Pr6oqKiAidOnMDGjRuRmJhocsHjxo3D7NmzER4ejtLSUrz99tvw8fHBjBkzsH37drRu3RqjRo2Cvb09pk+fjokTJ0Imk2Hy5Ml1+qJARER0O3dynwOVqhDAX6Mr1pct7nNQY5A/9thjWLVqFT7++GN88sknkMvl8Pb2RlJSEtq2bWtywQ4ODnj33XdvmZ6UlHTLtKCgIAQFBdWzdCIiortLqWxm6xLqrdZj5N7e3oiPj7dWLURERHfsTu5zIEVN6zRTIiKiRoZBTkREJGEMciIiIgkzeR35nj17sG7dOhQVFUEIASEEZDIZDh8+bIXyiIiIqDYmg3zVqlVYvHgxWrdubY16iIiIqB5MBnm7du0QEBBgjVqIiIionkwGeY8ePbBixQr06tULCsVf45v36dPHooURERGRaSaD/KeffgIAnDhxwjhNJpMxyImIiBoAk0G+ZcsWa9RBREREZqgxyBcvXow5c+YgPDz8trct3bZtm0ULIyIiItNqDPLg4GAAQHR0tNWKISIiovqpMci9vb0BAL169bJaMURERFQ/Jo+RS5Eo16Lkt1TrtafXAQBkCqXV2gQq1xNwsmqbRETUsDS6IHd397B6mypVaWXbrtYOVSebrC8RETUcJoP8xo0byM3NRadOnfD999/jl19+QWhoKDw9Pa1RX73Fxs63ept
VN6CPj19p9baJiKhpM3nTlJiYGOTm5uJ///sfli5dCjc3N8yePdsatREREZEJJoNcq9WiX79+2LdvHyZMmICIiAiUl5dbozYiIiIyoU5BXlhYiP3792PAgAEQQuDGjRvWqI2IiIhMMHmMfMSIERg6dChCQkJw3333ISEhAb1797ZGbUREjc6SJfOhUhVavd2qNqvO6bEmd3cPm5y/1FSYDPLnnnsOzz33nPHfERERcHd3t2hRRESNlUpViMLCfLg4W/dyVbs/73lVXlZk1XaL1TqrttcUmQzylJQUaLVajBs3DhMmTMDVq1cxadIkhIeHW6M+IqJGx8VZiZcjfG1dhlUkbsu0dQmNnslj5Nu3b0dISAgOHjyITp064dChQ9i7d681aiMiIiITTAZ5s2bNoFQq8d133+HJJ5+EXG5yFiIiIrKSOqXyggULkJ6ejl69euHEiRPQ6XjMg4iIqCEweYz8nXfewVdffYXIyEgoFArk5ORgwYIF1qiNiCRErVajrLwU6ad32boUiysr10CmNti6DCIAdeiRe3l5ISQkBA4ODrh8+TK6deuGOXPmWKM2IiIiMsFkj3zdunVITEyETqeDk5MTysrKMGLECGvURkQS4uzsDKGXw7/rWFuXYnHpp3fB2dnR1mUQAahDj3z//v346aef4Ofnh6NHj+Kdd95Bp06drFEbERERmWAyyJ2dnaFUKo3jqw8aNAiHDh2yeGFERERkmsmf1u+55x6kpqaic+fOmDVrFjp27Ijc3Fxr1EZkcxxOk4gaOpNBvmzZMhQUFGDIkCHYtGkTrl69ihUrVlijNiKbU6kKUViQj+ZWHj9BYag8I1pn5S8RJQaeiU0kNSaD3NHREffffz8A4JVXXrF4QUQNTXO5HBPu8bB1GVax9Yb1f30gojtTY5B7e3ujZcuWsLOrfIkQAjKZzPh/HicnIiKyvRqDfOrUqTh48CDatWuHESNG4IknnjCGel0tX74cP//8MyoqKvDyyy/jm2++walTp+Dm5gYAmDhxIgYMGIDU1FRs2rQJcrkcoaGhCAkJubO1IiIiaiJqTObXXnsNr732Go4fP47PPvsMS5YswWOPPYaRI0fC39/f5IKPHj2K8+fPY/v27VCpVBg9ejT+7//+D2+88QYCAwONr9NoNFi9ejWSk5Nhb2+P4OBgDBkyxBj2REREVDOTXeyePXuiZ8+eKC0tRUpKCl599VU0b97c5E/rAQEBeOSRRwAArq6u0Gq10Ov1t7wuIyMDvr6+cHFxAQD4+/sjPT0dAwcONGd9iIiImhSTQS6EwHfffYddu3bh5MmTGD58OEaOHGlywQqFAk5OTgCA5ORkPP7441AoFNi6dSuSkpLQokULzJ07F/n5+fDw+OtEIg8PD+Tl5d3BKhGRrZSVa6w61nqFvvIGTnYKpdXaBCrXszk4shs1DDUG+R9//IFdu3Zh37598PHxwZgxY/Dee+/V+zj5119/jeTkZGzYsAFZWVlwc3ND165dsXbtWiQkJKBHjx7VXi+EMLlMd3cn2Nkp6lWHJSkUlZcmeXq62LgSutuq/rZNiUIhN2tf9vLytPr2ys/XAgDucXe2aruAM1q0aGHWdtJqNdBqdUjclmmBuhqeYrUOjgYNPx8tqMZUfvLJJ9GhQwcMHDgQrq6uOHXqFE6dOmV8fsqUKSYX/v333+Ojjz7C+vXr4eLigj59+hifGzhwIObPn49hw4YhPz/fOD03Nxfdu3evdbkqlcZk29ak11dee5uXV2zjSuhuq/rbNiV6vcGsfTkmZq4FqjHVZuWAOUuXvm/1tgHz3vMGg+nOSmNjMAh+Pt6h2r4I1RjkcXFxd9RocXExli9fjo0bNxpPXIuKisJbb72FBx54AGlpaejUqRP8/PwwZ84cFBUVQaFQID09HbGxsXfUNhFRQ+Xs7AylnR4vR/jauhSrSNyWCftm1v7FpGmpMchHjx59Rwv+6quvoFKpEB0dbZw2ZswYREdHw9HREU5
OToiLi4ODgwOmT5+OiRMnQiaTYfLkycYT34iIiKh29TvgXQ9hYWEICwu7ZfrtviAEBQUhKCjIUqUQERE1Wk3vTB4iIqJGpM49ciFEtTPK5Va+iQQRERHdymSQr1+/Hh999BHUajWAv8ZcP336tMWLIyIiotqZDPJdu3YhNTUVrVu3tkY9REREVA8mfx9v164dQ5yIiKiBMtkj79KlC6ZPn45evXpBofhrNLXg4GCLFkZERESmmQzy3NxcKJVKnDx5stp0BjkREZHtmQzyuLg4GAwGFBQUwNPT0xo1ERERUR2ZPEZ+5MgRDB48GJGRkQCAJUuW4PDhw5aui4iIiOrAZJC/99572LFjh7E3/sorr2DNmjUWL4yIiIhMMxnkTk5OuPfee43/9vDwgL29vUWLIiIioroxeYzcwcEB//nPfwAAN27cwJdffolmzZpZvDAiIiJrO3PmVwCAt/fDNq6k7kwG+bx58zB//nxkZmZi6NCh8Pf3x6JFi6xRGxERkVXt3r0LQCMLcrlcjsTExGrTTp48iTZt2lisKCIiIms7c+ZXnD172vhYKmFuMshfeuklrFq1Cu3btwcArFmzBqmpqdi3b5+la6MGaseObTh2LM2seavG7Hd2djZr/oCA3ggNjTBrXiKi2lT1xqseSyXITZ7sFh8fj+joaHzzzTeIjIzEf//7XyQnJ1ujNmqEdLoy6HRlti6DiKjRMNkj9/b2RmJiIiZNmoR+/fphxowZ1qiLGrDQ0Aize8UxMVMBAPHxK+9mSUREd2zkyLFYvnyx8bFU1Bjk4eHhkMlkxn/LZDLs3LkTv/zyCwBg27Ztlq+OiIjISry9H0aXLl2Nj6WixiCPjo62Zh1EREQ2J6WeeJUag7xXr14AKm+asm/fPjz77LMAKkd6Cw8Pt051RDamVqtRZjBg641CW5diFSUGA5r9eUIiUVMkpZ54FZMnu82aNavayG5dunRBbGysRYsiIiKiujF5sptOp8NTTz1l/PdTTz2FTz/91KJFETUUzs7OsNeVYcI9HrYuxSq23iiE0sxLA4nINkz2yAHg3//+N0pLS6HRaLB///5qJ8ERERGR7ZjskS9atAjz58/HtGnTIJPJGu0QrXcyyIlKVXn8tOrSqvriICdERGQuk0Hevn17bNy40QqlSJdSyZvIEBGRbdQY5IsXL8acOXNuuZ68SmO7jvxOBjkhIiKylRqDPDg4GACvJyciImrIajzZzdvbG0Dl9eTNmzeHwWCAXq83/kdERES2Z/IYeVRUFM6dOwcvLy/jNJlMhj59+li0MCIiIjLNZJDn5ORg//791qiFiIiI6snkdeQdO3aETqezRi1ERERUTzX2yGNiYiCTyVBSUoLhw4fD19cXCoXC+Pzy5cutUiARUWNTrNYhcVumVdssLasAADg0M/lD7F1VrNbBg1foWlSNf9G+fftasw4ioibB3d02w/2WaCoHrrJv5mrVdj2a2W6dm4oag3z06NHIzs7GAw88YJym1Wpx7do1tG/f3hq1ERE1OrGx823SbtXIk/HxK23SPllOjcfIjxw5gvHjx6O4uNg4LTs7Gy+99BKysrLqtPDly5cjLCwMY8eOxYEDB3DlyhVERkYiPDwc06ZNMx57T01NxdixYxESEoKdO3fe4SoRERE1HTUGeUJCAjZs2AAXFxfjtM6dO+PDDz/E+++/b3LBR48exfnz57F9+3asX78eS5YswcqVKxEeHo5PPvkE7dq1Q3JyMjQaDVavXo2NGzdiy5Yt2LRpE65fv3531o6IiKiRqzHIhRDo3LnzLdM7deqEsrIykwsOCAjABx98AABwdXWFVqtFWloaBg0aBAAIDAzEkSNHkJGRAV9fX7i4uMDBwQH+/v5IT083d32IiIialBqDXKPR1DhTXXrMCoUCTk5OAIDk5GQ8/vjj0Gq1UCqVAIAWLVogLy8P+fn58PD460QIDw8P5OXl1XkFiIiImrIaT3br1KkTPv30U4wfP77a9HX
r1sHPz6/ODXz99ddITk7Ghg0bMHToUON0IcRtX1/T9L9zd3eCnZ3C5Ouo4VEoKr87enq6mHhlw1BVb1OiUMgl9/eRSr22xG3VeNUY5G+99RYmT56M3bt3w8fHBwaDAenp6WjevDkSExPrtPDvv/8eH330EdavXw8XFxc4OTmhtLQUDg4OuHbtGry8vODl5YX8/HzjPLm5uejevXuty1Wpav61gBo2vd4AAMjLKzbxyoahqt6mRK83SO7vI5V6bYnbStpq+wJWY3fD09MTO3bswLRp09C2bVt07NgRs2fPxtatW+Hs7Gyy0eLiYixfvhyJiYlwc3MDUHltetVwrwcOHED//v3h5+eHzMxMFBUVQa1WIz09HT179qzvOhIRETVJJof46dOnj1k3SPnqq6+gUqmq3QZ16dKlmDNnDrZv347WrVtj1KhRsLe3x/Tp0zFx4kTIZDJMnjy52pnyREREVDOLjdUXFhaGsLCwW6YnJSXdMi0oKAhBQUGWKoWIiKjRanpn8hARETUiDHIiIiIJY5ATERFJGIOciIhIwhjkREREEsYgJyIikjAGORERkYQxyImIiCSMQU5ERCRhDHIiIiIJY5ATERFJGIOciIhIwhjkREREEmaxu58RNRYlBgO23ii0apulBgMAwEFu3e/aJQYDPKzaIhHdKQZ5E7VkyXyoVNYNJwDGNmNiplq1XXd3D8TGzjdrPltQ/7mdlFZu3wO2W2ciMg+DvIlSqQpRUJgPuaN1dwGDXFS2r71uvTa1FWbPa0743w1VX3Ti41fapH0ikg4GeRMmd7SDe1BbW5dhcap9F21dAhGRxfBkNyIiIgljkBMREUkYg5yIiEjCGOREREQSxiAnIiKSMAY5ERGRhDHIiYiIJIxBTkREJGEMciIiIgljkBMREUkYg5yIiEjCGOREREQSxiAnIiKSMN79jIhIInbs2IZjx9LMmlf15z3uq26RW18BAb0RGhph1rxkWQxyIqImQKlsZusSyEIY5EREEhEaGsFeMd2Cx8iJiIgkzKI98nPnzuG1117D888/jwkTJmDmzJk4deoU3NzcAAATJ07EgAEDkJqaik2bNkEulyM0NBQhISGWLIuIGhhzj/3yuC+RBYNco9Fg0aJF6NOnT7Xpb7zxBgIDA6u9bvXq1UhOToa9vT2Cg4MxZMgQY9gTEdWEx32JLBjkSqUS69atw7p162p9XUZGBnx9feHi4gIA8Pf3R3p6OgYOHGip0giAWq2GoawCqn0XbV2KxRm0FVBNIiwDAAAUOklEQVQb1LYug2rBY79E5rPYMXI7Ozs4ODjcMn3r1q149tln8frrr6OwsBD5+fnw8PAwPu/h4YG8vDxLlUVERNSoWPWs9ZEjR8LNzQ1du3bF2rVrkZCQgB49elR7jRDC5HLc3Z1gZ6ewVJlNgqurC3Ql5XAPamvrUixOte8iXJu7wNPTxdal1JlCUfkdW0o1E5FtWDXI/368fODAgZg/fz6GDRuG/Px84/Tc3Fx079691uWoVBqL1dhU6PUGW5dgVXq9AXl5xbYuo86q/j5SqpmILKe2L/VWvfwsKioK2dnZAIC0tDR06tQJfn5+yMzMRFFREdRqNdLT09GzZ09rlkVERCRZFuuRZ2VlYdmyZcjJyYGdnR3279+PCRMmIDo6Go6OjnByckJcXBwcHBwwffp0TJw4ETKZDJMnTzae+EZERES1s1iQ+/j4YMuWLbdMHzZs2C3TgoKCEBQUZKlSiIiIGi2O7EZERCRhDHIiIiIJ401TiCyEt5wkImtgkBM1QBx6lIjqikFOZCEcdpSIrIFB3oQZtNYfa92g0wMA5Errjcxn0FYAjlZrjojIqhjkTZS7u4fpF1mAqrTy2K+7oxXvbudou/UlIrI0majL4OYNDIetlK6qk7fi41fauBIiIuloMEO0EhER0d3FICciIpIwBjkREZGEMciJiIgkjEFOREQkYQxyIiIiCWOQExERSRiDnIiISMIY5ERERBL
GICciIpIwBjkREZGEMciJiIgkjEFOREQkYQxyIiIiCWOQExERSRiDnIiISMIY5ERERBLGICciIpIwBjkREZGEMciJiIgkjEFOREQkYQxyIiIiCZMJIYSti6ivvLxiW5fQpO3YsQ3HjqWZNa9KVQgAcHf3MGv+gIDeCA2NMGteIiKp8vR0qfE5OyvWQQSlspmtSyAialTYIyciImrgauuR8xg5ERGRhFk0yM+dO4fBgwdj69atAIArV64gMjIS4eHhmDZtGnQ6HQAgNTUVY8eORUhICHbu3GnJkoiIiBoViwW5RqPBokWL0KdPH+O0lStXIjw8HJ988gnatWuH5ORkaDQarF69Ghs3bsSWLVuwadMmXL9+3VJlERERNSoWC3KlUol169bBy8vLOC0tLQ2DBg0CAAQGBuLIkSPIyMiAr68vXFxc4ODgAH9/f6Snp1uqLCIiokbFYmet29nZwc6u+uK1Wi2USiUAoEWLFsjLy0N+fj48PP66FMnDwwN5eXmWKouIiKhRsdnlZzWdLF+Xk+jd3Z1gZ6e42yURERFJjlWD3MnJCaWlpXBwcMC1a9fg5eUFLy8v5OfnG1+Tm5uL7t2717oclUpj6VKJiIgajAZz+Vnfvn2xf/9+AMCBAwfQv39/+Pn5ITMzE0VFRVCr1UhPT0fPnj2tWRYREZFkWWxAmKysLCxbtgw5OTmws7NDy5Yt8c4772DmzJkoKytD69atERcXB3t7e+zbtw8ff/wxZDIZJkyYgGeeeabWZXNAGCIiakpq65FzZDciIqIGrsH8tE5ERER3F4OciIhIwiT50zoRERFVYo+ciIhIwhjkREREEsYgJyIikjAGORERkYQxyImIiCSMQU5ERCRhDHILO336NFauXGnrMuotLS0NU6dOtdjyU1JSsGzZMostv6FISUnB7Nmz8fbbb1utzTFjxuDSpUtWa89WDh06BJ1Od9eXGxkZiXPnzt315drCwIEDoVarbV1Gg1VeXo6QkBDMmDEDq1atwtatW21dklkY5BbWtWtXiwYiNXyurq5YuHChrctodDZu3Ijy8nJbl0ESlpeXB51OJ/lOhc3uR95QlZeXY+bMmcjJyUGzZs2wZMkSLFy4EBqNBqWlpZg7dy4eeeQRrF27FgcPHoRcLkdgYCBeeeUVHD9+HCtWrICdnR3uu+8+LFq0CCdOnMC2bduwcuVKDBkyBIMHD0Z6ejpcXFywdu1ayOUN97uUWq3Gm2++ibNnz2LYsGEYMmQIFi5cCLlcDmdnZyxduhRnz57Fhg0boNFoMGPGDHz++efIysqCXq/H+PHjMWbMGBw4cAAbNmyAnZ0dfHx8MHPmTADApUuXMGnSJFy9ehXPPfccgoODb7sN5XI5ZsyYgWvXrkGj0SAqKgqBgYGIjIxE3759cfToUahUKnz00Udo3bq1jbfarXJycjBmzBikpKRgyJAhCAsLw7fffgudToekpCQUFRUhJiYGcrkcer0e8fHxaNWqFebOnYvs7GxUVFRg6tSp6NOnD3777TcsXLgQMpnM+DdwdXXF4sWLceLECXTo0KFBh1tKSgqOHTsGlUqF8+fP4/XXX8eePXvw+++/45133kFWVha++OILyOVyDB48GC+++CKuXr2KmJgYAEBFRQWWLVuG9PR0nDx5EpMmTcI///lPzJ49G23btsWJEycwfvx4nD17FhkZGYiIiEBERES99quG5vLly7fsHwkJCcjOzoZOp8PUqVOh1+uxZ88exMfHAwDmzJljXJfExEQcP34cCoUCq1evhrOz8233rZ9++gkffPAB7O3t4erqivfff9/4+SWTyfDf//4Xw4YNw5QpU2y5OW5x8/bp27cv1Go1ZsyYAbVajREjRuCbb7657XsvLi4OFy9exKxZs6p9dixfvhzp6enQ6/WIiIiAl5cX9u/fjwULFuCLL77A2rVr8cUXXyA3NxfTp0/Hhx9+iNjYWNy4cQN6vR5z5syBt7c3hg4discffxwtWrT
Aq6++armNIKiaHTt2iCVLlgghhNizZ4/YuHGjOHjwoBBCiJ9++klMmTJFCCFE7969RXl5uTAYDGLbtm1CCCFGjhwpVCqVEEKIZcuWid27d4ujR4+KqKgoIYQQXbp0EadPnxZCCBESEiJ+/fVXq65bfRw9elQ88cQTQqPRiJKSEtG7d28RGRkpTp48KYQQYv369eKDDz4QR48eFQMGDBBlZWVCpVKJQYMGCSGE0Ol0Yvv27aKkpESMGjVKlJWVCSGEmDp1qjh+/LjYtWuXGD58uNDpdKKwsFD0799fGAyG227D/Px8kZKSIoQQ4uLFi2L06NFCCCEmTJggNm/eLIQQIj4+XiQlJVlzE9XJrl27RFRUlLHmwMBAcejQISGEENHR0eLgwYNiw4YNIiEhQQghRFZWljhx4oT47LPPxIoVK4QQQhQUFIjhw4cLIYR49tlnxR9//CGEEGLr1q1izZo14vz582L06NFCr9eLy5cvi27duons7Gwrr2nd7Nq1S4wbN04YDAaxfft2MXz4cFFRUSF27NghXnnlFTFhwgRhMBiEwWAQYWFhIicnR2RkZIgjR44IIYTYuXOniIuLE0JUbsuSkhKRnZ0tunfvLgoLC8Uff/whunXrJq5evSouXLggnnnmGSHE7d+bte1XZ8+etfamqdHN+8eqVavE22+/LYQQ4urVq2Lo0KGioqJCDB06VJSWlgq9Xi+efPJJUVZWJgIDA8WePXuEEEIsXbpUbN68ucZ966uvvhIXL14UQggRExMjDh06dMvnQK9evay9+ibdvH0SExPF0qVLhRBClJSUiMDAQCHE7d972dnZxr/7ypUrxZYtW8R//vMf8dJLLwkhhFCr1WLQoEGiqKhIBAcHCyGEmD9/vnj22WdFUVGR2Lt3r1i1apVISEgQO3bsEEIIcf78efH8888b2/zuu+8svg3YI7/JqVOn0KdPHwDA008/jeLiYixcuBAff/wxdDodnJycAADDhg3DCy+8gOHDh+OZZ55Bfn4+Lly4gKioKACARqOBu7s7WrZsaVx28+bN4e3tDQBo1aoViosb9l3cHn74YTg6OgIAhBD4/fff4efnBwDo3bs3EhIS0Lt3b3Tp0gVKpRJKpRLt27fHq6++iqCgIIwaNQqnT5/G5cuXMXHiRABAcXExLl++DADw9/eHvb093N3d0bx5cxQUFNx2G7q6uiIzMxPbt2+HXC7H9evXjTVW3bu+VatW1aY3ZH+vubi4GP369cOUKVNQXFyMYcOGoUePHvjss8/w888/Iz09HQBQVlYGnU6HX375BXPnzgUA6HQ6+Pr64rfffoOfnx/kcjnuu+8+PPDAAzZbt7rw8fGBTCaDp6cnunTpAoVCgXvvvRdnz55FRUUFnn32WQCVvwjl5OTg/vvvx+LFi7Fq1SoUFRWhW7dutyyzbdu2cHd3h1KphIeHB1q2bAm1Wo3i4uIa35u17VcNyc37x/Xr19G7d28AQMuWLaFUKlFcXIwBAwbgu+++g6enJ3r27AmlUgkAxtf6+vri+PHj0Ov1t923PDw8MGfOHOj1emRnZ+P//u//4OzsXO1zoCG6efvce++9UKlUt33tze+928nKykJAQAAAwMnJCQ899BAuXrwIpVIJrVaLy5cvY8iQIcjIyEB6ejqGDBmCxMREFBYWIjU1FQCg1WqNy3vkkUfu5ureFoP8JgqFAgaDwfjvTZs2oWXLloiPj0dmZiaWL18OAFiwYAF+//137N27F5GRkfj444/h5eWFLVu2VFteWlpatWX/nWjgw9zb2dW8e5SXlxsPC1R9YADA+vXrcerUKezZswe7d+/Gm2++CR8fH3z88cfV5k9JSYFMJqs2TaFQ3HYbfvbZZ7hx4wY++eQTXL9+HcHBwdXmqdLQt2eVm2vu3Lkzdu/ejR9//BErVqzA2LFjYW9vj1deeQXDhw+vNq+joyM2b95cbdvt3bu32iGav++/DdHf96u/P75x4waefvrpW84nmDV
rFh577DGMHz8e+/btw+HDh29Z5t+36c37rb29fb33q4bk5v0jJycHPXr0MD6v0+kgl8sxatQorFu3Dm3atKm23/x9X5HJZDXuW7GxsVi7di06duxY7W9Q2+dAQ3Dz9hkzZozxuYqKimqvrcvnxc2fS1WfdY8++iiOHDkCZ2dn+Pn54bvvvsOvv/6KN998E/b29pg7d261v0sVe3v7O1m9Omm4B2htxNfXF0ePHgUAfPvtt/jwww/Rtm1bAMDXX3+N8vJyFBcXIyEhAR07dsSUKVNwzz33GD9If/vtNwDAli1bcObMGdushIV06tQJJ06cAAAcO3YMPj4+1Z6/dOkSNm/ejG7dumHGjBm4fv06OnTogN9//x0FBQUAgJUrV+LatWsAgJMnT0Kv16OwsBBarRZubm4Abt2GKpUK999/P+RyOQ4ePGiRM5Vt6csvv8T58+cxePBgTJs2DVlZWfDz88OhQ4cAAAUFBVixYgUAwNvbG//+97+N8x05cgQdOnTAqVOnIIRATk4OcnJybLYud6Jbt25IS0uDVquFEAKLFy9GaWkpVCoV2rZtCyEEDh06ZDwHQCaTQa/Xm1zuPffcA0C6+9XN+4dMJjN2EK5cuQK5XA5XV1d07doV165dwy+//GLsUQLA8ePHAQAZGRl48MEHa9y3SkpKcN9996GoqAhpaWkN+lyLv7t5+2zYsAG5ubkAgJ9//rney/Px8TFuX7VajYsXL6Jdu3YICAjA5s2b8cgjj8Db2xsZGRlwcHCAUqmEn58fvv76awCV+1lSUtLdW8E6aNhftWzgqaeewk8//YQJEybAzs4OSUlJmDdvHvbt24eIiAjs2bMHBw4cgEqlQnBwMJycnNCjRw+4ubnhn//8J2bNmmXsAYSFhRmDrzGYM2cOFixYAJlMhnvuuQdxcXE4deqU8XkvLy+cOHECX331Fezt7TF27Fg4OjoiNjYWkyZNglKpxMMPPwwvLy8AwIMPPohp06bhwoULiI6Ohkwmu+02bN68OV599VWcPHkSY8eORatWrZCQkGCrzXDXtW/fHvPmzYOTkxMUCgXmzJmDdu3a4ejRoxg3bhz0er3xBKPZs2dj7ty5WLduHZo1a4Z3330Xbm5u6Ny5M8LCwtC+fXvj4Rupad26NYYNG4aIiAgoFAoMHjwYDg4OCAsLw6JFi9CmTRtERkZi7ty5+OGHH9CrVy+Eh4cjLi7O5LKlvF/dvH+sWbMGmzdvRmRkJMrLy6v1nvv16we1Wl2tV3n+/Hl8+umnAICoqCg4ODjcdt8KDw/H+PHj0b59e7z00ktYtWoV3njjDeuurBlu3j7x8fGIjY1FZGQknnjiiVt62Kb07NkTPj4+iIiIQEVFBaZPnw4nJyf4+/vj1VdfRXR0NOzt7aHRaNCvXz8AwIQJEzBr1iyEh4fDYDBg9uzZlljVGvE2pkREjYAQAi+88AIWLFiAdu3a2bocsiL+tE5EJHGXLl3C2LFj0bdvX4Z4E8QeORERkYSxR05ERCRhDHIiIiIJY5ATERFJGC8/I2riLl26hKCgoFsGs4iNjUXXrl3NXu748eMRHR1tHFmMiCyDQU5E8PDwuGXkMyKSBgY5Ed3WjRs3MG/ePBQWFqKkpAQvvPACRowYAZ1Oh4ULF+LChQtQq9UYPnw4XnzxRWi1Wrz++utQqVRo164dysrKbL0KRE0Cg5yIbuv9999H//79MXbsWGg0GowcORL9+vVDSkoKvLy8sHjxYuj1eoSGhqJv377GISu3b9+O3NxcDBo0yNarQNQkMMiJCIWFhYiMjKw2LT8/H5mZmfj8888BVN4849KlS0hLS8PVq1dx7NgxAJU37bh48SLOnTuHRx99FEDlcL0PPvigdVeCqIlikBPRbY+Rjxo1CvPmzYOvr2+16UqlEpMnT0ZQUFC16UePHpXUXdiIGgtefkZEt/Xoo49i7969AIDS0lLMnz8fFRUV1aYbDAbExcXh+vXr6Nixo/EmQVeuXME
ff/xhs9qJmhIGORHd1pQpU3DhwgWMHz8eERERePjhh2FnZ4eIiAg4OTkhLCwMoaGhcHFxgZubG0aOHAmVSoXw8HC89957t/TkicgyONY6ERGRhLFHTkREJGEMciIiIgljkBMREUkYg5yIiEjCGOREREQSxiAnIiKSMAY5ERGRhDHIiYiIJOz/AY6vwnMR8p2XAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "#NOTICE WE USE Seaborn not PLT\n", "sns.boxplot(data=data_as_arrays)\n", "#Need to replace the ticks (0, through N - 1) with the names of categories\n", "plt.xticks(range(len(categories)), categories)\n", "plt.xlabel('Feed')\n", "plt.ylabel('Chicken Mass in Grams')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The box plot shows a quite a bit of information. It should the median in as a horizontal line over the box. The box itself shows the middle two quartiles and the \"whiskers\" show the bottom 10th and upper 90th percentiles of the data. The points outside of the boxs are outliers. " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Violin Plots\n", "====\n", "\n", "Just like how we saw that you can use kernel density estimation to provide richer information, we can apply this to boxplots as well." ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "scrolled": false, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "sns.violinplot(data=data_as_arrays)\n", "#Need to replace the ticks (0, through N - 1) with the names of categories\n", "plt.xticks(range(len(categories)), categories)\n", "plt.xlabel('Feed')\n", "plt.ylabel('Chicken Mass in Grams')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "In this plot, you can see the original boxes inside the \"violins\". The violins show the same thing we saw above with Kernel Density Estimation and shows how spread the data is." ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 1 }