{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# HIDDEN\n", "from datascience import *\n", "%matplotlib inline\n", "import matplotlib.pyplot as plots\n", "plots.style.use('fivethirtyeight')\n", "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Variability ###\n", "The mean tells us where a histogram balances. But in almost every histogram we have seen, the values spread out on both sides of the mean. How far from the mean can they be? To answer this question, we will develop a measure of variability about the mean.\n", "\n", "We will start by describing how to calculate the measure. Then we will see why it is a good measure to calculate." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The Rough Size of Deviations from Average ###\n", "For simplicity, we will begin our calculations in the context of a simple array `any_numbers` consisting of just four values. As you will see, our method will extend easily to any other array of values." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [], "source": [ "any_numbers = make_array(1, 2, 2, 10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The goal is to measure roughly how far off the numbers are from their average. To do this, we first need the average: " ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "3.75" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Step 1. The average.\n", "\n", "mean = np.mean(any_numbers)\n", "mean" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, let's find out how far each value is from the mean. These are called the *deviations from the average*. A \"deviation from average\" is just a value minus the average. The table `calculation_steps` displays the results." 
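] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is a minimal sketch of the calculation this section builds toward, written with plain NumPy so that it runs on its own. It assumes `np.array` as a stand-in for `make_array`, and the names `deviations`, `squared_deviations`, and `root_mean_square` are illustrative rather than taken from the notebook. The squaring and square-root steps anticipate the later steps of the calculation, whose squared deviations are tabulated in this section.

```python
import numpy as np

# The same four values used in the text (np.array stands in for make_array).
any_numbers = np.array([1, 2, 2, 10])

# Step 1: the average.
mean = np.mean(any_numbers)

# Step 2: deviations from average (each value minus the average).
deviations = any_numbers - mean

# Deviations sum to zero, so their plain average says nothing about spread.
sum_of_deviations = np.sum(deviations)

# Step 3: square the deviations so that negatives cannot cancel positives.
squared_deviations = deviations ** 2

# Step 4: average the squares, then take the square root
# to get back to the units of the original values.
root_mean_square = np.mean(squared_deviations) ** 0.5

print(deviations)
print(squared_deviations)
print(root_mean_square)
```

If the sketch is right, `root_mean_square` should agree with `np.std(any_numbers)`, the function used for the SD later in this section."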
] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
Value | Deviation from Average | \n", "
---|---|
1 | -2.75 | \n", "
2 | -1.75 | \n", "
2 | -1.75 | \n", "
10 | 6.25 | \n", "
Value | Deviation from Average | Squared Deviations from Average | \n", "
---|---|---|
1 | -2.75 | 7.5625 | \n", "
2 | -1.75 | 3.0625 | \n", "
2 | -1.75 | 3.0625 | \n", "
10 | 6.25 | 39.0625 | \n", "
Name | Position | Height | Weight | Age in 2013 | \n", "
---|---|---|---|---|
DeQuan Jones | Guard | 80 | 221 | 23 | \n", "
Darius Miller | Guard | 80 | 235 | 23 | \n", "
Trevor Ariza | Guard | 80 | 210 | 28 | \n", "
James Jones | Guard | 80 | 215 | 32 | \n", "
Wesley Johnson | Guard | 79 | 215 | 26 | \n", "
Klay Thompson | Guard | 79 | 205 | 23 | \n", "
Thabo Sefolosha | Guard | 79 | 215 | 29 | \n", "
Chase Budinger | Guard | 79 | 218 | 25 | \n", "
Kevin Martin | Guard | 79 | 185 | 30 | \n", "
Evan Fournier | Guard | 79 | 206 | 20 | \n", "
... (495 rows omitted)
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "nba13.select('Height').hist(bins=np.arange(68, 88, 1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is no surprise that NBA players are tall! Their average height is just over 79 inches (6'7\"), about 10 inches taller than the average height of men in the United States." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "79.065346534653472" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mean_height = np.mean(nba13.column('Height'))\n", "mean_height" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "About how far off are the players' heights from the average? This is measured by the SD of the heights, which is about 3.45 inches." ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "3.4505971830275546" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sd_height = np.std(nba13.column('Height'))\n", "sd_height" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The towering center Hasheem Thabeet of the Oklahoma City Thunder was the tallest player at a height of 87 inches." ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "Name | Position | Height | Weight | Age in 2013 | \n", "
---|---|---|---|---|
Hasheem Thabeet | Center | 87 | 263 | 26 | \n", "
Roy Hibbert | Center | 86 | 278 | 26 | \n", "
Tyson Chandler | Center | 85 | 235 | 30 | \n", "
... (502 rows omitted)
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "nba13.sort('Height', descending=True).show(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Thabeet was about 8 inches above the average height." ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "7.9346534653465284" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "87 - mean_height" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That's a deviation from average, and it is about 2.3 times the standard deviation:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "2.2995015194397923" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(87 - mean_height)/sd_height" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In other words, the height of the tallest player was about 2.3 SDs above average.\n", "\n", "At 69 inches tall, Isaiah Thomas was one of the two shortest NBA players in 2013. His height was about 2.9 SDs below average." ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "Name | Position | Height | Weight | Age in 2013 | \n", "
---|---|---|---|---|
Isaiah Thomas | Guard | 69 | 185 | 24 | \n", "
Nate Robinson | Guard | 69 | 180 | 29 | \n", "
John Lucas III | Guard | 71 | 157 | 30 | \n", "
... (502 rows omitted)
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "nba13.sort('Height').show(3)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "-2.9169868288775844" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(69 - mean_height)/sd_height" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What we have observed is that the tallest and shortest players were both just a few SDs away from the average height. This is an example of why the SD is a useful measure of spread. No matter what the shape of the histogram, the average and the SD together tell you a lot about where the histogram is situated on the number line." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### First main reason for measuring spread by the SD\n", "\n", "**Informal statement.** In all numerical data sets, the bulk of the entries are within the range \"average $\\pm$ a few SDs\".\n", "\n", "For now, resist the desire to know exactly what fuzzy words like \"bulk\" and \"few\" mean. We will make them precise later in this section. Let's just examine the statement in the context of some more examples." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have already seen that *all* of the heights of the NBA players were in the range \"average $\\pm$ 3 SDs\". \n", "\n", "What about the ages? Here is a histogram of the distribution, along with the mean and SD of the ages." 
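] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The height comparisons above all follow one pattern: subtract the average, then divide by the SD. Here is a small sketch of that pattern, assuming a hypothetical helper `sds_from_average` and a made-up `heights` array in place of the `nba13` table. Neither name comes from the notebook.

```python
import numpy as np

def sds_from_average(value, values):
    # Number of SDs by which value lies above (positive)
    # or below (negative) the average of values.
    return (value - np.mean(values)) / np.std(values)

# Hypothetical heights in inches; this is not the NBA data.
heights = np.array([69, 75, 78, 79, 80, 81, 87])

tallest = sds_from_average(np.max(heights), heights)
shortest = sds_from_average(np.min(heights), heights)

print(tallest, shortest)
```

The same helper could be applied to any numerical column, for example the ages examined next."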
] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[Figure: histogram of the players' ages]\n", "Name | Position | Height | Weight | Age in 2013 | \n", "
---|---|---|---|---|
Juwan Howard | Forward | 81 | 250 | 40 | \n", "
Marcus Camby | Center | 83 | 235 | 39 | \n", "
Derek Fisher | Guard | 73 | 210 | 39 | \n", "
... (502 rows omitted)
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "nba13.sort('Age in 2013', descending=True).show(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Howard's age was about 3.2 SDs above average." ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "3.1958482778922357" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(40 - mean_age)/sd_age" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The youngest was 15-year-old Jarvis Varnado, who won the NBA Championship that year with the Miami Heat. His age was about 2.6 SDs below average." ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "Name | Position | Height | Weight | Age in 2013 | \n", "
---|---|---|---|---|
Jarvis Varnado | Forward | 81 | 230 | 15 | \n", "
Giannis Antetokounmpo | Forward | 81 | 205 | 18 | \n", "
Sergey Karasev | Guard | 79 | 197 | 19 | \n", "
... (502 rows omitted)
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "nba13.sort('Age in 2013').show(3)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "-2.5895811038670811" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(15 - mean_age)/sd_age" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "What we have observed for the heights and ages is true in great generality. For *all* lists, the bulk of the entries are no more than 2 or 3 SDs away from the average. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Chebychev's Bounds ###\n", "The Russian mathematician [Pafnuty Chebychev](https://en.wikipedia.org/wiki/Pafnuty_Chebyshev) (1821-1894) proved a result that makes our rough statements precise.\n", "\n", "**For all lists, and all numbers $z$, the proportion of entries that are in the range\n", "\"average $\\pm z$ SDs\" is at least $1 - \\frac{1}{z^2}$.**\n", "\n", "It is important to note that the result gives a bound, not an exact value or an approximation.\n", "\n", "What makes the result powerful is that it is true for all lists – all distributions, no matter how irregular. \n", "\n", "Specifically, it says that for every list:\n", "\n", "- the proportion in the range \"average $\\pm$ 2 SDs\" is **at least 1 - 1/4 = 0.75**\n", "\n", "- the proportion in the range \"average $\\pm$ 3 SDs\" is **at least 1 - 1/9 $\\approx$ 0.89**\n", "\n", "- the proportion in the range \"average $\\pm$ 4.5 SDs\" is **at least 1 - 1/$\\boldsymbol{4.5^2}$ $\\approx$ 0.95**\n", "\n", "As we noted above, Chebychev's result gives a lower bound, not an exact answer or an approximation. For example, the percent of entries in the range \"average $\\pm ~2$ SDs\" might be quite a bit larger than 75%. But it cannot be smaller." 
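] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One way to see that the bound is a floor rather than an estimate is to compare it with an observed proportion. The sketch below assumes two hypothetical helpers, `chebychev_bound` and `proportion_within`, and a made-up `lopsided` list. None of these names come from the notebook.

```python
import numpy as np

def chebychev_bound(z):
    # Lower bound on the proportion of entries within z SDs of the average.
    return 1 - 1 / z ** 2

def proportion_within(values, z):
    # Observed proportion of entries strictly within z SDs of the average.
    values = np.asarray(values)
    distance_in_sds = np.abs(values - np.mean(values)) / np.std(values)
    return np.mean(distance_in_sds < z)

# A deliberately lopsided, made-up list: many small values and one huge one.
lopsided = np.array([1] * 20 + [2] * 10 + [500])

for z in [2, 3, 4.5]:
    print(z, chebychev_bound(z), proportion_within(lopsided, z))
```

For this list the observed proportions should sit well above the bounds, since only the single large entry lies far from the average."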
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Standard units\n", "\n", "In the calculations above, the quantity $z$ measures *standard units*, the number of standard deviations above average.\n", "\n", "Some values of standard units are negative, corresponding to original values that are below average. Other values of standard units are positive. But no matter what the distribution of the list looks like, Chebychev's bounds imply that standard units will typically be in the (-5, 5) range.\n", "\n", "To convert a value to standard units, first find how far it is from average, and then compare that deviation with the standard deviation.\n", "$$\n", "z ~=~ \\frac{\\mbox{value }-\\mbox{ average}}{\\mbox{SD}}\n", "$$\n", "\n", "As we will see, standard units are frequently used in data analysis. So it is useful to define a function that converts an array of numbers to standard units." ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def standard_units(numbers_array):\n", "    \"Convert any array of numbers to standard units.\"\n", "    return (numbers_array - np.mean(numbers_array))/np.std(numbers_array)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Example ###\n", "As we saw in an earlier section, the table `united` contains a column `Delay` consisting of the departure delay times, in minutes, of thousands of United Airlines flights in the summer of 2015. We will create a new column called `Delay (Standard Units)` by applying the function `standard_units` to the column of delay times. This allows us to see all the delay times in minutes as well as their corresponding values in standard units. " ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "Date | Flight Number | Destination | Delay | Delay (Standard Units) | \n", "
---|---|---|---|---|
6/1/15 | 73 | HNL | 257 | 6.08766 | \n", "
6/1/15 | 217 | EWR | 28 | 0.287279 | \n", "
6/1/15 | 237 | STL | -3 | -0.497924 | \n", "
6/1/15 | 250 | SAN | 0 | -0.421937 | \n", "
6/1/15 | 267 | PHL | 64 | 1.19913 | \n", "
6/1/15 | 273 | SEA | -6 | -0.573912 | \n", "
6/1/15 | 278 | SEA | -8 | -0.62457 | \n", "
6/1/15 | 292 | EWR | 12 | -0.117987 | \n", "
6/1/15 | 300 | HNL | 20 | 0.0846461 | \n", "
6/1/15 | 317 | IND | -10 | -0.675228 | \n", "
... (13815 rows omitted)
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "united.hist('Delay (Standard Units)', bins=np.arange(-5, 15.5, 0.5))\n", "plots.xticks(np.arange(-6, 17, 3));" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.4.5" } }, "nbformat": 4, "nbformat_minor": 0 }