{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Thinking probabilistically - Discrete variables\n",
    "> A summary of the lecture \"Statistical Thinking in Python (Part 1)\", via DataCamp\n",
    "\n",
    "- toc: true\n",
    "- badges: true\n",
    "- comments: true\n",
    "- author: Chanseok Kang\n",
    "- categories: [Python, Datacamp, Data_Science, Statistics]\n",
    "- image: images/bin-dist.png"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "%matplotlib inline\n",
    "\n",
    "sns.set()\n",
    "\n",
    "df = pd.read_csv('./dataset/iris.csv')\n",
    "renamed_columns = ['sepal length (cm)', 'sepal width (cm)',\n",
    "                   'petal length (cm)', 'petal width (cm)', 'species']\n",
    "df.columns = renamed_columns\n",
    "versicolor_petal_length = df[df['species'] == 'Versicolor']['petal length (cm)']\n",
    "setosa_petal_length = df[df['species'] == 'Setosa']['petal length (cm)']\n",
    "virginica_petal_length = df[df['species'] == 'Virginica']['petal length (cm)']\n",
    "versicolor_petal_width = df[df['species'] == 'Versicolor']['petal width (cm)']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Random number generators and hacker statistics\n",
    "- Hacker statistics\n",
    "    - Uses simulated repeated measurements to compute probabilities\n",
    "    - E.g. coin flips\n",
    "\n",
    "- np.random module\n",
    "    - Suite of functions based on random number generation\n",
    "    - ```np.random.random()```: draw a number from the interval [0, 1)\n",
    "\n",
    "- Bernoulli trial\n",
    "    - An experiment that has two possible outcomes, \"success\" (True) and \"failure\" (False).\n",
    "\n",
    "- Random number seed\n",
    "    - Integer fed into the random number generating algorithm\n",
    "    - Manually seed the random number generator if you need reproducibility\n",
    "    - Specified using ```np.random.seed()```\n",
    "\n",
    "- Hacker stats probabilities\n",
    "    - Determine how to simulate data\n",
    "    - Simulate many, many times\n",
    "    - Probability is approximately the fraction of trials with the outcome of interest\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Generating random numbers using the np.random module\n",
    "We will be hammering the np.random module for the rest of this course and its sequel. Actually, you will probably call functions from this module more than any other while wearing your hacker statistician hat. Let's start by taking its simplest function, ```np.random.random()```, for a test spin. The function returns a random number between zero and one. Call ```np.random.random()``` a few times in the IPython shell. You should see numbers jumping around between zero and one.\n",
    "\n",
    "In this exercise, we'll generate lots of random numbers between zero and one, and then plot a histogram of the results. If the numbers are drawn uniformly at random, all bars in the histogram should be of (close to) equal height.\n",
    "\n",
    "You may have noticed that, in the video, Justin generated 4 random numbers by passing the keyword argument ```size=4``` to ```np.random.random()```. Such an approach is more efficient than a for loop; in this exercise, however, you will write a for loop to experience hacker statistics as the practice of repeating an experiment over and over again."
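    ,
    "\n",
    "\n",
    "As a rough guide to what \"close to equal height\" means, here is a back-of-the-envelope sketch. It assumes the draws are independent and uniform on [0, 1) and that ```plt.hist()``` uses its default of 10 equal-width bins, so each draw lands in a given bin with probability $q \\approx 0.1$ (the symbols $N$ and $q$ are introduced here for illustration only). With $N = 100000$ draws, the expected count per bin and its standard deviation are\n",
    "\n",
    "$$N q = 100000 \\times 0.1 = 10000, \\qquad \\sqrt{N q (1 - q)} = \\sqrt{100000 \\times 0.1 \\times 0.9} \\approx 94.9$$\n",
    "\n",
    "so the bar heights should typically differ from one another by only about 1%."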
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       ""
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Seed the random number generator\n",
    "np.random.seed(42)\n",
    "\n",
    "# Initialize random numbers: random_numbers\n",
    "random_numbers = np.empty(100000)\n",
    "\n",
    "# Generate random numbers by looping over range(100000)\n",
    "for i in range(100000):\n",
    "    random_numbers[i] = np.random.random()\n",
    "\n",
    "# Plot a histogram\n",
    "_ = plt.hist(random_numbers)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### The np.random module and Bernoulli trials\n",
    "You can think of a Bernoulli trial as a flip of a possibly biased coin. Specifically, each coin flip has a probability p of landing heads (success) and probability 1−p of landing tails (failure). In this exercise, you will write a function to perform n Bernoulli trials, ```perform_bernoulli_trials(n, p)```, which returns the number of successes out of n Bernoulli trials, each of which has probability p of success. To perform each Bernoulli trial, use the ```np.random.random()``` function, which returns a random number between zero and one. Since that number is uniformly distributed, it is less than p with probability p, so a trial counts as a success whenever the draw falls below p."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "def perform_bernoulli_trials(n, p):\n",
    "    \"\"\"Perform n Bernoulli trials with success probability p and return number of successes.\"\"\"\n",
    "    # Initialize number of successes: n_success\n",
    "    n_success = 0\n",
    "\n",
    "    # Perform trials\n",
    "    for i in range(n):\n",
    "        # Choose random number between zero and one: random_number\n",
    "        random_number = np.random.random()\n",
    "\n",
    "        # If less than p, it's a success so add one to n_success\n",
    "        if random_number < p:\n",
    "            n_success += 1\n",
    "\n",
    "    return n_success"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### How many defaults might we expect?\n",
    "Let's say a bank made 100 mortgage loans. It is possible that anywhere between 0 and 100 of the loans will be defaulted upon. You would like to know the probability of getting a given number of defaults, given that the probability of a default is p = 0.05. To investigate this, you will do a simulation. You will perform 100 Bernoulli trials using the ```perform_bernoulli_trials()``` function you wrote in the previous exercise and record how many defaults you get. Here, a success is a default. (Remember that the word \"success\" just means that the Bernoulli trial evaluates to True, i.e., did the loan recipient default?) You will do this for another 100 Bernoulli trials, and again and again until you have tried it 1000 times. Then, you will plot a histogram describing the probability of the number of defaults.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       ""
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Seed random number generator\n",
    "np.random.seed(42)\n",
    "\n",
    "# Initialize the number of defaults: n_defaults\n",
    "n_defaults = np.empty(1000)\n",
    "\n",
    "# Compute the number of defaults\n",
    "for i in range(1000):\n",
    "    n_defaults[i] = perform_bernoulli_trials(100, 0.05)\n",
    "\n",
    "# Plot the histogram with default number of bins; label your axes\n",
    "_ = plt.hist(n_defaults, density=True)\n",
    "_ = plt.xlabel('number of defaults out of 100 loans')\n",
    "_ = plt.ylabel('probability')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Will the bank fail?\n",
    "Plot the number of defaults you got from the previous exercise, in your namespace as n_defaults, as a CDF.\n",
    "\n",
    "If interest rates are such that the bank will lose money if 10 or more of its loans are defaulted upon, what is the probability that the bank will lose money?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "def ecdf(data):\n",
    "    \"\"\"Compute ECDF for a one-dimensional array of measurements.\"\"\"\n",
    "    # Number of data points: n\n",
    "    n = len(data)\n",
    "\n",
    "    # x-data for the ECDF: x\n",
    "    x = np.sort(data)\n",
    "\n",
    "    # y-data for the ECDF: y\n",
    "    y = np.arange(1, n + 1) / n\n",
    "\n",
    "    return x, y"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Probability of losing money = 0.022\n"
     ]
    },
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       ""
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Compute ECDF: x, y\n",
    "x, y = ecdf(n_defaults)\n",
    "\n",
    "# Plot the ECDF with labeled axes\n",
    "_ = plt.plot(x, y, marker='.', linestyle='none')\n",
    "_ = plt.xlabel('x')\n",
    "_ = plt.ylabel('y')\n",
    "\n",
    "# Compute the number of 100-loan simulations with 10 or more defaults: n_lose_money\n",
    "n_lose_money = np.sum(n_defaults >= 10)\n",
    "\n",
    "# Compute and print probability of losing money\n",
    "print('Probability of losing money =', n_lose_money / len(n_defaults))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Probability distributions and stories: The Binomial distribution\n",
    "- Probability mass function (PMF)\n",
    "    - The set of probabilities of discrete outcomes\n",
    "- Probability distribution\n",
    "    - A mathematical description of outcomes\n",
    "- Binomial distribution\n",
    "    - The number r of successes in n Bernoulli trials with probability p of success is Binomially distributed\n",
    "    - The number r of heads in 4 coin flips with probability 0.5 of heads is Binomially distributed"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Sampling out of the Binomial distribution\n",
    "Compute the probability mass function for the number of defaults we would expect for 100 loans as in the last section, but instead of simulating all of the Bernoulli trials, perform the sampling using ```np.random.binomial()```. This is statistically equivalent to the calculation you did in the last set of exercises using your custom-written ```perform_bernoulli_trials()``` function, but far more computationally efficient. Given this extra efficiency, we will take 10,000 samples instead of 1000. After taking the samples, plot the CDF as last time. The CDF that you are plotting is that of the Binomial distribution.\n",
    "\n",
    "Note: For this exercise and all going forward, the random number generator is pre-seeded for you in the DataCamp environment (with ```np.random.seed(42)```) to save you typing that each time."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       ""
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Take 10,000 samples out of the binomial distribution: n_defaults\n",
    "n_defaults = np.random.binomial(n=100, p=0.05, size=10000)\n",
    "\n",
    "# Compute CDF: x, y\n",
    "x, y = ecdf(n_defaults)\n",
    "\n",
    "# Plot the CDF with axis labels\n",
    "_ = plt.plot(x, y, marker='.', linestyle='none')\n",
    "_ = plt.xlabel('x')\n",
    "_ = plt.ylabel('y')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Plotting the Binomial PMF\n",
    "Plotting a nice-looking PMF requires a bit of matplotlib trickery that we will not go into here. Instead, we will plot the PMF of the Binomial distribution as a histogram with skills you have already learned. The trick is setting up the edges of the bins to pass to ```plt.hist()``` via the ```bins``` keyword argument. We want the bins centered on the integers. So, the edges of the bins should be -0.5, 0.5, 1.5, 2.5, ... up to ```max(n_defaults) + 0.5```. You can generate an array like this using ```np.arange()``` and then subtracting 0.5 from the array.\n",
    "\n",
    "You have already sampled out of the Binomial distribution during your exercises on loan defaults, and the resulting samples are in the NumPy array ```n_defaults```."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       ""
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Compute bin edges: bins\n",
    "bins = np.arange(0, max(n_defaults) + 2) - 0.5\n",
    "\n",
    "# Generate histogram\n",
    "_ = plt.hist(n_defaults, bins=bins, density=True)\n",
    "\n",
    "# Label axes\n",
    "_ = plt.xlabel('x')\n",
    "_ = plt.ylabel('y')\n",
    "plt.savefig('../images/bin-dist.png')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Poisson processes and the Poisson distribution\n",
    "- Poisson process\n",
    "    - The timing of the next event is completely independent of when the previous event happened\n",
    "- Examples\n",
    "    - Natural births in a given hospital\n",
    "    - Hits on a website during a given hour\n",
    "    - Meteor strikes\n",
    "    - Molecular collisions in a gas\n",
    "    - Aviation incidents\n",
    "    - Buses in Poissonville\n",
    "- Poisson distribution\n",
    "    - The number r of arrivals of a Poisson process in a given time interval with average rate of λ arrivals per interval is Poisson distributed.\n",
    "    - The number r of hits on a website in one hour with an average hit rate of 6 hits per hour is Poisson distributed.\n",
    "    - Limit of the Binomial distribution for low probability of success and large number of trials\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Relationship between Binomial and Poisson distributions\n",
    "You just heard that the Poisson distribution is a limit of the Binomial distribution for rare events. This makes sense if you think about the stories. Say we do a Bernoulli trial every minute for an hour, each with a success probability of 0.1. We would do 60 trials, and the number of successes is Binomially distributed, and we would expect to get about 6 successes. This is just like the Poisson story we discussed in the video, where we get on average 6 hits on a website per hour. So, the Poisson distribution with arrival rate equal to np approximates a Binomial distribution for n Bernoulli trials with probability p of success (with n large and p small). Importantly, the Poisson distribution is often simpler to work with because it has only one parameter instead of two for the Binomial distribution.\n",
    "\n",
    "Let's explore these two distributions computationally. You will compute the mean and standard deviation of samples from a Poisson distribution with an arrival rate of 10. Then, you will compute the mean and standard deviation of samples from a Binomial distribution with parameters n and p such that np=10. As p gets smaller, the Binomial standard deviation should move closer to the Poisson one."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Poisson: 9.9895 3.176159591393354\n",
      "n = 20 Binom: 10.0621 2.240277569856021\n",
      "n = 100 Binom: 10.0432 2.988567175085747\n",
      "n = 1000 Binom: 10.0106 3.1406508306400442\n"
     ]
    }
   ],
   "source": [
    "# Draw 10,000 samples out of Poisson distribution: samples_poisson\n",
    "samples_poisson = np.random.poisson(10, 10000)\n",
    "\n",
    "# Print the mean and standard deviation\n",
    "print('Poisson:', np.mean(samples_poisson), np.std(samples_poisson))\n",
    "\n",
    "# Specify values of n and p to consider for Binomial: n, p\n",
    "n = [20, 100, 1000]\n",
    "p = [0.5, 0.1, 0.01]\n",
    "\n",
    "# Draw 10,000 samples for each n,p pair: samples_binomial\n",
    "for i in range(3):\n",
    "    samples_binomial = np.random.binomial(n[i], p[i], size=10000)\n",
    "\n",
    "    # Print results\n",
    "    print('n =', n[i], 'Binom:', np.mean(samples_binomial), np.std(samples_binomial))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Was 2015 anomalous?\n",
    "1990 and 2015 featured the most no-hitters of any season of baseball (there were seven). Given that there are on average 251/115 no-hitters per season, what is the probability of having seven or more in a season?"
] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Probability of seven or more no-hitters: 0.0071\n" ] } ], "source": [ "# Draw 10,000 samples out of Poisson distribution: n_nohitters\n", "n_nohitters = np.random.poisson(251/115, 10000)\n", "\n", "# Compute number of samples that are seven or greater: n_large\n", "n_large = np.sum(n_nohitters >= 7)\n", "\n", "# compute probability of getting seven or more: p_large\n", "p_large = n_large / 10000\n", "\n", "# Print the result\n", "print('Probability of seven or more no-hitters:', p_large)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }