{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Week 4: Mathematical expectation\n", "\n", "\n", " #### [Back to main page](https://petrosyan.page/fall2020math3215)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Mean value of the hypergeometric distribution" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The mean value of the hypergeometric distribution with parameters $(N,K,n)$ is equal to \n", "\n", "$$\\mu=\\frac{nK}{N}.$$\n", "\n", "As we discussed in class, the mean value was in a sense the center of the histogram: the point around which most of the histogram was concentrated. This is very vividly observed for the hypergeometric distribution. The plot bellow shows the mean value in blue for different values of $(N,K,n)$." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "scrolled": false }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "b8355df531f74e289ec832d171d24e3f", "version_major": 2, "version_minor": 0 }, "text/plain": [ "interactive(children=(IntSlider(value=80, description='N', min=1, readout_format=''), IntSlider(value=40, desc…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# nbi:hide_in\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "from scipy.special import comb\n", "from ipywidgets import interact, IntSlider\n", "\n", "def hypergeometric_pmf(N=80,K=40,n=30):\n", " range_x = np.arange(max(0, n-(N-K)), min(n, K)+1, 1)\n", "\n", " def hyper_pmf(N,K,n,i):\n", " pmf_val = comb(K, i, exact=True) * comb(N-K, n-i, exact=True) / comb(N, n, exact=True)\n", " return pmf_val\n", " mean = n * K / N\n", "\n", " pmf_values = np.array([hyper_pmf(N,K,n,i) for i in range_x])\n", "\n", " # plot setup\n", " plt.figure(figsize=(14,7)) \n", " plt.axhline(y=0, color='k')\n", " plt.ylim(-0.02, max(np.max(pmf_values)+0.05, 0.2))\n", " plt.xlim(-2,N+2)\n", " plt.xticks(np.arange(0, N+1, 5))\n", "\n", " # PLotting with plt.bar instead of plt.hist works better when f(x) are knowwn\n", " plt.scatter(np.array([mean]),np.zeros(1), color =\"blue\", s=200)\n", " plt.scatter(range_x,np.zeros(range_x.shape), color =\"red\", s=20)\n", " plt.bar(range_x, pmf_values, width=1, color=(0.1, 0.1, 1, 0.1), edgecolor='blue', linewidth=1.3)\n", " plt.title(\"Hypergeometric distribution\")\n", " plt.figtext(0.75,0.8, \" N={} \\n K={} \\n n={} \\n expectation={:.2f}\".format(N,K,n, mean), ha=\"left\", va=\"top\",\n", " backgroundcolor=(0.1, 0.1, 1, 0.15), fontsize=\"large\")\n", " plt.plot();\n", "\n", "# create interactive variables\n", "N = IntSlider(min=1.0, max=100.0, step=1.0, value=80, readout_format='')\n", "K = IntSlider(min=1.0, max=N.value, step=1.0, value=40, readout_format='')\n", "n = IntSlider(min=1.0, max=N.value, step=1.0, value=30, readout_format='')\n", "\n", "# enforce K<=N and n<=N\n", "def update_range(*args):\n", " K.max = N.value\n", " n.max = N.value\n", "N.observe(update_range, 'value')\n", "\n", "# display the interactive plot\n", "interact(hypergeometric_pmf, N=N, K=K, n=n);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Geometric distribution\n", "\n", "A random trial has a probability of success equal to $p$ and probability of failure $q=1-p$.Consider the following experiment: we are doing consecutive random trials until we reach a success.The set of outcomes has the form $s=\\overbrace{FF\\cdots F}^{n-1} S$ where number of $F$'s can be any number $n=1,2,\\dots$. 
Due to independence, $P(s)=q^{n-1}p$. \n", "\n", "Let $X(s)$ denote the number of trials it took to reach the first success:\n", "\n", "$$X(\\overbrace{FF\\cdots F}^{n-1} S)=n.$$\n", "\n", "Observe that the pmf of $X$ is\n", "\n", "$$f(n)=q^{n-1}p,\\quad n=1,2,\\dots.$$\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Definition (Geometric distribution)**\n", "\n", "
\n", "\n", "The pmf of the random variable $X$ above is called the **geometric distribution** with parameter $p$.\n", "\n", "
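A quick numerical sanity check of this pmf (a minimal sketch, not part of the original notes; it assumes `scipy` is installed and truncates the infinite range at $n=199$):\n", "\n", "```python\n", "import numpy as np\n", "from scipy.stats import geom\n", "\n", "p = 0.3\n", "n = np.arange(1, 200)             # truncated range of X\n", "f = (1 - p)**(n - 1) * p          # f(n) = q^(n-1) p\n", "\n", "print(np.allclose(f, geom.pmf(n, p)))   # agrees with scipy's geometric pmf\n", "print(f.sum(), (n * f).sum())           # total mass is ~1, truncated mean is ~1/p\n", "```\n", "\n", "The truncated mean comes out close to $1/p$, which is the value marked by the blue dot in the interactive plot below.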
" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "c001f78008d2491e8e4cd09f155feb73", "version_major": 2, "version_minor": 0 }, "text/plain": [ "interactive(children=(FloatSlider(value=0.5, description='p', max=1.0, min=0.01, readout_format='', step=0.01)…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# nbi:hide_in\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "from ipywidgets import interact, FloatSlider\n", "\n", "\n", "def geometric_pmf(p=0.5):\n", " q = 1-p\n", " N=50\n", " range_x = np.arange(1, N, 1)\n", "\n", " def geo_pmf(n):\n", " pmf_val = q**(n-1)*p\n", " return pmf_val\n", " mean = 1/p\n", "\n", " pmf_values = np.array([geo_pmf(n) for n in range_x])\n", "\n", " # plot setup\n", " plt.figure(figsize=(14,7)) \n", " plt.axhline(y=0, color='k')\n", " plt.ylim(-0.02, max(np.max(pmf_values)+0.05, 0.2))\n", " plt.xlim(0, N+1)\n", " plt.xticks(np.arange(0, N+1, 5))\n", "\n", " # PLotting with plt.bar instead of plt.hist works better when f(x) are knowwn\n", " plt.scatter(np.array([mean]),np.zeros(1), color =\"blue\", s=200)\n", " plt.scatter(range_x,np.zeros(range_x.shape), color =\"red\", s=20)\n", " plt.bar(range_x, pmf_values, width=1, color=(0.1, 0.1, 1, 0.1), edgecolor='blue', linewidth=1.3)\n", " plt.title(\"Geometric distribution\")\n", " plt.figtext(0.8,0.8, \"p={}\".format(p), ha=\"left\", va=\"top\",\n", " backgroundcolor=(0.1, 0.1, 1, 0.15), fontsize=\"large\")\n", " plt.plot();\n", "\n", "# create interactive variables\n", "p = FloatSlider(min=0.01, max=1, step=0.01, value=0.5, readout_format='')\n", "\n", "# display the interactive plot\n", "interact(geometric_pmf, p=p);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Mean and Variance under linear transformation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Theorem**\n", "\n", "
\n", "\n", "Suppose $X$ is a random variable with mean $\\mu_X$ and standard deviation $\\sigma_X$. Let $Y=aX+b$ where $a$ and $b$ are any two numbers. Then the mean and the standard deviation of $Y$ are\n", "$$\\mu_Y=a\\mu_X+b,\\quad \\sigma_Y=|a|\\sigma_X.$$\n", "\n", "
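Before the proof, here is a quick simulation check of these two formulas (an illustrative sketch, not part of the original notes):\n", "\n", "```python\n", "import numpy as np\n", "\n", "rng = np.random.default_rng(0)\n", "x = rng.normal(loc=2.0, scale=3.0, size=100_000)   # any sample will do here\n", "a, b = -4.0, 7.0\n", "y = a * x + b\n", "\n", "print(y.mean(), a * x.mean() + b)      # agree up to floating-point error\n", "print(y.std(), abs(a) * x.std())\n", "```\n", "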
\n", " \n", "**Proof**\n", "$\\mu_Y=E[aX+b]=aE[X]+b=a\\mu_X+b.$ Notice that\n", "\n", "$$\\text{Var}(Y)=E[(Y-E[Y])^2]=E[(aX+b-aE[X]-b)^2]=E[(aX-aE[X])^2]=a^2\\text{Var}(X).$$\n", "\n", "Taking square roots, we have $\\sigma_Y=|a|\\sigma_X$. $\\blacksquare$\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Example**\n", "\n", "Let $X$ be a random variable with $\\text{range}(X)=\\{-2,-1,0,1,2\\}$ and \n", "\n", "$$f_X(-2)=0.4, \\quad f_X(-1)=0.25, \\quad f_X(0)=0.15, \\quad f_X(1)=0.1, \\quad f_X(2)=0.1.$$\n", "\n", "Take the random variable $Y=2X+1$.Notice that $\\text{range}(Y)=\\{-3,-1,1,3,5\\}$ and \n", "\n", "$$_Y(-3)=0.4, \\quad f_Y(-1)=0.25, \\quad f_Y(1)=0.15, \\quad f_Y(3)=0.1, \\quad f_Y(5)=0.1.$$\n", "\n", "It can be checked that \n", "\n", "$$\\mu_X=-0.75, \\quad \\sigma_X\\approx 1.414$$\n", "\n", "and thus \n", "\n", "$$\\mu_Y=-0.5, \\quad \\sigma_Y\\approx 2.828. $$" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAABIcAAAJTCAYAAACSDnaYAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAgAElEQVR4nO3de5Skd13n8c+XSUA3F0MkUW5DMCeLBgJxHYJ3wKgnsK6Iq8eYLOKKy4rG29E9RuMVdVdXZd2j0cgRRJGAN8Ji5Louies9QRO5m5CEEKI7XDJmEiDX3/5RFaa70zNdPdPdVd3f1+ucOT1V9TxVv37oTH1519NVNcYIAAAAAD09ZN4LAAAAAGB+xCEAAACAxsQhAAAAgMbEIQAAAIDGxCEAAACAxsQhAAAAgMbEIWDbqKovqqq/qqorq+rVVXX0vNcEALDTmcFg5xOHgO3kA0m+Yozx9CQ3JHnOnNcDANCBGQx2uKPmvQCAWY0xbl1y8d4k989rLQAAXZjBYOdz5hCQqjqxqi6rqjur6gNVdd4a29+x4s99VfUrS26/oqo+ueT2923weh+f5FlJLt+A+7qgqq6uqruq6hUzbH/Q772qTqmqN1TVbVX1z1X1q1V11JJ9D3k7ANDPeuawtWas9c50h7HWDZvBpvd3yJlyxbaHnNnmfWxguxOHgCS5OMndST4ryflJfr2qnniwjccYxz7wZ7rPJ5L8wYrNLliy3RM2aqFVdXyS307yvDHG3Rtwl7cm+ZkkL59l4zW+919LsjfJI5OcmeTpSb5zye5r3Q4A9LOuOSyHnrHWe18z24QZbNaZ8gGzzGxzOTawE4hDsACq6qKq+vUllx9eVfdU1adtwWMfk+TfJ/mxMcYdY4w/T/L6JM+b8S6+IZPg8X83aD1HV9XPVtVN02Mwpn+unZ5l8+okPznG2JCzkcYYrx1jvC7JRw9j95Xf++OT/P4Y45NjjH9O8qYkS4eOtW4HALbYNp/DNuy+tnoGW8UhZ8ojmdk28jjDTiUOwWI4I8k1Sy6fmeR9Y4xPzrJzVV1eVfsO8met037/dZL7xhj/uOS6azN7tHh+kt8ZY4wV1/+3qvpIVf1FVT1jxvtKJq8InZ3ky5KckORPk1yW5LlJvjnJ05L8+PTU4W9auXNVPaqqfrmq/k9VXVJVZ1fVv6qqz6+qn1rHOmax8nv/n0nOnT7eozM57fpNS7Zf63YAYOsd9hx2hDNYcnhz2MFmrCOd6eY9gx1splyPzTo2sON5rwtYDGck+R9LLp+ZyRNWquqVSX5pjHFNVX1/kpPGGD+ydOcxxtccwWMfm+RfVlz3L0mOW2vHqtqdya9GvWDFTT+U5N2ZnLp7bpI/rqozxxjvX+P+jkvyPUmePMb44PS6P0ryTWOMGzL5dIxXrrGsn0ny15mckvzUJD+b5PQk70myYXHoIN/7lUn+U5Lbk+zK5NTr163jdgBg6606h1XV2UnOH2N8W/Kpmez3xhifij5HOIMl65/DDjVjHclMN9cZ7BAz5XpsyrGBLpw5BHNWVQ9NcmqSdyy5+ik58ArWryX5zqp6WpJnJ/mxI3y885e8Ud8bk9yR5PgVmx2fZP8Md/ctSf58jHHj0ivHGH8zxtg/xrhrjPHbSf5iuva1fHmSG8YY1y257uFJ/nmGfR/wg0kemslQcHQm7+nzmUnOS/LYddzPWpZ971X1kCRvTvLaJMckeUQma//5WW4HALbeoeawMcafJnlyVR1fVS9McuvSMHSYj3dEc9gaM9aRzHTznsFWnSnXYxOPDbQgDsH8nZ7kQ2OMjydJVVWSZ2R65tAY46+SfG6SX0nyLWOM+1beQVW9sR78aQ9LB49PGWO8askb9T0ryT8mOaqqTluy2VOSvGuGtX9LJme/rGUkqRm2OynJbQ9cmB6L52b6iRhV9cqqOnP69++vqv+6yn3890w+YvUPp497SZJ9SX4/kzcy3Cgrv/cTMxl8fnU6lHw0yW/lwFCy1u0AwNY75ByWyXP9z2fyfjgXrdx5PTNYsuFzWLJ8xjqS+1prBju7ql6+5PZXVtXKs6aOZAabdaZcj406NtCCXyuD+TsjyclVdWomT5wXJXlckpuSTz05fzLJFWOMf1rtDqbDxWEZY9xZVa9N8uKq+vZMTqV+TpIvPtR+VfXFSR6dFZ8oUVUnZPI76VdmMiB8UyavRn3f9PZXTB/3W1e523cm+TfTAPS+JD+RyRP7701vf+AsqpdlElXOWeU+/vOSgHZlkl9c4/s4KpN/C3cl2VWTN5+8d4xx7yH2edD3Psb4SFXdmORFVfWLmZy+/PwciHyHvB0AmItDzmGZBIubkpyx2mxwJDPYdP+Z57C1ZqxZ7usQc9ghZ7Axxp9W1c/X5BPLzs3qZ1GtawZbsqZVZ8pVtjvozLYRxwa6c+YQzN8Zmfy60RuTXJ/k/2Xye90PvDr1X5L8fZKvnoaizfCdST49k0+IeHWSF40xPvVKyvRVsR9Zsc/zk7x2jLHydNyjM/md8w8n+UiS707
ydUs+2eKxmZzm+yBjjKsz+f30N2RyDD47ybPHGPdMb1/zLKrVrlvDj2bysakXJvkP07//6AM3rvN7//pMgtWHM/nf8t4k37+O2wGArbXWHPbQJNeOMT60iWs46By2Yg5Za8Y65H1NrTqHrTWDTR3yLKrDmMEesOpctcoMdqiZbSOODbRWR/Zm8MCRmp5y/JtjjD9a5bYvzeTJ998m+aUkbxxjvGWLl7hhpr/Xf20mb3Z4z1rbr7J/ZTLA/fUY48c3en0AQC+HmsOmtz89yTeOMS7Y2pVtvA2Yw47PgbOoNjOWAXPgzCGYvzMy+RSHZarqpEw+OeP5Y4z7k/x6Jq94bFtjjLvHGJ93OAPJ1FacRQUA9LHqHLbEE7ND3pdmA+awrTiLCpgTZw7BHFXVwzM5ffmYI3iibmGnnUUFAMyXOWx9dtJZVMCDOXMI5miMcdsY46EGkkPbiWdRAQDzZQ5btx1zFhXwYM4cAgAAAGjMmUMAAAAAjYlDAAAAAI2JQwAAAACNHTXvBazmEY94xDjllFPmvQwAYJO8/e1v/8gY46R5r4MDzF8AsPMdbAZbyDh0yimn5Oqrr573MgCATVJVH5j3GljO/AUAO9/BZjC/VgYAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0NhMcaiqzqmq91XV9VV14SG2e2pV3VdV37DefQEAWM4MBgBshTXjUFXtSnJxkmclOT3JN1fV6QfZ7ueTvHm9+wIAsJwZDADYKrOcOXRWkuvHGDeMMe5O8pokz1llu+9O8kdJ9h7GvgAALGcGAwC2xFEzbPPoJB9ccvmWJE9bukFVPTrJc5N8RZKnrmdfNt/evcm+ffNeRXLCCcnJJ897FQCwbZjBDpPZBwDWZ5Y4VKtcN1Zc/uUkPzTGuK9q2eaz7DvZsOqFSV6YJLt3755hWcxi797ktNOS22+f90qS449PrrvOkAQAM9r0GWwnzl9mHwBYv1ni0C1JHrvk8mOS3Lpimz1JXjMdSh6R5NlVde+M+yZJxhgvTfLSJNmzZ8+qAYn127dvMhxdemkyz5nv5puT886brMeABAAz2fQZbCfOX2YfAFi/WeLQVUlOq6rHJ/lQknOTnLd0gzHG4x/4e1W9IsnlY4zXVdVRa+3L1ti9Ozn11HmvAgBYBzPYETD7AMDs1oxDY4x7q+qCTD4BY1eSl48x3lVV3zG9/ZL17rsxSwcA2LnMYADAVpnlzKGMMd6Q5A0rrlt1IBljfOta+wIAsDYzGACwFWb5KHsAAAAAdihxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCA
AAAKAxcQgAAACgMXEIAAAAoLGZ4lBVnVNV76uq66vqwlVuP7+q/mH65y+r6ilLbrupqt5RVddU1dUbuXgAgJ3MDAYAbIWj1tqgqnYluTjJVyW5JclVVfX6Mca7l2x2Y5KnjzFuq6pnJXlpkqctuf2ZY4yPbOC6AQB2NDMYALBVZjlz6Kwk148xbhhj3J3kNUmes3SDMcZfjjFum1786ySP2dhlAgC0YwYDALbEmmcOJXl0kg8uuXxLlr8itdILkrxxyeWR5C1VNZL8xhjjpeteJTvGjTfOewXJCSckJ58871UAwJrMYGyIvXuTffvmuwbzF8BimyUO1SrXjVU3rHpmJoPJly65+kvGGLdW1clJ3lpV7x1j/Nkq+74wyQuTZPfu3TMsi+3ktulrmuecM991JMnxxyfXXWdAAWDhbfoMZv7a+fbuTU47Lbn99vmuw/wFsNhmiUO3JHnsksuPSXLryo2q6slJfjPJs8YYH33g+jHGrdOve6vqskxOkX5QHJq+mvXSJNmzZ8+qgw/b1/79k68veUly1lnzW8fNNyfnnTd59cxwAsCC2/QZzPy18+3bNwlDl16azKv/mb8AFt8sceiqJKdV1eOTfCjJuUnOW7pBVe1O8tokzxtj/OOS649J8pAxxv7p3786yYs3avFsP498ZHLqqfNeBQBsC2YwNszu3WYwAA5uzTg0xri3qi5I8uYku5K8fIzxrqr6juntlyT58SSfmeTXqipJ7h1j7EnyWUkum153VJJLxxhv2pTvBABgBzGDAQBbZZYzhzLGeEOSN6y47pIlf//2JN++yn43JHnKEa4RAKAlMxgAsBVm+Sh7AAAAAHYocQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKAxcQgAAACgMXEIAAAAoDFxCAAAAKCxmeJQVZ1TVe+rquur6sJVbv/cqvqrqrqrqn5wxW03VdU7quqaqrp6oxYOALDTmcEAgK1w1FobVNWuJBcn+aoktyS5qqpeP8Z495LNPpbke5J83UHu5pljjI8c6WIBALowgwEAW2WWM4fOSnL9GOOGMcbdSV6T5DlLNxhj7B1jXJXknk1YIwBAR2YwAGBLrHnmUJJHJ/ngksu3JHnaOh5jJHlLVY0kvzHGeOk69oVNceON815BcsIJycknz3sVACwwMxhssL17k3375r0KcyCweGaJQ7XKdWMdj/ElY4xbq+rkJG+tqveOMf7sQQ9S9cIkL0yS3bt3r+PuYXa33Tb5es45811Hkhx/fHLddQYDAA5q02cw8xed7N2bnHZacvvt816JORBYPLPEoVuSPHbJ5cckuXXWBxhj3Dr9ureqLsvkFOkHxaHpq1kvTZI9e/asZ/CBme3fP/n6kpckZ501v3XcfHNy3nmTV64MBQAcxKbPYOYvOtm3bxKGLr00mWcLNQcCi2iWOHRVktOq6vFJPpTk3CTnzXLnVXVMkoeMMfZP//7VSV58uIuFjfLIRyannjrvVQDAIZnBYBPs3m0OBFhpzTg0xri3qi5I8uYku5K8fIzxrqr6juntl1TVZye5OsnxSe6vqu9LcnqSRyS5rKoeeKxLxxhv2pxvBQBg5zCDAQBbZZYzhzLGeEOSN6y47pIlf//nTE51Xun2JE85kgUCAHRlBgMAtsIsH2UPAAAAwA4lDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANDZTHKqqi6rq7umfN65y+7Oqan9Vjar64/XsCwDA6sxgAMBWWDMOVdXRSX4yyVcleXiSZ1TVv1ux2QeSXJDkLw5jXwAAVjCDAQBbZZYzh741yb+MMa4cY9yZ5Mok37V0gzHGu8cYv53k3vXuCwDAqr41ZjAAYAscNcM2T0jykSWXb0ryxTPe/5HsCzvejTfO9/HvuSc5+uj5riFJTjghOfnkea8CYOGYwYBNtXdvsm/fvFexOLPgIhyPRTkW9DNLHKpVrhsz3v/M+1bVK5N8fZKceOKJM949bE+33Tb5es45813Hojj++OS66zwRAqyw6TOY+Qv62rs3Oe205Pbb572SxZgFF+V4LMKxoKdZ4tB7kzx/yeVTkvzTjPc/875jjOcleV6S7NmzZ9bBB7al/fsnX1/ykuSss+azhr/5m+QHfmC+a0iSm29Ozjtv8iqNJ0GAZTZ9BjN/QV/79k1CyKWXJrt3z28dizILLsLxWJRjQU+zxKHfSfJrVfVlSf4uydOTfNOM938k+8KO98hHJqeeOp/Hvvnm+a8BgEMygwGbbvdus+BSjgddrRmHxhh3VdVPJ/nTTE5RftsY4/VV9arp7edX1RlJ/j7JriSpqnuTPG6M8aHV9t2k7wUAYMcwgwEAW2WWM4cyx
nhxkhevuO78JX9/x8Hua7V9AQBYmxkMANgKs3yUPQAAAAA7lDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOAQAAADQmDgEAAAA0Jg4BAAAANCYOMSOcd99yf79k68AACyG+++ffDWjASwucYht7e67k9e+Njn77ORxj0ue/OTJ17PPnlx/993zXiEAQD9LZ7RnPGNy3emnJ2eckfzu7yZ33TXX5QGwgjjEtnXNNcnnf35y4YXJe9+bjDEZRMaYXL7wwsnt114775UCAPSx2oz2gHe+M3nRi5JHPSq56qr5rRGA5cQhtqVrr02+8RuTffuSO+9cfZs775zc/g3fIBABAGyFWWa0O+5IPvax5JnPFIgAFoU4xLZz993JeeclH//4bNt//OOT7f2KGQDA5lnvjHbnnck55/gVM4BFIA6xto9+NLn2msnXBVjHW1790dxzz/p2veee5PLLN3Ydcz0ei7CGRVoHAGwkz2/LzXg8Lr88657R7r47+cM/PIK1bTU/G8s5Hgc4Fmxz4hCH9rrXJWedlZx77uTr614393V85UVn5ew717eOO+9MLr54Y9cxt+OxCGtYpHUAwEby/LbcOo7HxRcf/FfJDuaOO5Kf+7kjXONW8bOxnONxgGPBDlBj6TvELYjjjjtufMEXfMG8l7EjfOITyfz4pyUAAAfJSURBVN/+bfJ5n5c87GHr3Pnee5N3vSsZ9x+4rh6SPPGJyVFHreuu9u9P3v/+5NRTk+OOO/J13J+H5N15Yu7N+tbxOZ+T3HDDxq1jy4/HIqxhg9dx113Je94zeR799E9f5zqAbevKK698+xhjz7zXwQE7Zf5alNlnUZ7ftvJ4XHPN4a/zy788qTr8/WfhZ2M5x2O5wz4eO/BYsLMdbAZb308r285RRyW7dk3+kTmMvZM8ZflVI8k7D38973//Bq3jMN1wwwavY8uPxyKsYePXsWvXup87AWBVizT7LMLz2yIdj4OpSu67b/OP1SIdCz8by23v47HzjgU9LeSP3ROe8IRcccUV817GjrF37+QTI9btYx9LnvGM5K5PHrjuYZ+WXHFFcuKJ6767e+5Jjj56Y9bxiXxanpkrclvWt453vzu5//6NW8eWH49FWMMmrOOEE5KTTz6MdQDbVm32KQKs206avxZl9lmU57etOB733Zecfvrhr/HKKyf/p3iz+dlYzvFY7rCOxw49FuxcB5vBFjIOsbFOPvlw/4E5MfmtH05e8IJJQbjnnuRlL0u+cP3/yB2Z5ev4xP578h/Hy9Ydhp70pMlpohu1jvkcj0VYwyKtAwAebPvPPhtrq47Hk56UvPMwzpZ44hO3JgwlfjZWcjyWO7zjsTOPBf0s5HsO7dmzZ1x99dXzXgYP+PCHk5tuSk45JTnppLmv4w+uOiXf9kMn5Y47Zt/12GOTSy5Jzj9/49Yx1+OxCGtYpHUA205Vec+hBWP+WsLz23IzHo/f/d3kRS/K/Ga0reBnYznH4wDHgm3iYDOYOMS2c9ddyaMeNTmDc1YnnpjceuthvNkeAJtCHFo85i+OlBkNYPEdbAbzUfZsOw97WPKmNyXHHDPb9sccM9ne0AEAsHnMaADblzjEtvTUpyZve9vk1aZjj119m2OPndz+trdNtgcAYHOZ0QC2J3GIbeupT52chnzJJZM3QKyavAdc1eTyJZdMbjd0AABsHTMawPbj08rY1h72sMkbGJ5//uQjVO+4Y/Jq1FZ94gUAAA9mRgPYXsQhdoxdu5LP+Ix5rwIAgKXMaACLz6+VAQAAADS2kB9lX1UfTvKBea+DZR6R5CPzXkSsY9HWkCzOOoDt5XFjjJPmvQgOMH89iOe35RyPAxyL5RyPAxwLtoNVZ7CFjEMsnqq6eoyxxzoWZx2LsIZFWgcAbCTPb8s5Hgc4Fss5Hgc4Fmxnfq0MAAAAoDFxCAAAAKAxcYhZvXTeC5iyjgMWYQ3J4qwDADaS57flHI8DHIvlHI8DHAu2Le85BAAAANCYM4cAAAAAGhOHmFlV/UJVvbeq/qGqLquqE+a0jm+sqndV1f1VtaWfBlBV51TV+6rq+qq6cCsfe8kaXl5Ve6vqnfN4/CXreGxVva2q3jP93+N757keANhoVfXT07nnmqp6S1U9at5rmpd5zl+LYlFmsEVQVZ9WVX9bVddOfy5+at5rmrequqmq3jH99+Lqea8H1kscYj3emuRJY4wnJ/nHJD88p3W8M8nXJ/mzrXzQqtqV5OIkz0pyepJvrqrTt3INU69Ics4cHnele5P8wBjj85J8YZLvmtPxAIDN8gtjjCePMc5McnmSH5/3guZoLvPXgnlFFmMGWwR3JfmKMcZTkpyZ5Jyq+sI5r2kRPHOMcaaPs2c7EoeY2RjjLWOMe6cX/zrJY+a0jveMMd43h4c+K8n1Y4wbxhh3J3lNkuds9SLGGH+W5GNb/birrOOfxhh/N/37/iTvSfLo+a4KADbOGOP2JRePSdL2zTrnOH8tjEWZwRbBmLhjevHo6Z+2/33ATiAOcbi+Lckb572ILfboJB9ccvmWiCFJkqo6JcnnJ/mb+a4EADZWVf1sVX0wyfnpfeYQLFNVu6rqmiR7k7x1jNF9DhxJ3lJVb6+qF857MbBeR817ASyWqvrfST57lZsuGmP8r+k2F2XyK0Wvmuc65qBWua79KyRVdWySP0ryfSteYQWAhbfWzDHGuCjJRVX1w0kuSPITW7rALbSg8xcLaoxxX5Izp+9DellVPWmM0fn9mL5kjHFrVZ2c
5K1V9d7p2WawLYhDLDPG+MpD3V5Vz0/yNUnOHmNsWhhZax1zckuSxy65/Jgkt85pLQuhqo7OJAy9aozx2nmvBwDWax0zx6VJ/iQ7OA4t6PzFghtj7KuqKzJ5P6a2cWiMcev0696quiyTt6QQh9g2/FoZM6uqc5L8UJKvHWN8fN7rmYOrkpxWVY+vqocmOTfJ6+e8prmpqkrysiTvGWO8ZN7rAYCNVlWnLbn4tUneO6+1wCKpqpMe+OTiqvr0JF+Zxv99VNUxVXXcA39P8tVpHMrYnsQh1uNXkxyXyWmS11TVJfNYRFU9t6puSfJFSf6kqt68FY87fTPuC5K8OZM3X/79Mca7tuKxl6qqVyf5qyRPqKpbquoFW72GqS9J8rwkXzH9ebimqp49p7UAwGb4uap6Z1X9Qyb/Z+97572geZnX/LVIFmgGWwSPTPK26X8bV2XynkOXz3lN8/RZSf68qq5N8rdJ/mSM8aY5rwnWpTbxN4MAAAAAWHDOHAIAAABoTBwCAAAAaEwcAgAAAGhMHAIAAABoTBwCAAAAaEwcAgAAAGhMHAIAAABoTBwCAAAAaOz/A0J1FG2O4/dNAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# nbi:hide_in\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "plt.figure(figsize=(20,10)) \n", "\n", "# values\n", "range_x = np.array([-2,-1,0,1,2])\n", "range_y = np.array([-3,-1,1,3,5])\n", "xpmf_values = np.array([0.4, 0.25, 0.15, 0.1, 0.1])\n", "ypmf_values = np.array([0.4, 0.25, 0.15, 0.1, 0.1])\n", "\n", "# mean \n", "mean_x = np.average(range_x, weights = xpmf_values)\n", "mean_y = np.average(range_y, weights = ypmf_values)\n", "\n", "# variance\n", "mean_x2 = np.average(np.power(range_x,2), weights = xpmf_values)\n", "mean_y2 = np.average(np.power(range_y,2), weights = ypmf_values)\n", "var_x = mean_x2 - mean_x**2\n", "var_y = mean_y2 - mean_y**2\n", "\n", "# set up the figure\n", "fig, [ax1, ax2] = plt.subplots(1,2, num=1, clear=True)\n", "\n", "ax1.set_ylim(-0.01,0.5) \n", "ax1.set_xlim(-5, 10)\n", "ax1.axhline(y=0, color='k')\n", "ax1.set_xticks(range_x)\n", "ax1.set_yticks(xpmf_values)\n", "\n", "ax2.set_ylim(-0.01, 0.5) \n", "ax2.set_xlim(-5, 10)\n", "ax2.axhline(y=0, color='k')\n", "ax2.set_xticks(range_y)\n", "ax2.set_yticks(ypmf_values)\n", "\n", "\n", "# \n", "ax1.scatter(np.array([mean_x]),np.zeros(1), color =\"blue\", s=200)\n", "ax1.scatter(range_x,np.zeros(range_x.shape), color =\"red\", s=20)\n", "ax1.bar(range_x, xpmf_values, width=1, color=(0.1, 0.1, 1, 0.1), edgecolor='blue', linewidth=1.3)\n", "ax1.set_title(r\"$\\mu_X=${:.2f}, $\\sigma^2_X\\approx${:.3f}\".format(mean_x, var_x))\n", "\n", "\n", "# \n", "ax2.scatter(np.array([mean_y]),np.zeros(1), color =\"blue\", s=200)\n", "ax2.scatter(range_y,np.zeros(range_y.shape), color =\"red\", s=20)\n", "ax2.bar(range_y, ypmf_values, width=1, color=(0.1, 0.1, 1, 0.1), edgecolor='blue', linewidth=1.3)\n", "ax2.set_title(r\"$\\mu_Y=${:.2f}, $\\sigma^2_Y\\approx${:.3f}\".format(mean_y, var_y))\n", "\n", "plt.show();\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Bernoulli trials\n", "\n", "\n", "**Definition**\n", " \n", "
\n", "\n", "In a Bernoulli experiment the outcome is either of Type I (success), with probability $p$, or of Type II (failure), with probability $q=1-p$. \n", "\n", "
\n", "\n", "**Bernoulli trials** are the outcomes of several independent Bernoulli experiments with the same success probability.\n", "\n", "**Example**\n", "Consider the coin-flip experiment with a biased coin that lands heads with probability $p$ and tails with probability $q=1-p$. A single coin flip is a Bernoulli experiment, and $n$ independent coin flips form $n$ Bernoulli trials." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Binomial distribution\n", "\n", "\n", "**Definition**\n", "\n", "
\n", "\n", "Let $X$ denote the number of successes in $n$ Bernoulli trials with success probability $p$. Then we say that $X$ has the **binomial distribution** with parameters $(n,p)$. \n", "\n", "
\n", " \n", "If $X$ has the binomial distribution with parameters $(n,p)$ then \n", "* $\\text{range}(X)=\\{0,1,\\dots, n\\}$.\n", "* $$f(x)={n\\choose x}p^xq^{n-x},$$ since there are ${n\\choose x}$ ways to choose which $x$ of the $n$ trials are successes; the remaining $n-x$ are failures." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "1bd13f24cdd2437381c913d68f14707e", "version_major": 2, "version_minor": 0 }, "text/plain": [ "interactive(children=(IntSlider(value=50, description='n', min=1, readout_format=''), FloatSlider(value=0.5, d…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# nbi:hide_in\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "from ipywidgets import interact, FloatSlider, IntSlider\n", "from scipy.special import comb\n", "\n", "def binomial_pmf(n, p):\n", "    q = 1-p\n", "    range_x = np.arange(0, n+1, 1)  # range(X) = {0, 1, ..., n}\n", "\n", "    def binom_pmf(x):\n", "        pmf_val = comb(n, x, exact=True) * p**x * q**(n-x)\n", "        return pmf_val\n", "    mean = n*p\n", "\n", "    pmf_values = np.array([binom_pmf(x) for x in range_x])\n", " \n", "    # plot setup\n", "    plt.figure(figsize=(14,7)) \n", "    plt.axhline(y=0, color='k')\n", "    plt.ylim(-0.02, max(np.max(pmf_values)+0.05, 0.2))\n", "    plt.xlim(-1, n+1)\n", "    plt.xticks(np.arange(0, n+1, 5))\n", "\n", "    # Plotting with plt.bar instead of plt.hist works better when f(x) are known\n", "    plt.scatter(np.array([mean]),np.zeros(1), color =\"blue\", s=200)\n", "    plt.scatter(range_x,np.zeros(range_x.shape), color =\"red\", s=20)\n", "    plt.bar(range_x, pmf_values, width=1, color=(0.1, 0.1, 1, 0.1), edgecolor='blue', linewidth=1.3)\n", "    #plt.title(\"Binomial distribution\")\n", "    plt.figtext(0.8,0.8, \" n={} \\n p={}\".format(n, p), ha=\"left\", va=\"top\",\n", "                backgroundcolor=(0.1, 0.1, 1, 0.15), fontsize=\"large\")\n", "    plt.plot();\n", "\n", "# create interactive variables\n", "p = FloatSlider(min=0.01, max=1, step=0.01, value=0.5, readout_format='')\n", "n = IntSlider(min=1, max=100, step=1, value=50, readout_format='')\n", "# display the interactive plot\n", "interact(binomial_pmf, n=n, p=p);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Binomial vs hypergeometric\n", "\n", "When $N\\gg n$, the hypergeometric distribution with parameters $(N,K,n)$ is very close to the binomial distribution with parameters $(n,\\frac{K}{N})$. In the plot below you can increase the value of $N$ while leaving $n$ fixed to see that their histograms approach one another.
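 A rough numerical check of this closeness (a sketch using `scipy.stats`; scipy's `hypergeom` takes the population size, the number of type I objects and the sample size, in that order):\n", "\n", "```python\n", "import numpy as np\n", "from scipy.stats import binom, hypergeom\n", "\n", "n = 20\n", "x = np.arange(0, n + 1)\n", "for N in [50, 200, 1000, 10000]:\n", "    K = N // 2                      # keep K/N fixed at 1/2\n", "    gap = np.abs(hypergeom.pmf(x, N, K, n) - binom.pmf(x, n, K / N)).max()\n", "    print(N, gap)                   # the largest pmf difference shrinks as N grows\n", "```\n", "\n", "The interactive plot below shows the same effect.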
" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "0c11ebd4e4a84e9e9c0512f5f4739f0f", "version_major": 2, "version_minor": 0 }, "text/plain": [ "interactive(children=(IntSlider(value=200, description='N', max=200, min=1, readout_format=''), IntSlider(valu…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# nbi:hide_in\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "from scipy.special import comb\n", "from ipywidgets import interact, IntSlider\n", "\n", "def hypergeo_binom_pmf(N=80,K=40,n=30):\n", " p = K/N\n", " q = 1-p\n", " brange_x = np.arange(0, n, 1)\n", " hrange_x = np.arange(max(0, n-(N-K)), min(n, K)+1, 1)\n", "\n", " def hyper_pmf(N,K,n,i):\n", " pmf_val = comb(K, i, exact=True) * comb(N-K, n-i, exact=True) / comb(N, n, exact=True)\n", " return pmf_val\n", " mean = n * K / N\n", "\n", " hpmf_values = np.array([hyper_pmf(N,K,n,i) for i in hrange_x])\n", " \n", " def binom_pmf(x):\n", " pmf_val = comb(n, x, exact=True) * p**x * q**(n-x)\n", " return pmf_val\n", " mean = n*p\n", "\n", " bpmf_values = np.array([binom_pmf(x) for x in brange_x])\n", "\n", " # plot setup\n", " plt.figure(figsize=(14,7)) \n", " plt.axhline(y=0, color='k')\n", " plt.ylim(-0.02, max(np.max(hpmf_values)+0.05, np.max(bpmf_values)+0.05, 0.2))\n", " plt.xlim(-2,max(np.max(hrange_x), np.max(brange_x))+2)\n", "\n", " # Plotting hypergeometric\n", " plt.bar(hrange_x, hpmf_values, width=1, color=(0.1, 0.1, 1, 0.1), edgecolor='blue', \n", " linewidth=1.3, label=\"Hypergeometric\")\n", " \n", " # Plotting binomial\n", " plt.bar(brange_x, bpmf_values, width=1, color=(0.3, 0.5, 0.3, 0.1), edgecolor='green',\n", " linewidth=1.3, label=\"Binomial\")\n", " plt.figtext(0.78,0.75, \" N={} \\n K={} \\n n={}\".format(N,K,n), ha=\"left\", va=\"top\",\n", " backgroundcolor=(0.1, 0.1, 1, 0.15), fontsize=\"large\")\n", " plt.legend()\n", " plt.plot();\n", "\n", "# create interactive variables\n", "N = IntSlider(min=1.0, max=200.0, step=1.0, value=300, readout_format='')\n", "K = IntSlider(min=1.0, max=N.value, step=1.0, value=50, readout_format='')\n", "n = IntSlider(min=1.0, max=N.value, step=1.0, value=20, readout_format='')\n", "\n", "# enforce K<=N and n<=N\n", "def update_range(*args):\n", " K.max = N.value\n", " n.max = N.value\n", "N.observe(update_range, 'value')\n", "\n", "# display the interactive plot\n", "interact(hypergeo_binom_pmf, N=N, K=K, n=n);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### cdf of the binomial distribution\n", "\n", "The cdf of the binomial distribution can be computed as \n", "$$F(x)=E[X\\leq x]=\\sum_{y=0}^{\\lfloor x\\rfloor}{n\\choose x}p^xq^{(n-x)}\\text{ for } x\\in [0,n].$$\n", "\n", "However, $F(x)$ does not have a simple closed-form formula. In old times people would use statistical tables that show precomputed values of the cdf for different choices of parameters $(n,p)$. Nowdays, computers can give you very precise values for the cdf for any $n,p,x$.\n", "\n", "In the plot below, the shaded area is the value of the cdf at point $x$. The point $x$ is the black dot on the plot. The value of the cdf up to 4 digit precision is given at the top of the plot. 
" ] }, { "cell_type": "code", "execution_count": 103, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "15c40ff3803a4934a4c163e4088549b3", "version_major": 2, "version_minor": 0 }, "text/plain": [ "interactive(children=(IntSlider(value=3, description='n', min=1, readout_format=''), FloatSlider(value=0.25, d…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# nbi:hide_in\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "from ipywidgets import interact, FloatSlider, IntSlider\n", "from scipy.special import comb\n", "from scipy.stats import binom\n", "\n", "def binomial_cdf(n, p, x):\n", " q = 1-p\n", " range_x = np.arange(0, n, 1)\n", "\n", " def binom_pmf(x):\n", " pmf_val = comb(n, x, exact=True) * p**x * q**(n-x)\n", " return pmf_val\n", " mean = n*p\n", "\n", " pmf_values = np.array([binom_pmf(x) for x in range_x])\n", " \n", " # plot setup\n", " plt.figure(figsize=(20,10)) \n", " plt.axhline(y=0, color='k')\n", " plt.ylim(-0.02, max(np.max(pmf_values)+0.05, 0.2))\n", " plt.xlim(-4, n+4)\n", "\n", " # PLotting with plt.bar instead of plt.hist works better when f(x) are knowwn\n", " plt.scatter(x,np.zeros(1), color =\"black\", s=100)\n", " plt.scatter(range_x,np.zeros(range_x.shape), color =\"red\", s=20)\n", " barlist = plt.bar(range_x, pmf_values, width=1, color=(1, 1, 1, 0), edgecolor='blue', linewidth=1.3, label=\"Histogran\")\n", " for ind, val in enumerate(range_x):\n", " if val<=x:\n", " barlist[ind].set_color((0.1, 0.1, 1, 0.15))\n", " barlist[ind].set_edgecolor(\"blue\") \n", " cdf = binom.cdf(x, n, p)\n", " plt.title(\"cdf and pdf of binomial distribution\" )\n", " plt.figtext(0.8,0.8, \" n={} \\n p={} \\n x={} \\n cdf = {:.4f}\".format(n, p, x, cdf), ha=\"left\", va=\"top\",\n", " backgroundcolor=(0.1, 0.1, 1, 0.15), fontsize=\"large\")\n", " plt.plot();\n", "\n", "# create interactive variables\n", "p = FloatSlider(min=0.01, max=1, step=0.01, value=0.25, readout_format='')\n", "n = IntSlider(min=1, max=100, step=1, value=3, readout_format='')\n", "x = FloatSlider(min=-4, max=10, step=0.1, value=1.75, readout_format='')\n", "\n", "# enforce values for x\n", "def update_range(*args):\n", " x.max = max(n.value+4,10)\n", "n.observe(update_range, 'value')\n", "\n", "# display the interactive plot\n", "interact(binomial_cdf, n=n, p=p, x=x);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Negative binomial distribution\n", "\n", "\n", "**Definition**\n", "\n", "
\n", "\n", "Let $X$ be the number of independent Bernoulli trials with success probability $p$ needed until exactly $r$ successes occur. Then we say $X$ has the **negative binomial distribution** with parameters $(r,p)$.\n", "
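A quick simulation of this definition (an illustrative sketch, not part of the original notes): repeat Bernoulli trials until the $r$-th success and record how many trials were needed. The sample mean lands near $r/p$, the value marked by the blue dot in the plot further below.\n", "\n", "```python\n", "import numpy as np\n", "\n", "rng = np.random.default_rng(1)\n", "r, p, reps = 3, 0.4, 50_000\n", "\n", "def trials_until_r_successes():\n", "    successes = trials = 0\n", "    while successes < r:\n", "        trials += 1\n", "        successes += rng.random() < p\n", "    return trials\n", "\n", "samples = np.array([trials_until_r_successes() for _ in range(reps)])\n", "print(samples.mean())           # close to r/p = 7.5\n", "print((samples == r).mean())    # close to p**r = 0.064: the first r trials all succeed\n", "```\n", "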
\n", "\n", "\n", "* $\\text{range}(X)=\\{r,r+1,\\dots\\}.$\n", "* If $X(s)=x$, this means that there are $r-1$ successes and $x-r$ failures in the first $x-1$ trials, and the $x$-th trial is a success. Therefore\n", "\n", "$$f(x)={x-1\\choose r-1}p^{r}q^{x-r}.$$\n", "\n", "* When $r=1$, the negative binomial distribution is the same as the geometric distribution. " ] }, { "cell_type": "code", "execution_count": 166, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "4f8724ce4b254192b9fc7fac1afbfdeb", "version_major": 2, "version_minor": 0 }, "text/plain": [ "interactive(children=(IntSlider(value=30, description='r', max=50, min=1, readout_format=''), FloatSlider(valu…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# nbi:hide_in\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "from scipy.special import comb\n", "from ipywidgets import interact, FloatSlider, IntSlider\n", "\n", "\n", "def negbinomial_pmf(r, p):\n", "    q = 1-p\n", "    N=100\n", "    range_x = np.arange(r, N, 1)\n", "\n", "    def negbin_pmf(x):\n", "        pmf_val = comb(x-1, r-1, exact=True) * p**r * q**(x-r)\n", "        return pmf_val\n", "    mean = r/p\n", "\n", "    pmf_values = np.array([negbin_pmf(x) for x in range_x])\n", "\n", "    # plot setup\n", "    plt.figure(figsize=(14,7)) \n", "    plt.axhline(y=0, color='k')\n", "    plt.ylim(-0.02, max(np.max(pmf_values)+0.05, 0.1))\n", "    plt.xlim(np.min(range_x)-2, N+1)\n", "\n", "    # Plotting with plt.bar instead of plt.hist works better when f(x) are known\n", "    plt.scatter(np.array([mean]),np.zeros(1), color =\"blue\", s=200)\n", "    plt.scatter(range_x,np.zeros(range_x.shape), color =\"red\", s=20)\n", "    plt.bar(range_x, pmf_values, width=1, color=(0.1, 0.1, 1, 0.1), edgecolor='blue', linewidth=1.3)\n", "    plt.title(\"Negative binomial distribution\")\n", "    plt.figtext(0.8,0.8, \" r={} \\n p={} \".format(r,p), ha=\"left\", va=\"top\",\n", "                backgroundcolor=(0.1, 0.1, 1, 0.15), fontsize=\"large\")\n", "    plt.plot();\n", "\n", "# create interactive variables\n", "r = IntSlider(min=1, max=50, step=1, value=30, readout_format='')\n", "p = FloatSlider(min=0.01, max=1, step=0.01, value=0.5, readout_format='')\n", "\n", "# display the interactive plot\n", "interact(negbinomial_pmf, r=r, p=p);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Poisson distribution\n", "\n", "\n", "**Definition**\n", "\n", "
\n", "\n", "A discrete random variable $X$ has the Poisson distribution with parameter $\\lambda>0$ if \n", "\n", "$$\\text{range}(X)=\\{0,1,2,3,\\dots\\}$$\n", "\n", "and \n", "\n", "$$f(x)=\\frac{e^{-\\lambda}\\lambda^x}{x!},\\quad x\\in\\text{range}(X).$$\n", "\n", "$\\lambda$ is called the **expected rate**.\n", "
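Before verifying this analytically (next), here is a quick numerical check of the definition (a minimal sketch, not part of the original notes; the infinite range is truncated at $x=59$):\n", "\n", "```python\n", "import numpy as np\n", "from scipy.special import factorial\n", "from scipy.stats import poisson\n", "\n", "lam = 4.0\n", "x = np.arange(0, 60)\n", "f = np.exp(-lam) * lam**x / factorial(x)\n", "\n", "print(np.allclose(f, poisson.pmf(x, lam)))   # matches scipy's Poisson pmf\n", "print(f.sum(), (x * f).sum())                # total mass ~1, mean ~lambda\n", "```\n", "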
\n", "\n", " Notice that $f(x)$ is indeed a pmf:\n", " \n", "$$\\sum_{x\\in \\text{range}(X)}f(x)=\\sum_{x=0}^\\infty \\frac{e^{-\\lambda}\\lambda^x}{x!}=e^{-\\lambda} \\sum_{x=0}^\\infty \\frac{\\lambda^x}{x!}=e^{-\\lambda}e^{\\lambda}=1.$$\n" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "fbe79733d35b4d08b521e57f9bcff687", "version_major": 2, "version_minor": 0 }, "text/plain": [ "interactive(children=(FloatSlider(value=30.0, description='$\\\\lambda$', readout_format='', step=0.5), Output()…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# nbi:hide_in\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "from scipy.special import factorial\n", "from ipywidgets import interact, FloatSlider\n", "\n", "\n", "def poisson_pmf(lam):\n", " N=50\n", " range_x = np.arange(0, N, 1)\n", "\n", " def poiss_pmf(x):\n", " pmf_val = np.exp(-lam) * np.power(lam, x) / factorial(x)\n", " return pmf_val\n", " mean = lam\n", "\n", " pmf_values = np.array([poiss_pmf(x) for x in range_x])\n", "\n", " # plot setup\n", " plt.figure(figsize=(14,7)) \n", " plt.axhline(y=0, color='k')\n", " plt.ylim(-0.02, max(np.max(pmf_values)+0.05, 0.1))\n", " plt.xlim(0, N+1)\n", " plt.xticks(np.arange(0, N+1, 5))\n", "\n", " # PLotting with plt.bar instead of plt.hist works better when f(x) are knowwn\n", " plt.scatter(np.array([mean]),np.zeros(1), color =\"blue\", s=200)\n", " plt.scatter(range_x,np.zeros(range_x.shape), color =\"red\", s=20)\n", " plt.bar(range_x, pmf_values, width=1, color=(0.1, 0.1, 1, 0.1), edgecolor='blue', linewidth=1.3)\n", " plt.title(\"Poisson distribution\")\n", " plt.figtext(0.8,0.8, r\"$\\lambda$={}\".format(lam), ha=\"left\", va=\"top\",\n", " backgroundcolor=(0.1, 0.1, 1, 0.15), fontsize=\"large\")\n", " plt.plot();\n", "\n", "# create interactive variables\n", "lam = FloatSlider(min=0.0, max=100, step=0.5, value=30, readout_format='', description=r\"$\\lambda$\")\n", "\n", "# display the interactive plot\n", "interact(poisson_pmf, lam=lam);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Motivation \n", "\n", "The Poisson distribution arises in the following way (made up scenario): grocery route bus can arrive every $\\frac{1}{n}$ hour. It is a very unpredictable bus but we know that there are on average $\\lambda$ buses arriving every hour. Assuming whether a bus arrives or not at given time is independent from other times, this becomes a binomial distribution with parameter $p=\\frac{\\lambda}{n}$. We know that the expect number of successes in a binomial distribution with parameters $(n,p)$ is $np$ and so $p=\\frac{\\lambda}{n}$. Therefore, the probability of $x$ number of buses arriving in an hour will be\n", "\n", "$$f(x)={n\\choose x}p^x(1-p)^{n-x}={n\\choose x}\\left(\\frac{\\lambda}{n}\\right)^x\\left(1-\\frac{\\lambda}{n}\\right)^{n-x},\\quad x=0,1,2,\\dots, n. $$\n", "\n", "* Now imagine the bus starts arriving at more irregular times. To accommodate that we can increase the $n$. \n", "* When $n\\to \\infty$ that will correspond to the buses potentially arriving at any moment in the continuous interval. 
\n", "* In that case\n", "\n", "$$f(x)=\\lim_{n\\to \\infty}{n\\choose x}\\left(\\frac{\\lambda}{n}\\right)^x\\left(1-\\frac{\\lambda}{n}\\right)^{n-x}, \\quad x=0,1,2,\\dots.$$\n", "\n", "To compute this limit notice that \n", "\n", "$$\\lim_{n\\to \\infty}{n\\choose x}\\left(\\frac{\\lambda}{n}\\right)^x=\\frac{\\lambda^x}{x!}\\lim_{n\\to \\infty}\\frac{n(n-1)\\cdots (n-x+1)}{•n^x}=\\frac{\\lambda^x}{x!}$$\n", "\n", "and (you should know this from calculus)\n", "\n", "$$\\lim_{n\\to \\infty}\\left(1-\\frac{\\lambda}{n}\\right)^{n-x}=e^{-\\lambda}.$$\n", "\n", "Consequently, if the buses can potentially arrive any moment within the hour, with $\\lambda$ buses arriving in average at every hour, then the probability that $x$ buses will arrive in a given hour is \n", "$$f(x)=\\frac{e^{-\\lambda}\\lambda^x}{x!},\\quad x=0,1,2,\\dots.$$\n", "\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "21b928516720454293bcea5371e0eb83", "version_major": 2, "version_minor": 0 }, "text/plain": [ "interactive(children=(IntSlider(value=100, description='n', max=200, min=6, readout_format=''), FloatSlider(va…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# nbi:hide_in\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "from scipy.special import comb, factorial\n", "from ipywidgets import interact, IntSlider, FloatSlider\n", "\n", "def poiss_binom_pmf(n, lam):\n", " p = lam/n\n", " q = 1-p\n", " N = 50\n", " brange_x = np.arange(0, n, 1)\n", " prange_x = np.arange(0, N, 1)\n", "\n", " def poiss_pmf(x):\n", " pmf_val = np.exp(-lam) * np.power(lam, x) / factorial(x)\n", " return pmf_val\n", " mean = lam\n", "\n", " ppmf_values = np.array([poiss_pmf(x) for x in prange_x])\n", " \n", " def binom_pmf(x):\n", " pmf_val = comb(n, x, exact=True) * p**x * q**(n-x)\n", " return pmf_val\n", " mean = n*p\n", "\n", " bpmf_values = np.array([binom_pmf(x) for x in brange_x])\n", "\n", " # plot setup\n", " plt.figure(figsize=(14,7)) \n", " plt.axhline(y=0, color='k')\n", " plt.xlim(-2,20)\n", "\n", " # Plotting hypergeometric\n", " plt.bar(prange_x, ppmf_values, width=1, color=(0.1, 0.1, 1, 0.1), edgecolor='blue', \n", " linewidth=1.3, label=\"Poisson\")\n", " \n", " # Plotting binomial\n", " plt.bar(brange_x, bpmf_values, width=1, color=(0.3, 0.5, 0.3, 0.1), edgecolor='green',\n", " linewidth=1.3, label=\"Binomial\")\n", " plt.figtext(0.83,0.75, r\" $\\lambda$={}\".format(lam)+\"\\n n = {}\".format(n), ha=\"left\", va=\"top\",\n", " backgroundcolor=(0.1, 0.1, 1, 0.15), fontsize=\"large\")\n", " plt.legend()\n", " plt.plot();\n", "\n", "# create interactive variables\n", "lam = FloatSlider(min=0.0, max=100, step=0.5, value=5, readout_format='', description=r\"$\\lambda$\")\n", "n = IntSlider(min=np.floor(lam.value)+1, max=200, step=1, value=100, readout_format='')\n", "\n", "# enforce K<=N and n<=N\n", "def update_range(*args):\n", " n.min = np.floor(lam.value)+1\n", " \n", "lam.observe(update_range, 'value')\n", "\n", "# display the interactive plot\n", "interact(poiss_binom_pmf, lam=lam, n=n);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us see that the Poisson distribution with parameter $\\lambda$ and the binomial distribution with parameters $(n,p=\\frac{\\lambda}{n})$ are close to each other when $n$ is large by looking at their histograms.\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, 
"language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.7" } }, "nbformat": 4, "nbformat_minor": 4 }