{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Week 4: Mathematical expectation\n", "\n", "\n", " #### [Back to main page](https://petrosyan.page/fall2020math3215)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Mean value of the hypergeometric distribution" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The mean value of the hypergeometric distribution with parameters $(N,K,n)$ is equal to \n", "\n", "$$\\mu=\\frac{nK}{N}.$$\n", "\n", "As we discussed in class, the mean value is, in a sense, the center of the histogram: the point around which most of the histogram is concentrated. This is especially easy to see for the hypergeometric distribution. The plot below shows the mean value in blue for different values of $(N,K,n)$." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "scrolled": false }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "b8355df531f74e289ec832d171d24e3f", "version_major": 2, "version_minor": 0 }, "text/plain": [ "interactive(children=(IntSlider(value=80, description='N', min=1, readout_format=''), IntSlider(value=40, desc…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# nbi:hide_in\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "from scipy.special import comb\n", "from ipywidgets import interact, IntSlider\n", "\n", "def hypergeometric_pmf(N=80,K=40,n=30):\n", "    range_x = np.arange(max(0, n-(N-K)), min(n, K)+1, 1)\n", "\n", "    def hyper_pmf(N,K,n,i):\n", "        pmf_val = comb(K, i, exact=True) * comb(N-K, n-i, exact=True) / comb(N, n, exact=True)\n", "        return pmf_val\n", "    mean = n * K / N\n", "\n", "    pmf_values = np.array([hyper_pmf(N,K,n,i) for i in range_x])\n", "\n", "    # plot setup\n", "    plt.figure(figsize=(14,7)) \n", "    plt.axhline(y=0, color='k')\n", "    plt.ylim(-0.02, max(np.max(pmf_values)+0.05, 0.2))\n", "    plt.xlim(-2,N+2)\n", "    plt.xticks(np.arange(0, N+1, 5))\n", "\n", "    # Plotting with plt.bar instead of plt.hist works better when the values f(x) are known\n", "    plt.scatter(np.array([mean]),np.zeros(1), color =\"blue\", s=200)\n", "    plt.scatter(range_x,np.zeros(range_x.shape), color =\"red\", s=20)\n", "    plt.bar(range_x, pmf_values, width=1, color=(0.1, 0.1, 1, 0.1), edgecolor='blue', linewidth=1.3)\n", "    plt.title(\"Hypergeometric distribution\")\n", "    plt.figtext(0.75,0.8, \" N={} \\n K={} \\n n={} \\n expectation={:.2f}\".format(N,K,n, mean), ha=\"left\", va=\"top\",\n", "                backgroundcolor=(0.1, 0.1, 1, 0.15), fontsize=\"large\")\n", "    plt.plot();\n", "\n", "# create interactive variables\n", "N = IntSlider(min=1.0, max=100.0, step=1.0, value=80, readout_format='')\n", "K = IntSlider(min=1.0, max=N.value, step=1.0, value=40, readout_format='')\n", "n = IntSlider(min=1.0, max=N.value, step=1.0, value=30, readout_format='')\n", "\n", "# enforce K<=N and n<=N\n", "def update_range(*args):\n", "    K.max = N.value\n", "    n.max = N.value\n", "N.observe(update_range, 'value')\n", "\n", "# display the interactive plot\n", "interact(hypergeometric_pmf, N=N, K=K, n=n);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Geometric distribution\n", "\n", "A random trial has probability of success $p$ and probability of failure $q=1-p$. Consider the following experiment: we perform consecutive independent trials until we reach a success. The outcomes have the form $s=\\overbrace{FF\\cdots F}^{n-1} S$, where the number of $F$'s is $n-1$ for some $n=1,2,\\dots$. Due to independence, $P(s)=q^{n-1}p$. \n", "\n", "Let $X(s)$ denote the number of trials it took to reach the first success:\n", "\n", "$$X(\\overbrace{FF\\cdots F}^{n-1} S)=n.$$\n", "\n", "Observe that the pmf of $X$ is \n", "\n", "$$f(n)=q^{n-1}p,\\quad n=1,2,\\dots.$$\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Definition (Geometric distribution)**\n", "\n", "