{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Week 4: Mathematical expectation\n", "\n", "\n", " #### [Back to main page](https://petrosyan.page/fall2020math3215)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Mean value of the hypergeometric distribution" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The mean value of the hypergeometric distribution with parameters $(N,K,n)$ is equal to\n", "\n", "$$\\mu=\\frac{nK}{N}.$$\n", "\n", "As we discussed in class, the mean value is, in a sense, the center of the histogram: the point around which most of the histogram is concentrated. This is especially easy to see for the hypergeometric distribution. The plot below shows the mean value in blue for different values of $(N,K,n)$." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "scrolled": false }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "b8355df531f74e289ec832d171d24e3f", "version_major": 2, "version_minor": 0 }, "text/plain": [ "interactive(children=(IntSlider(value=80, description='N', min=1, readout_format=''), IntSlider(value=40, desc…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# nbi:hide_in\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "from scipy.special import comb\n", "from ipywidgets import interact, IntSlider\n", "\n", "def hypergeometric_pmf(N=80,K=40,n=30):\n", "    # support of the distribution: max(0, n-(N-K)) <= i <= min(n, K)\n", "    range_x = np.arange(max(0, n-(N-K)), min(n, K)+1, 1)\n", "\n", "    def hyper_pmf(N,K,n,i):\n", "        pmf_val = comb(K, i, exact=True) * comb(N-K, n-i, exact=True) / comb(N, n, exact=True)\n", "        return pmf_val\n", "    mean = n * K / N\n", "\n", "    pmf_values = np.array([hyper_pmf(N,K,n,i) for i in range_x])\n", "\n", "    # plot setup\n", "    plt.figure(figsize=(14,7))\n", "    plt.axhline(y=0, color='k')\n", "    plt.ylim(-0.02, max(np.max(pmf_values)+0.05, 0.2))\n", "    plt.xlim(-2,N+2)\n", "    plt.xticks(np.arange(0, N+1, 5))\n", "\n", "    # Plotting with plt.bar instead of plt.hist works better when the values f(x) are known\n", "    plt.scatter(np.array([mean]),np.zeros(1), color =\"blue\", s=200)\n", "    plt.scatter(range_x,np.zeros(range_x.shape), color =\"red\", s=20)\n", "    plt.bar(range_x, pmf_values, width=1, color=(0.1, 0.1, 1, 0.1), edgecolor='blue', linewidth=1.3)\n", "    plt.title(\"Hypergeometric distribution\")\n", "    plt.figtext(0.75,0.8, \" N={} \\n K={} \\n n={} \\n expectation={:.2f}\".format(N,K,n, mean), ha=\"left\", va=\"top\",\n", "                backgroundcolor=(0.1, 0.1, 1, 0.15), fontsize=\"large\")\n", "    plt.plot();\n", "\n", "# create interactive variables\n", "N = IntSlider(min=1, max=100, step=1, value=80, readout_format='')\n", "K = IntSlider(min=1, max=N.value, step=1, value=40, readout_format='')\n", "n = IntSlider(min=1, max=N.value, step=1, value=30, readout_format='')\n", "\n", "# enforce K<=N and n<=N\n", "def update_range(*args):\n", "    K.max = N.value\n", "    n.max = N.value\n", "N.observe(update_range, 'value')\n", "\n", "# display the interactive plot\n", "interact(hypergeometric_pmf, N=N, K=K, n=n);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Geometric distribution\n", "\n", "A random trial has probability of success $p$ and probability of failure $q=1-p$. Consider the following experiment: we perform consecutive independent random trials until we reach a success. The outcomes have the form $s=\\overbrace{FF\\cdots F}^{n-1} S$, where the number of $F$'s is $n-1$ and $n$ can be any of $1,2,\\dots$. Due to independence, $P(s)=q^{n-1}p$.\n", "\n", "Let $X(s)$ denote the number of trials it took to reach a success:\n", "\n", "$$X(\\overbrace{FF\\cdots F}^{n-1} S)=n.$$\n", "\n", "Observe that the pmf of $X$ is\n", "\n", "$$f(n)=q^{n-1}p,\\quad n=1,2,\\dots.$$\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Definition (Geometric distribution)**\n", "\n", "
\n", "\n", "The pmf of the random variable $X$ is called the geometric distribution.\n", "\n", "
" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "c001f78008d2491e8e4cd09f155feb73", "version_major": 2, "version_minor": 0 }, "text/plain": [ "interactive(children=(FloatSlider(value=0.5, description='p', max=1.0, min=0.01, readout_format='', step=0.01)…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# nbi:hide_in\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "from ipywidgets import interact, FloatSlider\n", "\n", "\n", "def geometric_pmf(p=0.5):\n", " q = 1-p\n", " N=50\n", " range_x = np.arange(1, N, 1)\n", "\n", " def geo_pmf(n):\n", " pmf_val = q**(n-1)*p\n", " return pmf_val\n", " mean = 1/p\n", "\n", " pmf_values = np.array([geo_pmf(n) for n in range_x])\n", "\n", " # plot setup\n", " plt.figure(figsize=(14,7)) \n", " plt.axhline(y=0, color='k')\n", " plt.ylim(-0.02, max(np.max(pmf_values)+0.05, 0.2))\n", " plt.xlim(0, N+1)\n", " plt.xticks(np.arange(0, N+1, 5))\n", "\n", " # PLotting with plt.bar instead of plt.hist works better when f(x) are knowwn\n", " plt.scatter(np.array([mean]),np.zeros(1), color =\"blue\", s=200)\n", " plt.scatter(range_x,np.zeros(range_x.shape), color =\"red\", s=20)\n", " plt.bar(range_x, pmf_values, width=1, color=(0.1, 0.1, 1, 0.1), edgecolor='blue', linewidth=1.3)\n", " plt.title(\"Geometric distribution\")\n", " plt.figtext(0.8,0.8, \"p={}\".format(p), ha=\"left\", va=\"top\",\n", " backgroundcolor=(0.1, 0.1, 1, 0.15), fontsize=\"large\")\n", " plt.plot();\n", "\n", "# create interactive variables\n", "p = FloatSlider(min=0.01, max=1, step=0.01, value=0.5, readout_format='')\n", "\n", "# display the interactive plot\n", "interact(geometric_pmf, p=p);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Mean and Variance under linear transformation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Theorem**\n", "\n", "
\n", "\n", "Suppose $X$ is a random variable with mean $\\mu_X$ and standard deviation $\\sigma_X$. Let $Y=aX+b$, where $a$ and $b$ are any two numbers. Then the mean and the standard deviation of $Y$ are\n", "$$\\mu_Y=a\\mu_X+b,\\quad \\sigma_Y=|a|\\sigma_X.$$\n", "\n", "
\n", " \n", "**Proof**\n", "By linearity of expectation, $\\mu_Y=E[aX+b]=aE[X]+b=a\\mu_X+b.$ Notice that\n", "\n", "$$\\text{Var}(Y)=E[(Y-E[Y])^2]=E[(aX+b-aE[X]-b)^2]=E[(aX-aE[X])^2]=a^2E[(X-E[X])^2]=a^2\\text{Var}(X).$$\n", "\n", "Taking square roots, we have $\\sigma_Y=|a|\\sigma_X$. $\\blacksquare$\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Example**\n", "\n", "Let $X$ be a random variable with $\\text{range}(X)=\\{-2,-1,0,1,2\\}$ and\n", "\n", "$$f_X(-2)=0.4, \\quad f_X(-1)=0.25, \\quad f_X(0)=0.15, \\quad f_X(1)=0.1, \\quad f_X(2)=0.1.$$\n", "\n", "Take the random variable $Y=2X+1$. Notice that $\\text{range}(Y)=\\{-3,-1,1,3,5\\}$ and\n", "\n", "$$f_Y(-3)=0.4, \\quad f_Y(-1)=0.25, \\quad f_Y(1)=0.15, \\quad f_Y(3)=0.1, \\quad f_Y(5)=0.1.$$\n", "\n", "It can be checked that $E[X^2]=2.35$, so $\\text{Var}(X)=2.35-(-0.75)^2=1.7875$ and\n", "\n", "$$\\mu_X=-0.75, \\quad \\sigma_X\\approx 1.337,$$\n", "\n", "and thus, by the theorem with $a=2$ and $b=1$,\n", "\n", "$$\\mu_Y=-0.5, \\quad \\sigma_Y\\approx 2.674.$$" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "