{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# The Kaplan-Wald Confidence Bound for a Nonnegative Mean" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This section presents an approach to finding lower confidence bounds for the mean of a nonnegative random variable described in H.M. Kaplan's (no longer online) website, http://printmacroj.com/martMean.htm. (An Internet Archive version is available at http://web.archive.org/web/20150220073515/http://printmacroj.com/martMean.htm.) That work fleshes out an idea due to Wald (1945, 2004), and is closely related to a technique presented in Kaplan (1987) based on martingale theory.\n", "\n", "We start with a version for sampling with replacement, then develop finite-population (sampling without replacement) versions.\n", "\n", "For another derivation of the IID version, see Stark & Teague (2014) https://www.usenix.org/system/files/jets/issues/0301/overview/jets_0301-stark.pdf\n", "\n", "\n", "We have an IID sequence of random variables $X_1, X_2, \\ldots$ such that $\\mathbb P \\{X_j \\ge 0 \\} = 1$. Let $F$ be their common distribution function. We seek a lower confidence bound for their common expectation $\\mu := \\mathbb{E} X_1 = \\int_0^\\infty x dF$. \n", "\n", "Under the hypothesis that $\\mu = t$, \n", "\n", "\\begin{equation*} \n", " t^{-1} \\mu = t^{-1} \\int_0^\\infty xdF(x) = \\int_0^\\infty xt^{-1} dF(x) = 1.\n", "\\end{equation*}\n", "Fix $\\gamma \\in [0, 1]$.\n", "Because $\\int_0^\\infty dF = 1$, it follows that if $\\mu = t$,\n", "\n", "\\begin{equation*} \n", " \\mathbb{E} \\left ( (\\gamma/t) X_j + (1-\\gamma) \\right ) = (\\gamma/t) \\mathbb{E} X_j + (1-\\gamma) = \n", " (\\gamma/t)t + (1-\\gamma) = 1.\n", "\\end{equation*}\n", "Now,\n", "\\begin{equation*} \n", " \\mathbb{E} \\left ((\\gamma/t) X_j + (1-\\gamma) \\right ) := \n", " \\int_0^\\infty \\left (x \\gamma/t + (1-\\gamma) \\right ) dF(x).\n", "\\end{equation*}\n", "Since for $x \\ge 0$, $(x \\gamma/t + (1-\\gamma)) \\ge 0$, it follows that if we define\n", "\n", "\\begin{equation*} \n", " dG := (x \\gamma/t + (1-\\gamma))dF\n", "\\end{equation*}\n", "\n", "then $G$ is the cdf of a nonnegative random variable.\n", "\n", "Let $Y$ be a random variable with cdf $G$ and let $X$ be a random variable with cdf $F$.\n", "By Jensen's inequality, $\\mathbb{E} X^2 \\ge (\\mathbb{E} X)^2 = t \\cdot \\mathbb{E} X$ (under the null that $\\mathbb{E} X = t$).\n", "Since $\\mathbb{E} X = t \\ge 0$,\n", "\\begin{align}\n", " \\mathbb{E} Y &= \\int_0^\\infty x dG(x) \\\\\n", " &= \\int_0^\\infty x (x \\gamma/t + (1-\\gamma)) dF(x) \\\\\n", " &= \\gamma/t \\int_0^\\infty x^2 dF(x) + (1-\\gamma) \\int_0^\\infty x dF(x) \\\\\n", " &= \\gamma/t \\cdot \\mathbb{E} X^2 + (1-\\gamma) \\cdot \\mathbb{E} X \\\\\n", " &\\ge \\gamma \\cdot \\mathbb{E} X + (1-\\gamma) \\cdot \\mathbb{E} X \\mbox{ (Jensen's inequality and $\\mathbb{E} X = t$)}\\\\\n", " &= \\mathbb{E} X = t\n", "\\end{align}\n", "If the data allow us to reject the hypothesis $H_0$ that $\\{ X_j\\}$ are IID with cdf $F$\n", "in favor of the alternative hypothesis $H_1$ that they are iid with cdf $G$,\n", "we have strong statistical evidence that $\\mu = \\mathbb{E} X_j > t$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using the SPRT to test $H_1$ versus $H_0$\n", "\n", "Now a bit of magic occurs. For a given observation $X_j = x_j$, what is the ratio of its \n", "probability if $H_1$ is true to its probability if $H_0$ is true?\n", "\\begin{equation*} \n", " \\mathrm{PR} = \\frac{dG(x_j)}{dF(x_j)} = \\frac{(x_j\\gamma/t+(1−\\gamma))dF(x_j)}{dF(x_j)} = (x_j\\gamma/t+(1−\\gamma)).\n", "\\end{equation*}\n", "This doesn't depend on $F$ or $G$!\n", "\n", "The $\\mathrm{PR} $ for observations $X_1, \\ldots, X_m$ is\n", "\\begin{equation*} \n", " \\mathrm{PR} = \\prod_{i=1}^m \\left ( (\\gamma/t) X_i + 1 - \\gamma \\right ).\n", "\\end{equation*}\n", "This expression shows why $\\gamma$ was introduced in the first place:\n", "for $\\gamma = 1$, if even a single observation turned out to be zero, $\\mathrm{PR} $ would forever be\n", "zero no matter how large subsequent observations turned out to be.\n", "Taking $\\gamma < 1$ hedges against that possibility.\n", "Any value of $\\gamma \\in [0, 1]$ gives a conservative test, but smaller values provide more \"protection\"\n", "against small values of $X_j$ (but incur a loss of power when all $X_j$ are large).\n", "\n", "Recall that the $1/\\mathrm{PR} $ is the $P$-value of $H_0: \\mu = t$ based on the observations $\\{X_j\\}_{j=1}^m$.\n", "We will use this to find a lower confidence bound for $\\mu$.\n", "\n", "### \"Look back\" and the SPRT\n", "There's more: recall that the SPRT says the chance that $\\mathrm{PR} $ _ever_ gets larger than $1/\\alpha$ is at most $\\alpha$ if $H_0$ is true.\n", "That means that we can use the observations sequentially, maximizing over the partial products.\n", "If any of the partial products exceeds $1/\\alpha$, we can reject $H_0$--even if the ratio becomes small again.\n", "\n", "That is, our level-$\\alpha$ test based on a sample of size $n$ is\n", "\\begin{equation*} \n", " \\mbox{ Reject } H_0 \\mbox{ if } \\max_{m=1}^n \\prod_{i=1}^m \\left ( \\gamma X_i/t + 1 - \\gamma \\right ) \\ge 1/\\alpha.\n", "\\end{equation*}\n", "\n", "It is _only_ legitimate to do this maximization if the data are in random order. \n", "For instance, it's impermissible to sort the data from largest to smallest. \n", "And if you maximize over partial products, the result will, in general, depend on the order of the data." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Confidence bounds from the Kaplan-Wald test\n", "\n", "To find a lower confidence bound, we can invert hypothesis tests: the lower endpoint of a one-sided confidence bound for $\\mu$ is the largest value \n", "of $t$ for which we would not reject the hypothesis $\\mu = t$ at significance level $\\alpha$.\n", "\n", "For confidence levels above 50%, this lower confidence bound will certainly be between zero and the sample mean. However, for $t=0$, we have a divide-by-zero issue. Hence, for numerical implementation, we search the interval $[\\epsilon, \\bar{X}]$ for the smallest $t$ for which we can reject the hypothesis $\\mu = t$ at significance level $\\alpha$. If that smallest $t$ is $\\epsilon$, we set the confidence bound to zero. \n", "\n", "The following code implements this idea, working with the log of the test statistic to improve numerical stability." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# This is the first cell with code: set up the Python environment\n", "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import math\n", "\n", "import numpy as np\n", "from numpy.polynomial import polynomial as P\n", "\n", "import scipy as sp\n", "import scipy.stats\n", "from scipy.stats import binom\n", "from scipy.optimize import brentq\n", "\n", "import pandas as pd\n", "\n", "from ipywidgets import interact, interactive, fixed\n", "import ipywidgets as widgets\n", "from IPython.display import clear_output, display, HTML" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "def kaplanWaldLowerCI(x, cl = 0.95, gamma = 0.99, xtol=1.e-12, logf=True):\n", " '''\n", " Calculates the Kaplan-Wald lower 1-alpha confidence bound for the mean of a nonnegative random\n", " variable.\n", " '''\n", " alpha = 1.0-cl\n", " if any(x < 0):\n", " raise ValueError('Data x must be nonnegative.')\n", " elif all(x <= 0):\n", " lo = 0.0\n", " else:\n", " if logf:\n", " f = lambda t: (np.max(np.cumsum(np.log(gamma*x/t + 1 - gamma))) + np.log(alpha))\n", " else:\n", " f = lambda t: (np.max(np.cumprod(gamma*x/t + 1 - gamma)) - 1/alpha)\n", " xm = np.mean(x)\n", " if f(xtol)*f(xm) > 0.0:\n", " lo = 0.0\n", " else:\n", " lo = sp.optimize.brentq(f, xtol, np.mean(x), xtol=xtol) \n", " return lo" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## How well does it work?\n", "Let's test the method on our standard test problems: binomial and a mixture of pointmass and uniform. We will fix $\\gamma$; you might experiment using different values." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| \n", " | mass at 1 | \n", "sample size | \n", "Student-t cov | \n", "Kaplan-Wald cov | \n", "Student-t low | \n", "Kaplan-Wald low | \n", "
|---|---|---|---|---|---|---|
| 0 | \n", "0.900 | \n", "25 | \n", "89.99% | \n", "100.0% | \n", "0.6842 | \n", "0.8203 | \n", "
| 1 | \n", "0.900 | \n", "50 | \n", "98.81% | \n", "100.0% | \n", "0.6702 | \n", "0.8708 | \n", "
| 2 | \n", "0.900 | \n", "100 | \n", "99.98% | \n", "97.59% | \n", "0.6641 | \n", "0.8961 | \n", "
| 3 | \n", "0.900 | \n", "400 | \n", "100.0% | \n", "96.77% | \n", "0.6612 | \n", "0.8988 | \n", "
| 4 | \n", "0.990 | \n", "25 | \n", "21.91% | \n", "100.0% | \n", "0.9535 | \n", "0.8792 | \n", "
| 5 | \n", "0.990 | \n", "50 | \n", "39.32% | \n", "100.0% | \n", "0.9413 | \n", "0.9341 | \n", "
| 6 | \n", "0.990 | \n", "100 | \n", "61.75% | \n", "100.0% | \n", "0.9272 | \n", "0.9627 | \n", "
| 7 | \n", "0.990 | \n", "400 | \n", "97.78% | \n", "100.0% | \n", "0.9069 | \n", "0.9847 | \n", "
| 8 | \n", "0.999 | \n", "25 | \n", "2.48% | \n", "100.0% | \n", "0.9952 | \n", "0.8854 | \n", "
| 9 | \n", "0.999 | \n", "50 | \n", "4.69% | \n", "100.0% | \n", "0.9937 | \n", "0.9406 | \n", "
| 10 | \n", "0.999 | \n", "100 | \n", "9.15% | \n", "100.0% | \n", "0.9915 | \n", "0.9695 | \n", "
| 11 | \n", "0.999 | \n", "400 | \n", "32.84% | \n", "100.0% | \n", "0.9845 | \n", "0.9917 | \n", "