{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# The Kaplan-Wald Confidence Bound for a Nonnegative Mean" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This section presents an approach to finding lower confidence bounds for the mean of a nonnegative random variable described in H.M. Kaplan's website, http://printmacroj.com/martMean.htm. That work fleshes out an idea due to Wald (1945, 2004), and is closely related to a technique presented in Kaplan (1987).\n", "For another derivation, see Stark & Teague (2014) https://www.usenix.org/system/files/jets/issues/0301/overview/jets_0301-stark.pdf\n", "\n", "\n", "We have an iid sequence of random variables $X_1, X_2, \\ldots$ such that $\\mathbb P \\{X_j \\ge 0 \\} = 1$. Let $F$ be their common distribution function. We seek a lower confidence bound for their common expectation $\\mu \\equiv \\mathbb E X_1 = \\int_0^\\infty x dF$. \n", "\n", "Under the hypothesis that $\\mu = t$, \n", "\n", "$$\n", " t^{-1} \\mu = t^{-1} \\int_0^\\infty xdF(x) = \\int_0^\\infty xt^{-1} dF(x) = 1.\n", "$$\n", "Fix $\\gamma \\in [0, 1]$.\n", "Because $\\int_0^\\infty dF = 1$, it follows that if $\\mu = t$,\n", "\n", "$$ \n", " \\mathbb E \\left ( (\\gamma/t) X_j + (1-\\gamma) \\right ) = (\\gamma/t) \\mathbb E X_j + (1-\\gamma) = \n", " (\\gamma/t)t + (1-\\gamma) = 1.\n", "$$\n", "Now,\n", "$$\n", " \\mathbb E \\left ((\\gamma/t) X_j + (1-\\gamma) \\right ) \\equiv \n", " \\int_0^\\infty \\left (x \\gamma/t + (1-\\gamma) \\right ) dF(x).\n", "$$\n", "Since for $x \\ge 0$, $(x \\gamma/t + (1-\\gamma)) \\ge 0$, it follows that if we define\n", "\n", "$$\n", " dG \\equiv (x \\gamma/t + (1-\\gamma))dF\n", "$$\n", "\n", "then $G$ is the cdf of a nonnegative random variable.\n", "\n", "Let $Y$ be a random variable with cdf $G$.\n", "By Jensen's inequality, $\\mathbb E X_j^2 \\ge (\\mathbb E X_j)^2 = t \\cdot \\mathbb E X_j$ (by hypothesis).\n", "Since $\\mathbb E X_j = t \\ge 0$,\n", "\n", "\\begin{align}\n", " \\mathbb E Y &= \\int_0^\\infty x dG(x) \\\\\n", " &= \\int_0^\\infty x (x \\gamma/t + (1-\\gamma)) dF(x) \\\\\n", " &= \\gamma/t \\int_0^\\infty x^2 dF(x) + (1-\\gamma) \\int_0^\\infty x dF(x) \\\\\n", " &= \\gamma/t \\cdot \\mathbb E X_j^2 + (1-\\gamma) \\cdot \\mathbb E X_j \\\\\n", " &\\ge \\gamma \\cdot \\mathbb E X_j + (1-\\gamma) \\cdot \\mathbb E X_j = \\mathbb E X_j.\n", "\\end{align}\n", "(The penultimate step uses Jensen's inequality.)\n", "\n", "If the data allow us to reject the hypothesis $H_0$ that $\\{ X_j\\}$ are IID with cdf $F$\n", "in favor of the alternative hypothesis $H_1$ that they are iid with cdf $G$,\n", "we have strong statistical evidence that $\\mu = \\mathbb E X_j > t$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using the SPRT to test $H_1$ versus $H_0$\n", "\n", "Now a bit of magic occurs. 
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Using the SPRT to test $H_1$ versus $H_0$\n", "\n", "Now a bit of magic occurs. For a given observation $X_j = x_j$, what is the likelihood ratio of $H_1$ to $H_0$?\n", "\n", "$$\n", " \\mbox{LR} = \\frac{dG(x_j)}{dF(x_j)} = \\frac{(x_j \\gamma/t + (1-\\gamma))dF(x_j)}{dF(x_j)} = x_j \\gamma/t + (1-\\gamma).\n", "$$\n", "This doesn't depend on $F$ or $G$!\n", "\n", "The $\\mbox{LR}$ for observations $X_1, \\ldots, X_m$ is\n", "\n", "$$\n", " \\mbox{LR} = \\prod_{i=1}^m \\left ( (\\gamma/t) X_i + 1 - \\gamma \\right ).\n", "$$\n", "This expression shows why $\\gamma$ was introduced in the first place:\n", "for $\\gamma = 1$, if even a single observation turned out to be zero, $\\mbox{LR}$ would forever be\n", "zero, no matter how large subsequent observations turned out to be.\n", "Taking $\\gamma < 1$ hedges against that possibility.\n", "Any value of $\\gamma \\in [0, 1]$ gives a conservative test, but smaller values provide more \"protection\"\n", "against small values of $X_j$.\n", "\n", "Recall that the reciprocal of the $\\mbox{LR}$, truncated above at 1, is a conservative $P$-value for $H_0: \\mu = t$ based on the observations $\\{X_j\\}_{j=1}^m$.\n", "We will use this to find a lower confidence bound for $\\mu$.\n", "\n", "### \"Lookahead\" and the SPRT\n", "There's more: recall that the SPRT says the chance that $\\mbox{LR}$ _ever_ gets larger than $1/\\alpha$ is at most $\\alpha$ if $H_0$ is true.\n", "That means we can use the observations sequentially, maximizing over the partial products.\n", "If any of the partial products exceeds $1/\\alpha$, we can reject $H_0$.\n", "\n", "That is, our level-$\\alpha$ test based on a sample of size $n$ is\n", "\n", "$$\n", " \\mbox{ Reject } H_0 \\mbox{ if } \\max_{m=1}^n \\prod_{i=1}^m \\left ( \\gamma X_i/t + 1 - \\gamma \\right ) \\ge 1/\\alpha.\n", "$$\n", "\n", "It is _only_ legitimate to do this maximization if the data are in random order. For instance, it is impermissible to sort them from largest to smallest. And if you maximize over partial products, the result will, in general, depend on the order of the data, as the short demonstration below illustrates.\n", "\n", "It might be hard to explain to a court or consulting client that your confidence bound depends on the order in which the data happened to arrive. If that's a concern, you can always use all $n$ terms in the product and not maximize over $1 \\le m \\le n$." ] }, 
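{ "cell_type": "markdown", "metadata": {}, "source": [ "Here is a small demonstration of the order dependence (an illustration with made-up numbers): the maximized partial product depends on the order of the observations, while the full product over all $n$ terms does not." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Illustration (made-up data): the max of the partial products depends on the\n", "# order of the observations; the full product does not.\n", "import numpy as np\n", "\n", "t, gamma = 0.5, 0.99\n", "x = np.array([0.1, 0.9, 0.2, 1.0, 0.05])  # toy data, in 'arrival' order\n", "terms = gamma*x/t + 1 - gamma             # the likelihood-ratio factors\n", "\n", "print(np.max(np.cumprod(terms)))          # maximum over the partial products\n", "print(np.max(np.cumprod(terms[::-1])))    # same data, reversed order: different max\n", "print(np.prod(terms))                     # full product: the same either way" ] }, 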
] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# This is the first cell with code: set up the Python environment\n", "%matplotlib inline\n", "from __future__ import print_function, division\n", "import matplotlib.pyplot as plt\n", "import math\n", "import numpy as np\n", "import scipy as sp\n", "import scipy.stats\n", "from scipy.stats import binom\n", "import scipy.optimize\n", "import pandas as pd\n", "from ipywidgets import interact, interactive, fixed\n", "import ipywidgets as widgets\n", "from IPython.display import clear_output, display, HTML" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def kaplanWaldLowerCI(x, cl = 0.95, gamma = 0.99, xtol=1.e-12, logf=True):\n", " '''\n", " Calculates the Kaplan-Wald lower 1-alpha confidence bound for the mean of a nonnegative random\n", " variable.\n", " '''\n", " alpha = 1.0-cl\n", " if any(x < 0):\n", " raise ValueError('Data x must be nonnegative.')\n", " elif all(x <= 0):\n", " lo = 0.0\n", " else:\n", " if logf:\n", " f = lambda t: (np.max(np.cumsum(np.log(gamma*x/t + 1 - gamma))) + np.log(alpha))\n", " else:\n", " f = lambda t: (np.max(np.cumprod(gamma*x/t + 1 - gamma)) - 1/alpha)\n", " xm = np.mean(x)\n", " if f(xtol)*f(xm) > 0.0:\n", " lo = 0.0\n", " else:\n", " lo = sp.optimize.brentq(f, xtol, np.mean(x), xtol=xtol) \n", " return lo" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## How well does it work?\n", "Let's test the method on our standard test problems: binomial and a mixture of pointmass and uniform. We will fix $\\gamma$; you might experiment using different values." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
{ "cell_type": "markdown", "metadata": {}, "source": [ "## How well does it work?\n", "Let's test the method on one of our standard test problems: a mixture of a uniform distribution on [0, 1] and a pointmass at 1. We will fix $\\gamma$; you might experiment using different values." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<h4>Simulated coverage probability and expected lower endpoint of one-sided Student-t and Kaplan-Wald confidence intervals for mixture of U[0,1] and pointmass at 1 population</h4>Nominal coverage probability 95.0%. Gamma=0.99.<br>Estimated from 10000 replications." ], "text/plain": [ "<IPython.core.display.HTML object>" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { 
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
mass at 1sample sizeStudent-t covKaplan-Wald covStudent-t lowKaplan-Wald low
00.90025.089.64%100.0%0.68360.82
10.90050.098.63%100.0%0.67010.8713
20.900100.099.96%97.76%0.66430.8954
30.900400.0100.0%96.57%0.66170.8951
40.99025.021.83%100.0%0.95530.8794
50.99050.038.78%100.0%0.94220.934
60.990100.061.17%100.0%0.92850.9628
70.990400.097.61%100.0%0.90690.9847
80.99925.02.62%100.0%0.9950.8853
90.99950.04.62%100.0%0.99390.9406
100.999100.09.65%100.0%0.99130.9695
110.999400.032.36%100.0%0.98470.9917
\n", "
" ], "text/plain": [ " mass at 1 sample size Student-t cov Kaplan-Wald cov Student-t low \\\n", "0 0.900 25.0 89.64% 100.0% 0.6836 \n", "1 0.900 50.0 98.63% 100.0% 0.6701 \n", "2 0.900 100.0 99.96% 97.76% 0.6643 \n", "3 0.900 400.0 100.0% 96.57% 0.6617 \n", "4 0.990 25.0 21.83% 100.0% 0.9553 \n", "5 0.990 50.0 38.78% 100.0% 0.9422 \n", "6 0.990 100.0 61.17% 100.0% 0.9285 \n", "7 0.990 400.0 97.61% 100.0% 0.9069 \n", "8 0.999 25.0 2.62% 100.0% 0.995 \n", "9 0.999 50.0 4.62% 100.0% 0.9939 \n", "10 0.999 100.0 9.65% 100.0% 0.9913 \n", "11 0.999 400.0 32.36% 100.0% 0.9847 \n", "\n", " Kaplan-Wald low \n", "0 0.82 \n", "1 0.8713 \n", "2 0.8954 \n", "3 0.8951 \n", "4 0.8794 \n", "5 0.934 \n", "6 0.9628 \n", "7 0.9847 \n", "8 0.8853 \n", "9 0.9406 \n", "10 0.9695 \n", "11 0.9917 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Nonstandard mixture: a pointmass at 1 and a uniform[0,1]\n", "ns = np.array([25, 50, 100, 400]) # sample sizes\n", "ps = np.array([0.9, 0.99, 0.999]) # mixture fraction, weight of pointmass\n", "alpha = 0.05 # 1- (confidence level)\n", "reps = int(1.0e4) # just for demonstration\n", "gamma = 0.99\n", "xtol = 1.e-12\n", "\n", "simTable = pd.DataFrame(columns=('mass at 1', 'sample size', 'Student-t cov',\\\n", " 'Kaplan-Wald cov', 'Student-t low', 'Kaplan-Wald low')\n", " )\n", "\n", "for p in ps:\n", " popMean = (1-p)*0.5 + p # population is a mixture of uniform with mass (1-p) and\n", " # pointmass at 1 with mass p\n", " \n", " for n in ns:\n", " tCrit = sp.stats.t.ppf(q=1-alpha, df=n-1)\n", " coverT = 0\n", " coverK = 0\n", " lowT = 0.0\n", " lowK = 0.0\n", " \n", " for rep in range(int(reps)):\n", " sam = np.random.uniform(size=n) # the uniform part of the sample\n", " ptMass = np.random.uniform(size=n) # for a bunch of p-coin tosses\n", " sam[ptMass < p] = 1.0 # mix in pointmass at 1, with probability p\n", " # t interval\n", " samMean = np.mean(sam)\n", " samSD = np.std(sam, ddof=1)\n", " tLo = samMean - tCrit*samSD # lower endpoint of the t interval\n", " coverT += ( tLo <= popMean )\n", " lowT += tLo\n", " # Kaplan-Wald interval\n", " kLo = kaplanWaldLowerCI(sam, cl=1-alpha, gamma=gamma, xtol=xtol) # lower endpoint of the Kaplan-Wald interval\n", " coverK += (kLo <= popMean )\n", " lowK += kLo\n", "\n", " simTable.loc[len(simTable)] = p, n,\\\n", " str(100*float(coverT)/float(reps)) + '%',\\\n", " str(100*float(coverK)/float(reps)) + '%',\\\n", " str(round(lowT/float(reps),4)),\\\n", " str(round(lowK/float(reps),4))\n", "#\n", "ansStr = '
"ansStr = '<h4>Simulated coverage probability and expected lower endpoint of ' +\\\n", "         'one-sided Student-t and Kaplan-Wald confidence intervals for ' +\\\n", "         'mixture of U[0,1] and pointmass at 1 population</h4>' +\\\n", "         'Nominal coverage probability ' + str(100*(1-alpha)) +\\\n", "         '%. Gamma=' + str(gamma) +\\\n", "         '.<br>Estimated from ' + str(int(reps)) +\\\n", "         ' replications.'\n", "\n", "display(HTML(ansStr))\n", "display(simTable)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Looks pretty good! Next we will introduce one more method; then we'll compare the various methods we've seen." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Upper confidence bounds and two-sided bounds\n", "\n", "If every $X_j$ has the same finite, _a priori_ upper bound $u$, we can transform the problem by defining $\\tilde{X}_j \\equiv u - X_j$. Then $\\tilde{X}_j$ is a nonnegative random variable, and subtracting a lower confidence bound for its mean $\\mathbb E \\tilde{X}_j$ from $u$ gives an upper confidence bound for $\\mathbb E X_j$.\n", "\n", "If every $X_j$ has finite, _a priori_ upper and lower bounds, we can find two-sided confidence intervals in the analogous way: apply the Kaplan-Wald method to the original data to find the lower confidence bound, and to the data subtracted from the a priori upper bound to find the upper confidence bound. A short sketch of the transformation follows." ] }, 
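{ "cell_type": "markdown", "metadata": {}, "source": [ "Here is a minimal sketch of that transformation (an illustration with made-up data; it assumes an a priori upper bound $u = 1$ and reuses `kaplanWaldLowerCI` from above)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Sketch (made-up data): an upper confidence bound for E(X_j) when X_j <= u a priori,\n", "# via the transformation u - X_j and the lower-bound routine above.\n", "u = 1.0  # a priori upper bound on the data\n", "sam = np.array([0.2, 0.8, 0.7, 0.9, 0.5, 0.6, 0.8, 0.9, 0.4, 0.7])\n", "upper = u - kaplanWaldLowerCI(u - sam, cl=0.95, gamma=0.99)\n", "print(upper)" ] }, 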
{ "cell_type": "markdown", "metadata": {}, "source": [ "## More\n", "Harold Kaplan has introduced a variety of other methods based on Markov's inequality applied to optionally stopped martingales; see Kaplan (1987). In my experience, this method is about as good as the others, but the results are really neat." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## What's next?\n", "\n", "+ [Next: Dollar-unit sampling and taint](dus.ipynb)\n", "+ [Previous: Wald's Sequential Probability Ratio Test](sprt.ipynb)\n", "+ [Index](index.ipynb)" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.1" } }, "nbformat": 4, "nbformat_minor": 1 }