{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Nonparametric Method Shootout\n",
    "\n",
    "I hope you're convinced that Student-t intervals don't necessarily have true coverage levels close to their nominal coverage levels, even for large sample sizes.\n",
    "\n",
    "Moreover, there are a variety of conservative nonparametric methods that can be used for populations with one-sided or two-sided bounds to produce one-sided or two-sided confidence intervals guaranteed to have coverage probabilities at least as large as their nominal confidence level.\n",
    "\n",
    "Which is best?\n",
    "\n",
    "If the population really consists of only two values, it is impossible to improve on exact Binomial intervals for samples drawn with replacement or Hypergeometric intervals for sampling without replacement (for one-sided bounds; for two-sided bounds, there is no unique \"best\" choice).\n",
    "\n",
    "For more general populations, your mileage may vary.\n",
    "\n",
    "Let's do some experiments to compare them. None is best in every situation. Relative performance depends on the population distribution and on sample sizes."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Continuous Penny Sampling, Kaplan-Wald, Hoeffding, and MDKW\n",
    "Let's compare the principal methods we've developed, using simulations from a broader variety of populations. We will skip the thresholded binomial, Chebychev's inequality, and Markov's inequality: they are dominated by other methods.\n",
    "\n",
    "Some of the methods (Hoeffding, Penny Sampling) require upper and lower population bounds. When they are applicable, we might expect them to do better than methods that require only one-sided population bounds (MDKW, Kaplan-Wald), since they use more information."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# This is the first cell with code: set up the Python environment\n",
    "%matplotlib inline\n",
    "from __future__ import division\n",
    "import matplotlib.pyplot as plt\n",
    "import math\n",
    "import numpy as np\n",
    "import scipy as sp\n",
    "import scipy.stats\n",
    "from scipy.stats import binom\n",
    "import scipy.optimize\n",
    "import pandas as pd\n",
    "from ipywidgets import interact, interactive, fixed\n",
    "import ipywidgets as widgets\n",
    "from IPython.display import clear_output, display, HTML"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "def binoLowerCL(n, x, cl = 0.975, inc=0.000001, p = None):\n",
    "    \"Lower confidence level cl confidence interval for Binomial p, for x successes in n trials\"\n",
    "    if p is None:\n",
    "            p = float(x)/float(n)\n",
    "    lo = 0.0\n",
    "    if (x > 0):\n",
    "            f = lambda q: cl - scipy.stats.binom.cdf(x-1, n, q)\n",
    "            lo = sp.optimize.brentq(f, 0.0, p, xtol=inc)\n",
    "    return lo\n",
    "\n",
    "def binoUpperCL(n, x, cl = 0.975, inc=0.000001, p = None):\n",
    "    \"Upper confidence level cl confidence interval for Binomial p, for x successes in n trials\"\n",
    "    if p is None:\n",
    "            p = float(x)/float(n)\n",
    "    hi = 1.0\n",
    "    if (x < n):\n",
    "            f = lambda q: scipy.stats.binom.cdf(x, n, q) - (1-cl)\n",
    "            hi = sp.optimize.brentq(f, p, 1.0, xtol=inc) \n",
    "    return hi\n",
    "\n",
    "def ecdf(x):\n",
    "    '''\n",
    "       calculates the empirical cdf of data x\n",
    "       returns the unique values of x in ascending order and the cumulative probabity at those values\n",
    "       NOTE: This is not an efficient algorithm: it is O(n^2), where n is the length of x. \n",
    "       A better algorithm would rely on the Collections package or something similar and could work\n",
    "       in O(n log n)\n",
    "    '''\n",
    "    theVals = sorted(np.unique(x))\n",
    "    theProbs = np.array([sum(x <= v) for v in theVals])/float(len(x))\n",
    "    if (theVals[0] > 0.0):\n",
    "        theVals = np.append(0., theVals)\n",
    "        theProbs = np.append(0., theProbs)\n",
    "    return theVals, theProbs    \n",
    "    \n",
    "def ksLowerMean(x, c):\n",
    "    '''\n",
    "       lower confidence bound for the mean of a nonnegative population\n",
    "       x is an iid sample with replacement from the population\n",
    "       c is the Massart constant for the desired coverage\n",
    "    '''\n",
    "    # find the ecdf\n",
    "    vals, probs = ecdf(x)\n",
    "    probs = np.fmin(probs+c, 1)   # This is G^-\n",
    "    gProbs = np.diff(np.append([0.0], probs))  # pre-pend a 0 so that diff does the right thing; \n",
    "                                               # gProbs is the vector of masses\n",
    "    return (vals*gProbs).sum()\n",
    "\n",
    "def kaplanWaldLowerCI(x, cl = 0.95, gamma = 0.99, xtol=1.e-12, logf=True):\n",
    "    '''\n",
    "       Calculates the Kaplan-Wald lower 1-alpha confidence bound for the mean of a nonnegative random\n",
    "       variable.\n",
    "    '''\n",
    "    alpha = 1.0-cl\n",
    "    if any(x < 0):\n",
    "        raise ValueError('Data x must be nonnegative.')\n",
    "    elif all(x <= 0):\n",
    "        lo = 0.0\n",
    "    else:\n",
    "        if logf:\n",
    "            f = lambda t: (np.max(np.cumsum(np.log(gamma*x/t + 1 - gamma))) + np.log(alpha))\n",
    "        else:\n",
    "            f = lambda t: (np.max(np.cumprod(gamma*x/t + 1 - gamma)) - 1/alpha)\n",
    "        xm = np.mean(x)\n",
    "        if f(xtol)*f(xm) > 0.0:\n",
    "            lo = 0.0\n",
    "        else:\n",
    "            lo = sp.optimize.brentq(f, xtol, np.mean(x), xtol=xtol) \n",
    "    return lo\n",
    "\n",
    "def pennySampleReplacement(weights, n):\n",
    "    '''\n",
    "       Weighted random sample of size n drawn with replacement.\n",
    "       Returns indices of the selected items, the \"remainder pennies,\"\n",
    "       and the raw uniform values used to select the sample\n",
    "    '''\n",
    "    if any(weights < 0):\n",
    "        print 'negative weight in weightedRandomSample'\n",
    "        return float('NaN')\n",
    "    else:\n",
    "        totWt = np.sum(weights, dtype=float)\n",
    "        wc = np.cumsum(weights, dtype=float)/totWt  # ensure weights sum to 1\n",
    "        theSam = np.random.random_sample((n,))\n",
    "        inx = np.array(wc).searchsorted(theSam)\n",
    "        penny = [(wc[inx[i]]-theSam[i])*totWt for i in range(n)]\n",
    "        return inx, penny, theSam\n",
    "\n",
    "def pennyBinomialLowerBound(x, inx, pennies, cl=0.95):\n",
    "    '''\n",
    "       Penny sampling lower (one-sided) 1-alpha confidence bound on the mean, for sampling with replacement.\n",
    "       x is the vector of observed values\n",
    "       pennies is the vector of _which_ \"penny\" in each sampled item is to be adjudicated as \"good\" or \"bad\"\n",
    "       The first x_j pennies in item j are deemed \"good,\" the remaining (u_j - x_j) are \"bad.\"\n",
    "       Returns the lower bound and the number of \"good\" pennies in the sample.\n",
    "    '''\n",
    "    s = sum([pennies[i] <= x[inx[i]] for i in range(len(pennies))])\n",
    "    n = len(inx)\n",
    "    return binoLowerCL(n, s, cl=cl), s\n",
    "\n",
    "def pennyBinomialBounds(x, inx, pennies, cl=0.95):\n",
    "    '''\n",
    "       Penny sampling 2-sided confidence interval for the mean, for sampling with replacement.\n",
    "       x is the vector of observed values\n",
    "       pennies is the vector of _which_ \"penny\" in each sampled item is to be adjudicated as \"good\" or \"bad\"\n",
    "       The first x_j pennies in item j are deemed \"good,\" the remaining (u_j - x_j) are \"bad.\"\n",
    "       Returns the lower bound, the upper bound and the number of \"good\" pennies in the sample.\n",
    "    '''\n",
    "    s = sum([pennies[i] <= x[inx[i]] for i in range(len(pennies))])\n",
    "    n = len(inx)\n",
    "    return binoLowerCL(n, s, cl=1-(1-cl)/2), binoUpperCL(n, s, cl=1-(1-cl)/2), s"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will compare lower confidence bounds using truncated Hoeffding, MDKW, Kaplan-Wald, and Continuous Penny Sampling"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<h3>Simulated coverage probability and expected lengths of one-sided nonparametric confidence intervals mixture of U[0,1] and pointmass at 0</h3><strong>Nominal coverage probability 95.0%</strong>. <br /><strong>Estimated from 10000 replications.</strong>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>mass at 0</th>\n",
       "      <th>sample size</th>\n",
       "      <th>Trunc Hoeff cov</th>\n",
       "      <th>MDKW cov</th>\n",
       "      <th>KW cov</th>\n",
       "      <th>Penny cov</th>\n",
       "      <th>Trunc Hoeff low</th>\n",
       "      <th>MDKW low</th>\n",
       "      <th>KW low</th>\n",
       "      <th>Penny low</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.900</td>\n",
       "      <td>25.0</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>99.98%</td>\n",
       "      <td>99.69%</td>\n",
       "      <td>96.74%</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0001</td>\n",
       "      <td>0.0025</td>\n",
       "      <td>0.0093</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.900</td>\n",
       "      <td>50.0</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>99.89%</td>\n",
       "      <td>96.18%</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0002</td>\n",
       "      <td>0.0023</td>\n",
       "      <td>0.0143</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.900</td>\n",
       "      <td>100.0</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>99.91%</td>\n",
       "      <td>97.14%</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0006</td>\n",
       "      <td>0.0022</td>\n",
       "      <td>0.0208</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.900</td>\n",
       "      <td>400.0</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>99.97%</td>\n",
       "      <td>95.3%</td>\n",
       "      <td>0.0001</td>\n",
       "      <td>0.0083</td>\n",
       "      <td>0.0023</td>\n",
       "      <td>0.0336</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.990</td>\n",
       "      <td>25.0</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>99.8%</td>\n",
       "      <td>99.36%</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0001</td>\n",
       "      <td>0.0003</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>0.990</td>\n",
       "      <td>50.0</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>99.85%</td>\n",
       "      <td>97.27%</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0004</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>0.990</td>\n",
       "      <td>100.0</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>99.88%</td>\n",
       "      <td>98.53%</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0006</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>0.990</td>\n",
       "      <td>400.0</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>99.96%</td>\n",
       "      <td>98.53%</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0012</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>0.999</td>\n",
       "      <td>25.0</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>98.76%</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>0.999</td>\n",
       "      <td>50.0</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>97.71%</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>0.999</td>\n",
       "      <td>100.0</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>99.99%</td>\n",
       "      <td>94.98%</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>0.999</td>\n",
       "      <td>400.0</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>98.45%</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    mass at 0  sample size Trunc Hoeff cov MDKW cov  KW cov Penny cov  \\\n",
       "0       0.900         25.0          100.0%   99.98%  99.69%    96.74%   \n",
       "1       0.900         50.0          100.0%   100.0%  99.89%    96.18%   \n",
       "2       0.900        100.0          100.0%   100.0%  99.91%    97.14%   \n",
       "3       0.900        400.0          100.0%   100.0%  99.97%     95.3%   \n",
       "4       0.990         25.0          100.0%   100.0%   99.8%    99.36%   \n",
       "5       0.990         50.0          100.0%   100.0%  99.85%    97.27%   \n",
       "6       0.990        100.0          100.0%   100.0%  99.88%    98.53%   \n",
       "7       0.990        400.0          100.0%   100.0%  99.96%    98.53%   \n",
       "8       0.999         25.0          100.0%   100.0%  100.0%    98.76%   \n",
       "9       0.999         50.0          100.0%   100.0%  100.0%    97.71%   \n",
       "10      0.999        100.0          100.0%   100.0%  99.99%    94.98%   \n",
       "11      0.999        400.0          100.0%   100.0%  100.0%    98.45%   \n",
       "\n",
       "   Trunc Hoeff low MDKW low  KW low Penny low  \n",
       "0              0.0   0.0001  0.0025    0.0093  \n",
       "1              0.0   0.0002  0.0023    0.0143  \n",
       "2              0.0   0.0006  0.0022    0.0208  \n",
       "3           0.0001   0.0083  0.0023    0.0336  \n",
       "4              0.0      0.0  0.0001    0.0003  \n",
       "5              0.0      0.0     0.0    0.0004  \n",
       "6              0.0      0.0     0.0    0.0006  \n",
       "7              0.0      0.0     0.0    0.0012  \n",
       "8              0.0      0.0     0.0       0.0  \n",
       "9              0.0      0.0     0.0       0.0  \n",
       "10             0.0      0.0     0.0       0.0  \n",
       "11             0.0      0.0     0.0       0.0  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Nonstandard mixture: a pointmass at zero and a uniform[0,1]\n",
    "ns = np.array([25, 50, 100, 400])  # sample sizes\n",
    "ps = np.array([0.9, 0.99, 0.999])    # mixture fraction, weight of pointmass\n",
    "alpha = 0.05  # 1- (confidence level)\n",
    "reps = int(1.0e4) # just for demonstration\n",
    "gamma = 0.99  # tuning constant in Kaplan-Wald\n",
    "xtol = 1.0e-6  # numerical tolerance for Kaplan-Wald\n",
    "\n",
    "cols = ['mass at 0', 'sample size', 'Trunc Hoeff cov', 'MDKW cov', 'KW cov', 'Penny cov',\\\n",
    "        'Trunc Hoeff low', 'MDKW low', 'KW low', 'Penny low']\n",
    "\n",
    "simTable = pd.DataFrame(columns=cols)\n",
    "\n",
    "for p in ps:\n",
    "    popMean = (1-p)*0.5  #  p*0 + (1-p)*.5\n",
    "    for n in ns:\n",
    "        hCrit = np.sqrt(-math.log(alpha/2)/(2*n))  # Hoeffding concentration bound\n",
    "        mCrit = np.sqrt(-np.log(alpha)/(2.0*n))  # the 1-sided MDKW constant\n",
    "        covH = 0\n",
    "        covM = 0\n",
    "        covK = 0\n",
    "        covP = 0\n",
    "        lowH = 0.0\n",
    "        lowM = 0.0\n",
    "        lowK = 0.0\n",
    "        lowP = 0.0\n",
    "\n",
    "        for rep in range(int(reps)):\n",
    "            sam = np.random.uniform(size=n)\n",
    "            ptMass = np.random.uniform(size=n)\n",
    "            pennies = np.random.uniform(size=n)\n",
    "            sam[ptMass < p] = 0.0\n",
    "            samMean = np.mean(sam)\n",
    "            #\n",
    "            hLow = max(samMean - hCrit, 0.0)\n",
    "            covH += (hLow <= popMean)\n",
    "            lowH += hLow\n",
    "            #\n",
    "            mLow = ksLowerMean(sam, mCrit)\n",
    "            covM += (mLow <= popMean)\n",
    "            lowM += mLow\n",
    "            #\n",
    "            kLow = kaplanWaldLowerCI(sam, cl = 1-alpha, gamma = 0.99, xtol = xtol)\n",
    "            covK += (kLow <= popMean)\n",
    "            lowK += kLow\n",
    "            #\n",
    "            pLow, s = pennyBinomialLowerBound(sam, np.r_[0:n], pennies, cl=1-alpha)\n",
    "            covP += (pLow <= popMean)\n",
    "            lowP += pLow\n",
    "            \n",
    "        simTable.loc[len(simTable)] =  p, n,\\\n",
    "            str(100*float(covH)/float(reps)) + '%',\\\n",
    "            str(100*float(covM)/float(reps)) + '%',\\\n",
    "            str(100*float(covK)/float(reps)) + '%',\\\n",
    "            str(100*float(covP)/float(reps)) + '%',\\\n",
    "            str(round(lowH/float(reps),4)),\\\n",
    "            str(round(lowM/float(reps),4)),\\\n",
    "            str(round(lowK/float(reps),4)),\\\n",
    "            str(round(lowP/float(reps), 4))\n",
    "#\n",
    "ansStr =  '<h3>Simulated coverage probability and expected lengths of one-sided nonparametric confidence intervals ' +\\\n",
    "          'mixture of U[0,1] and pointmass at 0</h3>' +\\\n",
    "          '<strong>Nominal coverage probability ' + str(100*(1-alpha)) +\\\n",
    "          '%</strong>. <br /><strong>Estimated from ' + str(int(reps)) + ' replications.</strong>'\n",
    "\n",
    "display(HTML(ansStr))\n",
    "display(simTable)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Truncated Hoeffding intervals do not appear to be competitive&mdash;despite the fact that they use more information than the Kaplan-Wald interval.  The Kaplan-Wald interval is slightly worse than the continuous penny sampling interval for this population (using this value of $\\gamma$), but KW requires only nonnegativity.\n",
    "\n",
    "Let's look at what happens with a pointmass at 1 instead of 0.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<h3>Simulated coverage probability and expected lengths of one-sided nonparametric confidence intervals mixture of U[0,1] and pointmass at 1</h3><strong>Nominal coverage probability 95.0%</strong>. <br /><strong>Estimated from 10000 replications.</strong>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>mass at 1</th>\n",
       "      <th>sample size</th>\n",
       "      <th>trunc Hoeff cov</th>\n",
       "      <th>MDKW cov</th>\n",
       "      <th>KW cov</th>\n",
       "      <th>Penny cov</th>\n",
       "      <th>trunc Hoeff low</th>\n",
       "      <th>MDKW low</th>\n",
       "      <th>KW low</th>\n",
       "      <th>Penny low</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.900</td>\n",
       "      <td>25.0</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>0.6783</td>\n",
       "      <td>0.7052</td>\n",
       "      <td>0.8196</td>\n",
       "      <td>0.8136</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.900</td>\n",
       "      <td>50.0</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>0.7577</td>\n",
       "      <td>0.7767</td>\n",
       "      <td>0.8703</td>\n",
       "      <td>0.8672</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.900</td>\n",
       "      <td>100.0</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>97.59%</td>\n",
       "      <td>96.1%</td>\n",
       "      <td>0.8143</td>\n",
       "      <td>0.8277</td>\n",
       "      <td>0.8961</td>\n",
       "      <td>0.8989</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.900</td>\n",
       "      <td>400.0</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>96.77%</td>\n",
       "      <td>96.45%</td>\n",
       "      <td>0.8822</td>\n",
       "      <td>0.8889</td>\n",
       "      <td>0.896</td>\n",
       "      <td>0.9284</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.990</td>\n",
       "      <td>25.0</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>0.7234</td>\n",
       "      <td>0.7502</td>\n",
       "      <td>0.8793</td>\n",
       "      <td>0.8792</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>0.990</td>\n",
       "      <td>50.0</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>0.803</td>\n",
       "      <td>0.822</td>\n",
       "      <td>0.9342</td>\n",
       "      <td>0.934</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>0.990</td>\n",
       "      <td>100.0</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>0.8591</td>\n",
       "      <td>0.8725</td>\n",
       "      <td>0.9626</td>\n",
       "      <td>0.9621</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>0.990</td>\n",
       "      <td>400.0</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>0.9271</td>\n",
       "      <td>0.9338</td>\n",
       "      <td>0.9848</td>\n",
       "      <td>0.9846</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>0.999</td>\n",
       "      <td>25.0</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>0.7279</td>\n",
       "      <td>0.7547</td>\n",
       "      <td>0.8854</td>\n",
       "      <td>0.8863</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>0.999</td>\n",
       "      <td>50.0</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>0.8075</td>\n",
       "      <td>0.8264</td>\n",
       "      <td>0.9406</td>\n",
       "      <td>0.941</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>0.999</td>\n",
       "      <td>100.0</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>0.8637</td>\n",
       "      <td>0.8771</td>\n",
       "      <td>0.9695</td>\n",
       "      <td>0.9697</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>0.999</td>\n",
       "      <td>400.0</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>0.9316</td>\n",
       "      <td>0.9383</td>\n",
       "      <td>0.9917</td>\n",
       "      <td>0.9917</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    mass at 1  sample size trunc Hoeff cov MDKW cov  KW cov Penny cov  \\\n",
       "0       0.900         25.0          100.0%   100.0%  100.0%    100.0%   \n",
       "1       0.900         50.0          100.0%   100.0%  100.0%    100.0%   \n",
       "2       0.900        100.0          100.0%   100.0%  97.59%     96.1%   \n",
       "3       0.900        400.0          100.0%   100.0%  96.77%    96.45%   \n",
       "4       0.990         25.0          100.0%   100.0%  100.0%    100.0%   \n",
       "5       0.990         50.0          100.0%   100.0%  100.0%    100.0%   \n",
       "6       0.990        100.0          100.0%   100.0%  100.0%    100.0%   \n",
       "7       0.990        400.0          100.0%   100.0%  100.0%    100.0%   \n",
       "8       0.999         25.0          100.0%   100.0%  100.0%    100.0%   \n",
       "9       0.999         50.0          100.0%   100.0%  100.0%    100.0%   \n",
       "10      0.999        100.0          100.0%   100.0%  100.0%    100.0%   \n",
       "11      0.999        400.0          100.0%   100.0%  100.0%    100.0%   \n",
       "\n",
       "   trunc Hoeff low MDKW low  KW low Penny low  \n",
       "0           0.6783   0.7052  0.8196    0.8136  \n",
       "1           0.7577   0.7767  0.8703    0.8672  \n",
       "2           0.8143   0.8277  0.8961    0.8989  \n",
       "3           0.8822   0.8889   0.896    0.9284  \n",
       "4           0.7234   0.7502  0.8793    0.8792  \n",
       "5            0.803    0.822  0.9342     0.934  \n",
       "6           0.8591   0.8725  0.9626    0.9621  \n",
       "7           0.9271   0.9338  0.9848    0.9846  \n",
       "8           0.7279   0.7547  0.8854    0.8863  \n",
       "9           0.8075   0.8264  0.9406     0.941  \n",
       "10          0.8637   0.8771  0.9695    0.9697  \n",
       "11          0.9316   0.9383  0.9917    0.9917  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Nonstandard mixture: a pointmass at 1 and a uniform[0,1]\n",
    "ns = np.array([25, 50, 100, 400])  # sample sizes\n",
    "ps = np.array([0.9, 0.99, 0.999])    # mixture fraction, weight of pointmass\n",
    "alpha = 0.05  # 1- (confidence level)\n",
    "reps = int(1.0e4) # just for demonstration\n",
    "gamma = 0.99  # tuning constant in Kaplan-Wald\n",
    "xtol = 1.0e-12\n",
    "\n",
    "cols = ['mass at 1', 'sample size', 'trunc Hoeff cov', 'MDKW cov', 'KW cov', 'Penny cov',\\\n",
    "        'trunc Hoeff low', 'MDKW low', 'KW low', 'Penny low']\n",
    "\n",
    "simTable = pd.DataFrame(columns=cols)\n",
    "\n",
    "for p in ps:\n",
    "    popMean = (1-p)*0.5 + p\n",
    "    for n in ns:\n",
    "        hCrit = np.sqrt(-math.log(alpha/2)/(2*n))  # Hoeffding concentration bound\n",
    "        mCrit = np.sqrt(-np.log(alpha)/(2.0*n))  # the 1-sided MDKW constant\n",
    "        covH = 0\n",
    "        covM = 0\n",
    "        covK = 0\n",
    "        covP = 0\n",
    "        lowH = 0.0\n",
    "        lowM = 0.0\n",
    "        lowK = 0.0\n",
    "        lowP = 0.0\n",
    "\n",
    "        for rep in range(int(reps)):\n",
    "            sam = np.random.uniform(size=n)\n",
    "            ptMass = np.random.uniform(size=n)\n",
    "            pennies = np.random.uniform(size=n)\n",
    "            sam[ptMass < p] = 1.0\n",
    "            samMean = np.mean(sam)\n",
    "            #\n",
    "            hLow = max(samMean - hCrit, 0.0)\n",
    "            covH += (hLow <= popMean)\n",
    "            lowH += hLow\n",
    "            #\n",
    "            mLow = ksLowerMean(sam, mCrit)\n",
    "            covM += (mLow <= popMean)\n",
    "            lowM += mLow\n",
    "            #\n",
    "            kLow = kaplanWaldLowerCI(sam, cl = 1-alpha, gamma = gamma, xtol = xtol)\n",
    "            covK += (kLow <= popMean)\n",
    "            lowK += kLow\n",
    "            #\n",
    "            pLow, s = pennyBinomialLowerBound(sam, np.r_[0:n], pennies, cl=1-alpha)\n",
    "            covP += (pLow <= popMean)\n",
    "            lowP += pLow\n",
    "            \n",
    "        simTable.loc[len(simTable)] =  p, n,\\\n",
    "            str(100*float(covH)/float(reps)) + '%',\\\n",
    "            str(100*float(covM)/float(reps)) + '%',\\\n",
    "            str(100*float(covK)/float(reps)) + '%',\\\n",
    "            str(100*float(covP)/float(reps)) + '%',\\\n",
    "            str(round(lowH/float(reps),4)),\\\n",
    "            str(round(lowM/float(reps),4)),\\\n",
    "            str(round(lowK/float(reps),4)),\\\n",
    "            str(round(lowP/float(reps),4))\n",
    "#\n",
    "ansStr =  '<h3>Simulated coverage probability and expected lengths of one-sided nonparametric confidence intervals ' +\\\n",
    "          'mixture of U[0,1] and pointmass at 1</h3>' +\\\n",
    "          '<strong>Nominal coverage probability ' + str(100*(1-alpha)) +\\\n",
    "          '%</strong>. <br /><strong>Estimated from ' + str(int(reps)) + ' replications.</strong>'\n",
    "\n",
    "display(HTML(ansStr))\n",
    "display(simTable)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here again, the Kaplan-Wald method performs essentially the same as Continuous Penny Sampling (with $\\gamma = 0.99$), even though KW only requires nonnegativity, and Continuous Penny Sampling requires an upper bound on the population as well.\n",
    "\n",
    "Let's see what happens as $\\gamma$ varies."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<h3>Simulated coverage probability and expected lengths of one-sided nonparametric confidence intervals mixture of U[0,1] and pointmass at 0</h3><strong>Nominal coverage probability 95.0%</strong>. <br /><strong>Estimated from 10000 replications.</strong>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>mass at 0</th>\n",
       "      <th>sample size</th>\n",
       "      <th>KW cov 0.01</th>\n",
       "      <th>KW cov 0.1</th>\n",
       "      <th>KW cov 0.5</th>\n",
       "      <th>KW cov 0.9</th>\n",
       "      <th>KW cov 0.999</th>\n",
       "      <th>KW low 0.01</th>\n",
       "      <th>KW low 0.1</th>\n",
       "      <th>KW low 0.5</th>\n",
       "      <th>KW low 0.9</th>\n",
       "      <th>KW low 0.999</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.900</td>\n",
       "      <td>25.0</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>99.24%</td>\n",
       "      <td>97.77%</td>\n",
       "      <td>99.3%</td>\n",
       "      <td>99.76%</td>\n",
       "      <td>0.002</td>\n",
       "      <td>0.0101</td>\n",
       "      <td>0.0072</td>\n",
       "      <td>0.0033</td>\n",
       "      <td>0.0023</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.900</td>\n",
       "      <td>50.0</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>98.39%</td>\n",
       "      <td>98.67%</td>\n",
       "      <td>99.57%</td>\n",
       "      <td>99.84%</td>\n",
       "      <td>0.0048</td>\n",
       "      <td>0.0163</td>\n",
       "      <td>0.0069</td>\n",
       "      <td>0.0028</td>\n",
       "      <td>0.0021</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.900</td>\n",
       "      <td>100.0</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>97.08%</td>\n",
       "      <td>98.94%</td>\n",
       "      <td>99.79%</td>\n",
       "      <td>99.91%</td>\n",
       "      <td>0.01</td>\n",
       "      <td>0.0225</td>\n",
       "      <td>0.0069</td>\n",
       "      <td>0.0027</td>\n",
       "      <td>0.002</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.900</td>\n",
       "      <td>400.0</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>97.43%</td>\n",
       "      <td>99.49%</td>\n",
       "      <td>99.86%</td>\n",
       "      <td>99.98%</td>\n",
       "      <td>0.0263</td>\n",
       "      <td>0.0289</td>\n",
       "      <td>0.0067</td>\n",
       "      <td>0.0029</td>\n",
       "      <td>0.0022</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.990</td>\n",
       "      <td>25.0</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>98.79%</td>\n",
       "      <td>98.32%</td>\n",
       "      <td>99.84%</td>\n",
       "      <td>99.87%</td>\n",
       "      <td>0.0001</td>\n",
       "      <td>0.0003</td>\n",
       "      <td>0.0003</td>\n",
       "      <td>0.0001</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>0.990</td>\n",
       "      <td>50.0</td>\n",
       "      <td>99.99%</td>\n",
       "      <td>98.25%</td>\n",
       "      <td>98.76%</td>\n",
       "      <td>99.92%</td>\n",
       "      <td>99.91%</td>\n",
       "      <td>0.0002</td>\n",
       "      <td>0.0004</td>\n",
       "      <td>0.0002</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>0.990</td>\n",
       "      <td>100.0</td>\n",
       "      <td>99.9%</td>\n",
       "      <td>98.06%</td>\n",
       "      <td>99.32%</td>\n",
       "      <td>99.89%</td>\n",
       "      <td>99.92%</td>\n",
       "      <td>0.0004</td>\n",
       "      <td>0.0005</td>\n",
       "      <td>0.0001</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>0.990</td>\n",
       "      <td>400.0</td>\n",
       "      <td>98.51%</td>\n",
       "      <td>98.92%</td>\n",
       "      <td>99.56%</td>\n",
       "      <td>99.94%</td>\n",
       "      <td>99.98%</td>\n",
       "      <td>0.0014</td>\n",
       "      <td>0.0005</td>\n",
       "      <td>0.0001</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>0.999</td>\n",
       "      <td>25.0</td>\n",
       "      <td>99.92%</td>\n",
       "      <td>98.64%</td>\n",
       "      <td>99.47%</td>\n",
       "      <td>99.86%</td>\n",
       "      <td>99.99%</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>0.999</td>\n",
       "      <td>50.0</td>\n",
       "      <td>99.91%</td>\n",
       "      <td>98.48%</td>\n",
       "      <td>99.63%</td>\n",
       "      <td>99.84%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>0.999</td>\n",
       "      <td>100.0</td>\n",
       "      <td>99.61%</td>\n",
       "      <td>98.5%</td>\n",
       "      <td>99.74%</td>\n",
       "      <td>99.92%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>0.999</td>\n",
       "      <td>400.0</td>\n",
       "      <td>98.4%</td>\n",
       "      <td>99.09%</td>\n",
       "      <td>99.89%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    mass at 0  sample size KW cov 0.01 KW cov 0.1 KW cov 0.5 KW cov 0.9  \\\n",
       "0       0.900         25.0      100.0%     99.24%     97.77%      99.3%   \n",
       "1       0.900         50.0      100.0%     98.39%     98.67%     99.57%   \n",
       "2       0.900        100.0      100.0%     97.08%     98.94%     99.79%   \n",
       "3       0.900        400.0      100.0%     97.43%     99.49%     99.86%   \n",
       "4       0.990         25.0      100.0%     98.79%     98.32%     99.84%   \n",
       "5       0.990         50.0      99.99%     98.25%     98.76%     99.92%   \n",
       "6       0.990        100.0       99.9%     98.06%     99.32%     99.89%   \n",
       "7       0.990        400.0      98.51%     98.92%     99.56%     99.94%   \n",
       "8       0.999         25.0      99.92%     98.64%     99.47%     99.86%   \n",
       "9       0.999         50.0      99.91%     98.48%     99.63%     99.84%   \n",
       "10      0.999        100.0      99.61%      98.5%     99.74%     99.92%   \n",
       "11      0.999        400.0       98.4%     99.09%     99.89%     100.0%   \n",
       "\n",
       "   KW cov 0.999 KW low 0.01 KW low 0.1 KW low 0.5 KW low 0.9 KW low 0.999  \n",
       "0        99.76%       0.002     0.0101     0.0072     0.0033       0.0023  \n",
       "1        99.84%      0.0048     0.0163     0.0069     0.0028       0.0021  \n",
       "2        99.91%        0.01     0.0225     0.0069     0.0027        0.002  \n",
       "3        99.98%      0.0263     0.0289     0.0067     0.0029       0.0022  \n",
       "4        99.87%      0.0001     0.0003     0.0003     0.0001          0.0  \n",
       "5        99.91%      0.0002     0.0004     0.0002        0.0          0.0  \n",
       "6        99.92%      0.0004     0.0005     0.0001        0.0          0.0  \n",
       "7        99.98%      0.0014     0.0005     0.0001        0.0          0.0  \n",
       "8        99.99%         0.0        0.0        0.0        0.0          0.0  \n",
       "9        100.0%         0.0        0.0        0.0        0.0          0.0  \n",
       "10       100.0%         0.0        0.0        0.0        0.0          0.0  \n",
       "11       100.0%         0.0        0.0        0.0        0.0          0.0  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Nonstandard mixture: a pointmass at 0 and a uniform[0,1]\n",
    "ns = np.array([25, 50, 100, 400])  # sample sizes\n",
    "ps = np.array([0.9, 0.99, 0.999])    # mixture fraction, weight of pointmass\n",
    "alpha = 0.05  # 1- (confidence level)\n",
    "reps = int(1.0e4) # just for demonstration\n",
    "gamma = np.array([0.01, 0.1, 0.5, 0.9, 0.999])  # tuning constant in Kaplan-Wald\n",
    "xtol = 1.0e-12\n",
    "\n",
    "cols = ['mass at 0', 'sample size']\n",
    "cols.extend(['KW cov ' + str(g) for g in gamma])\n",
    "cols.extend(['KW low ' + str(g) for g in gamma])\n",
    "\n",
    "\n",
    "simTable = pd.DataFrame(columns=cols)\n",
    "\n",
    "for p in ps:\n",
    "    popMean = (1-p)*0.5\n",
    "    for n in ns:\n",
    "        covK = np.zeros(len(gamma))\n",
    "        lowK = np.zeros(len(gamma))\n",
    "\n",
    "        for rep in range(int(reps)):\n",
    "            sam = np.random.uniform(size=n)\n",
    "            ptMass = np.random.uniform(size=n)\n",
    "            pennies = np.random.uniform(size=n)\n",
    "            sam[ptMass < p] = 0.0\n",
    "            samMean = np.mean(sam)\n",
    "            #\n",
    "            for i in range(len(gamma)):\n",
    "                kLow = kaplanWaldLowerCI(sam, cl = 1-alpha, gamma = gamma[i], xtol = xtol)\n",
    "                covK[i] += (kLow <= popMean)\n",
    "                lowK[i] += kLow\n",
    "            #\n",
    "            \n",
    "        theRow = [p, n]\n",
    "        theRow.extend([str(100*float(covK[i])/float(reps)) + '%' for i in range(len(gamma))])\n",
    "        theRow.extend([str(round(lowK[i]/float(reps),4)) for i in range(len(gamma))])\n",
    "        simTable.loc[len(simTable)] = theRow\n",
    "#\n",
    "ansStr =  '<h3>Simulated coverage probability and expected lengths of one-sided nonparametric confidence intervals ' +\\\n",
    "          'mixture of U[0,1] and pointmass at 0</h3>' +\\\n",
    "          '<strong>Nominal coverage probability ' + str(100*(1-alpha)) +\\\n",
    "          '%</strong>. <br /><strong>Estimated from ' + str(int(reps)) + ' replications.</strong>'\n",
    "\n",
    "display(HTML(ansStr))\n",
    "display(simTable)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As you can see, smaller values of $\\gamma$ improve the confidence bound when many observations are (nearly) zero. The Kaplan-Wald method is quite competitive with Continuous Penny Sampling in this case when $\\gamma = 0.1$. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<h3>Simulated coverage probability and expected lengths of one-sided nonparametric confidence intervals mixture of U[0,1] and pointmass at 1</h3><strong>Nominal coverage probability 95.0%</strong>. <br /><strong>Estimated from 10000 replications.</strong>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>mass at 1</th>\n",
       "      <th>sample size</th>\n",
       "      <th>KW cov 0.01</th>\n",
       "      <th>KW cov 0.1</th>\n",
       "      <th>KW cov 0.5</th>\n",
       "      <th>KW cov 0.9</th>\n",
       "      <th>KW cov 0.999</th>\n",
       "      <th>KW low 0.01</th>\n",
       "      <th>KW low 0.1</th>\n",
       "      <th>KW low 0.5</th>\n",
       "      <th>KW low 0.9</th>\n",
       "      <th>KW low 0.999</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.900</td>\n",
       "      <td>25.0</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>0.069</td>\n",
       "      <td>0.4164</td>\n",
       "      <td>0.7496</td>\n",
       "      <td>0.8145</td>\n",
       "      <td>0.8193</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.900</td>\n",
       "      <td>50.0</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>0.1322</td>\n",
       "      <td>0.5859</td>\n",
       "      <td>0.8383</td>\n",
       "      <td>0.8707</td>\n",
       "      <td>0.8699</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.900</td>\n",
       "      <td>100.0</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>98.33%</td>\n",
       "      <td>97.6%</td>\n",
       "      <td>0.2349</td>\n",
       "      <td>0.7269</td>\n",
       "      <td>0.888</td>\n",
       "      <td>0.899</td>\n",
       "      <td>0.8938</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.900</td>\n",
       "      <td>400.0</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>98.08%</td>\n",
       "      <td>96.18%</td>\n",
       "      <td>96.67%</td>\n",
       "      <td>0.5422</td>\n",
       "      <td>0.8824</td>\n",
       "      <td>0.928</td>\n",
       "      <td>0.9067</td>\n",
       "      <td>0.8941</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.990</td>\n",
       "      <td>25.0</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>0.0724</td>\n",
       "      <td>0.4375</td>\n",
       "      <td>0.7921</td>\n",
       "      <td>0.8696</td>\n",
       "      <td>0.8798</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>0.990</td>\n",
       "      <td>50.0</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>0.1386</td>\n",
       "      <td>0.6149</td>\n",
       "      <td>0.8846</td>\n",
       "      <td>0.9288</td>\n",
       "      <td>0.934</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>0.990</td>\n",
       "      <td>100.0</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>0.2462</td>\n",
       "      <td>0.7628</td>\n",
       "      <td>0.9372</td>\n",
       "      <td>0.9605</td>\n",
       "      <td>0.9629</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>0.990</td>\n",
       "      <td>400.0</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>0.568</td>\n",
       "      <td>0.9253</td>\n",
       "      <td>0.9795</td>\n",
       "      <td>0.9848</td>\n",
       "      <td>0.9848</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>0.999</td>\n",
       "      <td>25.0</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>0.0728</td>\n",
       "      <td>0.4397</td>\n",
       "      <td>0.7965</td>\n",
       "      <td>0.8754</td>\n",
       "      <td>0.8862</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>0.999</td>\n",
       "      <td>50.0</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>0.1393</td>\n",
       "      <td>0.6179</td>\n",
       "      <td>0.8896</td>\n",
       "      <td>0.9351</td>\n",
       "      <td>0.9411</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>0.999</td>\n",
       "      <td>100.0</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>0.2473</td>\n",
       "      <td>0.7664</td>\n",
       "      <td>0.9421</td>\n",
       "      <td>0.9666</td>\n",
       "      <td>0.9697</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>0.999</td>\n",
       "      <td>400.0</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>100.0%</td>\n",
       "      <td>0.5706</td>\n",
       "      <td>0.9296</td>\n",
       "      <td>0.9847</td>\n",
       "      <td>0.9911</td>\n",
       "      <td>0.9918</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    mass at 1  sample size KW cov 0.01 KW cov 0.1 KW cov 0.5 KW cov 0.9  \\\n",
       "0       0.900         25.0      100.0%     100.0%     100.0%     100.0%   \n",
       "1       0.900         50.0      100.0%     100.0%     100.0%     100.0%   \n",
       "2       0.900        100.0      100.0%     100.0%     100.0%     98.33%   \n",
       "3       0.900        400.0      100.0%     100.0%     98.08%     96.18%   \n",
       "4       0.990         25.0      100.0%     100.0%     100.0%     100.0%   \n",
       "5       0.990         50.0      100.0%     100.0%     100.0%     100.0%   \n",
       "6       0.990        100.0      100.0%     100.0%     100.0%     100.0%   \n",
       "7       0.990        400.0      100.0%     100.0%     100.0%     100.0%   \n",
       "8       0.999         25.0      100.0%     100.0%     100.0%     100.0%   \n",
       "9       0.999         50.0      100.0%     100.0%     100.0%     100.0%   \n",
       "10      0.999        100.0      100.0%     100.0%     100.0%     100.0%   \n",
       "11      0.999        400.0      100.0%     100.0%     100.0%     100.0%   \n",
       "\n",
       "   KW cov 0.999 KW low 0.01 KW low 0.1 KW low 0.5 KW low 0.9 KW low 0.999  \n",
       "0        100.0%       0.069     0.4164     0.7496     0.8145       0.8193  \n",
       "1        100.0%      0.1322     0.5859     0.8383     0.8707       0.8699  \n",
       "2         97.6%      0.2349     0.7269      0.888      0.899       0.8938  \n",
       "3        96.67%      0.5422     0.8824      0.928     0.9067       0.8941  \n",
       "4        100.0%      0.0724     0.4375     0.7921     0.8696       0.8798  \n",
       "5        100.0%      0.1386     0.6149     0.8846     0.9288        0.934  \n",
       "6        100.0%      0.2462     0.7628     0.9372     0.9605       0.9629  \n",
       "7        100.0%       0.568     0.9253     0.9795     0.9848       0.9848  \n",
       "8        100.0%      0.0728     0.4397     0.7965     0.8754       0.8862  \n",
       "9        100.0%      0.1393     0.6179     0.8896     0.9351       0.9411  \n",
       "10       100.0%      0.2473     0.7664     0.9421     0.9666       0.9697  \n",
       "11       100.0%      0.5706     0.9296     0.9847     0.9911       0.9918  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Nonstandard mixture: a pointmass at 1 and a uniform[0,1]\n",
    "ns = np.array([25, 50, 100, 400])  # sample sizes\n",
    "ps = np.array([0.9, 0.99, 0.999])    # mixture fraction, weight of pointmass\n",
    "alpha = 0.05  # 1- (confidence level)\n",
    "reps = int(1.0e4) # just for demonstration\n",
    "gamma = np.array([0.01, 0.1, 0.5, 0.9, 0.999])  # tuning constant in Kaplan-Wald\n",
    "xtol = 1.0e-12\n",
    "\n",
    "cols = ['mass at 1', 'sample size']\n",
    "cols.extend(['KW cov ' + str(g) for g in gamma])\n",
    "cols.extend(['KW low ' + str(g) for g in gamma])\n",
    "\n",
    "simTable = pd.DataFrame(columns=cols)\n",
    "\n",
    "for p in ps:\n",
    "    popMean = (1-p)*0.5 + p\n",
    "    for n in ns:\n",
    "        covK = np.zeros(len(gamma))\n",
    "        lowK = np.zeros(len(gamma))\n",
    "\n",
    "        for rep in range(int(reps)):\n",
    "            sam = np.random.uniform(size=n)\n",
    "            ptMass = np.random.uniform(size=n)\n",
    "            pennies = np.random.uniform(size=n)\n",
    "            sam[ptMass < p] = 1.0\n",
    "            samMean = np.mean(sam)\n",
    "            #\n",
    "            for i in range(len(gamma)):\n",
    "                kLow = kaplanWaldLowerCI(sam, cl = 1-alpha, gamma = gamma[i], xtol = xtol)\n",
    "                covK[i] += (kLow <= popMean)\n",
    "                lowK[i] += kLow\n",
    "            #\n",
    "            \n",
    "        theRow = [p, n]\n",
    "        theRow.extend([str(100*float(covK[i])/float(reps)) + '%' for i in range(len(gamma))])\n",
    "        theRow.extend([str(round(lowK[i]/float(reps),4)) for i in range(len(gamma))])\n",
    "        simTable.loc[len(simTable)] = theRow\n",
    "#\n",
    "ansStr =  '<h3>Simulated coverage probability and expected lengths of one-sided nonparametric confidence intervals ' +\\\n",
    "          'mixture of U[0,1] and pointmass at 1</h3>' +\\\n",
    "          '<strong>Nominal coverage probability ' + str(100*(1-alpha)) +\\\n",
    "          '%</strong>. <br /><strong>Estimated from ' + str(int(reps)) + ' replications.</strong>'\n",
    "\n",
    "display(HTML(ansStr))\n",
    "display(simTable)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "However, using a small value of $\\gamma$ hurts the confidence bound&mdash;at least for small sample sizes&mdash;when $x$ tends to have *few* values near zero."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## What's next?\n",
    "+ [Previous: Penny Sampling](pennySampling.ipynb)\n",
    "+ [Index](index.ipynb)"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}