{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Finite-population SPRT for Population Percentage"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example of the SPRT without replacement\n",
    "\n",
    "This notebook develops a sequential probability ratio test for the fraction of items labeled \"1\" in a population of $N$ items of which $Np$ are labeled $1$ and $N(1-p)$ are labeled \"0.\"\n",
    "\n",
    "This is a special case of the result derived in the notebook [Wald's Sequential Probability Ratio Test](sprt.ipynb)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There is a population of $N$ items. \n",
    "Item $j$ has \"value\" $a_j \\in \\{0, 1\\}$. \n",
    "\n",
    "Define $p = \\frac{1}{N}\\sum_j a_j$ to be the population percentage.\n",
    "\n",
    "We want to test the hypothesis $H_0$ that $p = p_0$ against the\n",
    "alternative hypothesis $H_1$ that $p = p_1 $, for some\n",
    "fixed $p_1 > p_0$.\n",
    "\n",
    "We will draw items sequentially, without replacement, such that the chance that item $i$ is selected in draw $j$, assuming it has not been selected already, is $1/(N-j+1)$.\n",
    "Let ${\\mathcal B_{j-1}}$ be the indices of the items selected up to and including the $j-1$st draw,\n",
    "and ${\\mathcal B_0} \\equiv \\emptyset$. \n",
    "\n",
    "Let $\\mathbb B_j$ denote the index of the item selected at random in the $j$th draw.\n",
    "\n",
    "The chance that the first draw ${\\mathbb B_1}$ gives an item with value 1, i.e., \n",
    "$\\Pr \\{a_{\\mathbb B_1} = 1\\}$, is $\\frac{1}{N}\\sum_b a_b$.\n",
    "Under $H_0$, this chance is $p_{01} = p_0$; under $H_1$, this chance is \n",
    "$p_{11} = p_1$.\n",
    "\n",
    "Given the values of $\\{a_{\\mathbb B_k}\\}_{k=1}^i$, the conditional\n",
    "probability that the $i$th draw gives an item with value 1 is\n",
    "\n",
    "$$\n",
    "   \\Pr \\{a_{\\mathbb B_i} = 1 | {\\mathcal B_{i-1}} \\} = \\frac{ \\sum_{b \\notin {\\mathcal B_{i-1}}} a_b}{N-i+1}.\n",
    "$$\n",
    "\n",
    "Under $H_0$, this chance is\n",
    "\n",
    "$$\n",
    "   p_{0i} =  \\frac{Np_0 - \\sum_{b \\in {\\mathcal B_{i-1}}} a_b}{N - i + 1}.\n",
    "$$\n",
    "\n",
    "Under $H_1$, this chance is\n",
    "\n",
    "$$\n",
    "   p_{1i} = \\frac{Np_1 - \\sum_{b \\in {\\mathcal B_{i-1}}} a_b}{N - i + 1}.\n",
    "$$\n",
    "\n",
    "Let $X_i$ be the indicator of the event that the $i$th draw gives an item with\n",
    "value $1$, i.e., the indicator of the event $a_{\\mathbb B_i} = 1$.\n",
    "The likelihood ratio for a given sequence $\\{X_k\\}_{k=1}^i$ is\n",
    "\n",
    "$$\n",
    "    \\mbox{LR} = \\frac{\\prod_{k=1}^i p_{1k}^{X_k}(1-p_{1k})^{1-X_k}}\n",
    "         {\\prod_{k=1}^i p_{0k}^{X_k}(1-p_{0k})^{1-X_k}}.\n",
    "$$\n",
    "\n",
    "This can be simplified: \n",
    "$p_{0k}$ and $p_{1k}$ have the same denominator,\n",
    "$N - k + 1$, and their numerators share a term.\n",
    "Define $A(k) \\equiv \\sum_{b \\in {\\mathcal B_{k-1}}}$.\n",
    "Then\n",
    "\n",
    "$$\n",
    "    \\mbox{LR} = \\prod_{k=1}^i \n",
    "    \\left ( \\frac{Np_1 - A(k)}{Np_0 - A(k)} \\right )^{X_k}\n",
    "    \\left ( \\frac{N(1-p_1) - (k-1-A(k))}{N(1-p_0) - (k-1 - A(k))} \\right )^{1-X_k}.\n",
    "$$\n",
    "where the products are defined to be infinite if any denominator vanishes (or is negative).\n",
    "\n",
    "If $H_0$ is true, the chance that $\\mbox{LR}$ is ever greater than $1/\\alpha$\n",
    "is at most $\\alpha$.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# This is the first cell with code: set up the Python environment\n",
    "%matplotlib inline\n",
    "from __future__ import division, print_function\n",
    "import matplotlib.pyplot as plt\n",
    "import math\n",
    "import numpy as np\n",
    "import numpy.random\n",
    "import scipy as sp\n",
    "import scipy.stats\n",
    "# For interactive widgets\n",
    "from ipywidgets import interact, interactive, fixed\n",
    "import ipywidgets as widgets\n",
    "from IPython.display import clear_output, display, HTML"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "np.random.seed(1234567890) # set seed for reproducibility"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def LRFromTrials(trials, N, p0, p1):\n",
    "    '''\n",
    "       Finds the sequence of likelihood ratios for the hypothesis that the population \n",
    "       percentage is p1 to the hypothesis that it is p0, for sampling without replacement\n",
    "       from a population of size N.\n",
    "    '''\n",
    "    A = np.cumsum(np.insert(trials, 0, 0)) # so that cumsum does the right thing\n",
    "    terms = np.ones(N)\n",
    "    for k in range(len(trials)):\n",
    "        if trials[k] == 1.0:\n",
    "            if (N*p0 - A[k]) > 0:\n",
    "                terms[k] = np.max([N*p1 - A[k], 0])/(N*p0 - A[k])\n",
    "            else:\n",
    "                terms[k] = np.inf\n",
    "        else:\n",
    "            if (N*(1-p0) - k + 1 + A[k]) > 0:\n",
    "                terms[k] = np.max([(N*(1-p1) - k + 1 + A[k]), 0])/(N*(1-p0) - k + 1 + A[k])\n",
    "            else:\n",
    "                terms[k] = np.inf\n",
    "    return(np.cumprod(terms))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "99382cb24ae440fa95cb27ea9a69e667"
      }
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "<function __main__.plotBernoulliSPRT>"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def plotBernoulliSPRT(N, p, p0, p1, alpha):\n",
    "    '''\n",
    "       Plots the progress of a one-sided SPRT for N dependent Bernoulli trials \n",
    "       in sampling without replacement from a population of size N with a \n",
    "       fraction p of items labeled \"1,\" for testing the hypothesis that p=p0 \n",
    "       against the hypothesis p=p1 at significance level alpha\n",
    "    '''\n",
    "    plt.clf()\n",
    "    fig, ax = plt.subplots(nrows=2, ncols=1, sharex=True)\n",
    "    trials = np.zeros(N)\n",
    "    nOnes = int(math.floor(N*p))\n",
    "    trials[0:nOnes] = np.ones(nOnes)\n",
    "    np.random.shuffle(trials) # items are in random order\n",
    "\n",
    "    LRs = np.nan_to_num(LRFromTrials(trials, N, p0, p1))\n",
    "    logLRs = np.nan_to_num(np.log(LRs))\n",
    "    \n",
    "    LRs[LRs > 10**6] = 10**6 # avoid plot overflow\n",
    "    logLRs[logLRs > 10**6] = 10**6 # avoid plot overflow\n",
    "    \n",
    "    #\n",
    "    ax[0].plot(range(N),LRs, color='b')\n",
    "    ax[0].axhline(y=1/alpha, xmin=0, xmax=N, color='g', label=r'$1/\\alpha$')\n",
    "    ax[0].set_title('Simulation of Wald\\'s SPRT for population percentage, w/o replacement')\n",
    "    ax[0].set_ylabel('LR')\n",
    "    ax[0].legend(loc='best')\n",
    "    #\n",
    "    ax[1].plot(range(N),logLRs, color='b', linestyle='--')\n",
    "    ax[1].axhline(y=math.log(1/alpha), xmin=0, xmax=N, color='g', label=r'$log(1/\\alpha)$')\n",
    "    ax[1].set_ylabel('log(LR)')\n",
    "    ax[1].set_xlabel('trials')\n",
    "    ax[1].legend(loc='best')\n",
    "    plt.show()\n",
    "\n",
    "\n",
    "interact(plotBernoulliSPRT,\\\n",
    "         N=widgets.IntSlider(min=500, max=50000, step=500, value=5000),\\\n",
    "         p=widgets.FloatSlider(min=0.001, max=1, step=0.01, value=.51),\\\n",
    "         p0=widgets.FloatSlider(min=0.001, max=1, step=0.01, value=.5),\\\n",
    "         p1=widgets.FloatSlider(min=0.001, max=1, step=0.01, value=.51),\\\n",
    "         alpha=widgets.FloatSlider(min=0.001, max=1, step=0.01, value=.05)\n",
    "        )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Simulate the distribution of sample sizes needed to reject"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "alpha = 0.05                   # significance level\n",
    "reps = int(10**4)              # number of replications\n",
    "p, p0, p1 = [0.525, 0.5, 0.525]  # need p > p0 or might never reject\n",
    "N = 10000                       # population size\n",
    "dist = np.zeros(reps)          # allocate space for the results\n",
    "\n",
    "trials = np.zeros(N)\n",
    "nOnes = int(math.floor(N*p))\n",
    "trials[0:nOnes] = np.ones(nOnes) # trials now contains math.floor(n*p) ones\n",
    "\n",
    "for i in np.arange(reps):\n",
    "    np.random.shuffle(trials) # items are in random order\n",
    "    LRs = LRFromTrials(trials, N, p0, p1) # likelihood ratios for this realization\n",
    "    dist[i] = np.min(np.where(LRs >= 1/alpha)) # trials at which threshold is crossed"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1784.1438 [ 1526.  3265.]\n"
     ]
    }
   ],
   "source": [
    "# report mean, median, and 90th percentile\n",
    "print(np.mean(dist), np.percentile(dist, [50, 90]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## What's next?\n",
    "\n",
    "+ [Next: The Kaplan-Wald Confidence Bound for a Nonnegative Mean](kaplanWald.ipynb)\n",
    "+ [Previous: Wald's Sequential Probability Ratio Test](sprt.ipynb)\n",
    "+ [Index](index.ipynb)"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}