{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "10413248",
   "metadata": {},
   "source": [
    "$$\n",
    "\\newcommand{\\argmax}{arg\\,max}\n",
    "\\newcommand{\\argmin}{arg\\,min}\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "67d6a7eb",
   "metadata": {},
   "source": [
    "\n",
    "<a id='wald-friedman'></a>\n",
    "<div id=\"qe-notebook-header\" align=\"right\" style=\"text-align:right;\">\n",
    "        <a href=\"https://quantecon.org/\" title=\"quantecon.org\">\n",
    "                <img style=\"width:250px;display:inline;\" width=\"250px\" src=\"https://assets.quantecon.org/img/qe-menubar-logo.svg\" alt=\"QuantEcon\">\n",
    "        </a>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f7fbe253",
   "metadata": {},
   "source": [
    "# A Problem that Stumped Milton Friedman\n",
    "\n",
    "(and that Abraham Wald solved by inventing sequential analysis)\n",
    "\n",
    "\n",
    "<a id='index-1'></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cf2f9d0b",
   "metadata": {},
   "source": [
    "## Contents\n",
    "\n",
    "- [A Problem that Stumped Milton Friedman](#A-Problem-that-Stumped-Milton-Friedman)  \n",
    "  - [Overview](#Overview)  \n",
    "  - [Source of the Problem](#Source-of-the-Problem)  \n",
    "  - [Neyman-Pearson formulation](#Neyman-Pearson-formulation)  \n",
    "  - [Wald’s sequential formulation](#Wald’s-sequential-formulation)  \n",
    "  - [Links between $ A,B $ and $ \\alpha, \\beta $](#Links-between-$-A,B-$-and-$-\\alpha,-\\beta-$)  \n",
    "  - [Simulations](#Simulations)  \n",
    "  - [Related lectures](#Related-lectures)  \n",
    "  - [Exercises](#Exercises)  "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ec64190f",
   "metadata": {},
   "source": [
    "## Overview\n",
    "\n",
    "This is the first of two lectures about  a statistical decision problem that a US Navy Captain  presented to Milton\n",
    "Friedman and W. Allen Wallis during World War II when they were analysts at the U.S. Government’s  Statistical Research Group at Columbia University.\n",
    "\n",
    "This problem led Abraham Wald [[Wald, 1947](https://python.quantecon.org/zreferences.html#id125)] to formulate **sequential analysis**,\n",
    "an approach to statistical decision problems that is  intimately related to dynamic programming.\n",
    "\n",
    "In the spirit of [this earlier lecture](https://python.quantecon.org/prob_meaning.html), the present lecture and its [sequel](https://python.quantecon.org/wald_friedman_2.html) approach the problem from two distinct points of view, one frequentist, the other Bayesian.\n",
    "\n",
    "In this lecture, we describe  Wald’s formulation of the problem from the perspective of a  statistician\n",
    "working within the Neyman-Pearson tradition of a frequentist statistician who thinks about testing  hypotheses and consequently  use  laws of large numbers to  investigate limiting properties of particular statistics under a given  **hypothesis**, i.e., a vector of **parameters** that pins down a  particular member of a manifold of statistical models that interest the statistician.\n",
    "\n",
    "- From [this lecture on frequentist and bayesian statistics](https://python.quantecon.org/prob_meaning.html), please remember that a  frequentist statistician routinely calculates functions of sequences of random variables, conditioning on a vector of parameters.  \n",
    "\n",
    "\n",
    "In [this related lecture](https://python.quantecon.org/wald_friedman_2.html) we’ll discuss another formulation that adopts   the perspective of a **Bayesian statistician** who views parameters as random variables that are jointly distributed with  observable variables that he is concerned about.\n",
    "\n",
    "Because we are taking a frequentist perspective that is concerned about relative frequencies conditioned on alternative parameter values, i.e.,\n",
    "alternative **hypotheses**, key ideas in this lecture\n",
    "\n",
    "- Type I and type II statistical errors  \n",
    "  - a type I error occurs when you reject a null hypothesis that is true  \n",
    "  - a type II error occurs when you accept a null hypothesis that is false  \n",
    "- The **power** of a frequentist statistical test  \n",
    "- The **size** of a frequentist statistical test  \n",
    "- The **critical region** of a statistical test  \n",
    "- A **uniformly most powerful test**  \n",
    "- The role of a Law of Large Numbers (LLN) in interpreting **power** and **size** of a frequentist statistical test  \n",
    "- Abraham Wald’s **sequential probability ratio test**  \n",
    "\n",
    "\n",
    "We’ll begin with some imports:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "91a9d4e0",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "from numba import njit, prange, vectorize, jit\n",
    "from numba.experimental import jitclass\n",
    "from math import gamma\n",
    "from scipy.integrate import quad\n",
    "from scipy.stats import beta\n",
    "from collections import namedtuple\n",
    "import pandas as pd\n",
    "import scipy as sp"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "02caed87",
   "metadata": {},
   "source": [
    "This lecture uses ideas studied in [the lecture on likelihood ratio processes](https://python.quantecon.org/likelihood_ratio_process.html) and  [the lecture on Bayesian learning](https://python.quantecon.org/likelihood_bayes.html)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "06178439",
   "metadata": {},
   "source": [
    "## Source of the Problem\n",
    "\n",
    "On pages 137-139 of his 1998 book *Two Lucky People* with Rose Friedman [[Friedman and Friedman, 1998](https://python.quantecon.org/zreferences.html#id120)],\n",
    "Milton Friedman described a problem presented to him and Allen Wallis\n",
    "during World War II, when they worked at the US Government’s\n",
    "Statistical Research Group at Columbia University.\n",
    "\n",
    ">**Note**\n",
    ">\n",
    ">See pages 25 and 26  of Allen Wallis’s 1980 article [[Wallis, 1980](https://python.quantecon.org/zreferences.html#id17)]  about the Statistical Research Group at Columbia University during World War II for his account of the episode and  for important contributions  that Harold Hotelling made to formulating the problem.   Also see  chapter 5 of Jennifer Burns’ book about\n",
    "Milton Friedman [[Burns, 2023](https://python.quantecon.org/zreferences.html#id18)].\n",
    "\n",
    "Let’s listen to Milton Friedman tell us what happened\n",
    "\n",
    "> In order to understand the story, it is necessary to have an idea of a\n",
    "simple statistical problem, and of the standard procedure for dealing\n",
    "with it. The actual problem out of which sequential analysis grew will\n",
    "serve. The Navy has two alternative designs (say A and B) for a\n",
    "projectile. It wants to determine which is superior. To do so it\n",
    "undertakes a series of paired firings. On each round, it assigns the\n",
    "value 1 or 0 to A accordingly as its performance is superior or inferior\n",
    "to that of B and conversely 0 or 1 to B. The Navy asks the statistician\n",
    "how to conduct the test and how to analyze the results.\n",
    "\n",
    "\n",
    "> The standard statistical answer was to specify a number of firings (say\n",
    "1,000) and a pair of percentages (e.g., 53% and 47%) and tell the client\n",
    "that if A receives a 1 in more than 53% of the firings, it can be\n",
    "regarded as superior; if it receives a 1 in fewer than 47%, B can be\n",
    "regarded as superior; if the percentage is between 47% and 53%, neither\n",
    "can be so regarded.\n",
    "\n",
    "\n",
    "> When Allen Wallis was discussing such a problem with (Navy) Captain\n",
    "Garret L. Schuyler, the captain objected that such a test, to quote from\n",
    "Allen’s account, may prove wasteful. If a wise and seasoned ordnance\n",
    "officer like Schuyler were on the premises, he would see after the first\n",
    "few thousand or even few hundred [rounds] that the experiment need not\n",
    "be completed either because the new method is obviously inferior or\n",
    "because it is obviously superior beyond what was hoped for\n",
    "$ \\ldots $.\n",
    "\n",
    "\n",
    "Friedman and Wallis worked on  the problem for a while but didn’t completely solve it.\n",
    "\n",
    "Realizing that, they told Abraham Wald about the problem.\n",
    "\n",
    "That set  Wald on a path that led him  to create  *Sequential Analysis* [[Wald, 1947](https://python.quantecon.org/zreferences.html#id125)]."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3814dda1",
   "metadata": {},
   "source": [
    "## Neyman-Pearson formulation\n",
    "\n",
    "It is useful to begin by describing the theory underlying the test\n",
    "that the U.S. Navy told  Captain G. S. Schuyler to use.\n",
    "\n",
    "Captain Schuyler’s doubts  motivated  him to tell  Milton Friedman and Allen Wallis his conjecture\n",
    "that superior practical procedures existed.\n",
    "\n",
    "Evidently, the Navy had told Captain Schuyler to use what was then  a state-of-the-art\n",
    "Neyman-Pearson  hypothesis test.\n",
    "\n",
    "We’ll rely on Abraham Wald’s [[Wald, 1947](https://python.quantecon.org/zreferences.html#id125)] elegant summary of Neyman-Pearson theory.\n",
    "\n",
    "Watch for these features of the setup:\n",
    "\n",
    "- the assumption of a *fixed* sample size $ n $  \n",
    "- the application of laws of large numbers, conditioned on alternative\n",
    "  probability models, to interpret  probabilities $ \\alpha $ and\n",
    "  $ \\beta $ of the type I and type II errors defined in the Neyman-Pearson theory  \n",
    "\n",
    "\n",
    "In chapter 1 of **Sequential Analysis** [[Wald, 1947](https://python.quantecon.org/zreferences.html#id125)] Abraham Wald summarizes the\n",
    "Neyman-Pearson approach to hypothesis testing.\n",
    "\n",
    "Wald frames the problem as making a decision about a probability\n",
    "distribution that is partially known.\n",
    "\n",
    "(You have to assume that *something* is already known in order to state a well-posed\n",
    "problem – usually, *something* means *a lot*)\n",
    "\n",
    "By limiting  what is unknown, Wald uses the following simple structure\n",
    "to illustrate the main ideas:\n",
    "\n",
    "- A decision-maker wants to decide which of two distributions\n",
    "  $ f_0 $, $ f_1 $ govern an IID random variable $ z $.  \n",
    "- The null hypothesis $ H_0 $ is the statement that $ f_0 $\n",
    "  governs the data.  \n",
    "- The alternative hypothesis $ H_1 $ is the statement that\n",
    "  $ f_1 $ governs the data.  \n",
    "- The problem is to devise and analyze a test of hypothesis\n",
    "  $ H_0 $ against the alternative hypothesis $ H_1 $ on the\n",
    "  basis of a sample of a fixed number $ n $ independent\n",
    "  observations $ z_1, z_2, \\ldots, z_n $ of the random variable\n",
    "  $ z $.  \n",
    "\n",
    "\n",
    "To quote Abraham Wald,\n",
    "\n",
    "> A test procedure leading to the acceptance or rejection of the [null]\n",
    "hypothesis in question is simply a rule specifying, for each possible\n",
    "sample of size $ n $, whether the [null] hypothesis should be accepted\n",
    "or rejected on the basis of the sample. This may also be expressed as\n",
    "follows: A test procedure is simply a subdivision of the totality of\n",
    "all possible samples of size $ n $ into two mutually exclusive\n",
    "parts, say part 1 and part 2, together with the application of the\n",
    "rule that the [null] hypothesis be accepted if the observed sample is\n",
    "contained in part 2. Part 1 is also called the critical region. Since\n",
    "part 2 is the totality of all samples of size $ n $ which are not\n",
    "included in part 1, part 2 is uniquely determined by part 1. Thus,\n",
    "choosing a test procedure is equivalent to determining a critical\n",
    "region.\n",
    "\n",
    "\n",
    "Let’s listen to Wald longer:\n",
    "\n",
    "> As a basis for choosing among critical regions the following\n",
    "considerations have been advanced by Neyman and Pearson: In accepting\n",
    "or rejecting $ H_0 $ we may commit errors of two kinds. We commit\n",
    "an error of the first kind if we reject $ H_0 $ when it is true;\n",
    "we commit an error of the second kind if we accept $ H_0 $ when\n",
    "$ H_1 $ is true. After a particular critical region $ W $ has\n",
    "been chosen, the probability of committing an error of the first\n",
    "kind, as well as the probability of committing an error of the second\n",
    "kind is uniquely determined. The probability of committing an error\n",
    "of the first kind is equal to the probability, determined by the\n",
    "assumption that $ H_0 $ is true, that the observed sample will be\n",
    "included in the critical region $ W $. The probability of\n",
    "committing an error of the second kind is equal to the probability,\n",
    "determined on the assumption that $ H_1 $ is true, that the\n",
    "probability will fall outside the critical region $ W $. For any\n",
    "given critical region $ W $ we shall denote the probability of an\n",
    "error of the first kind by $ \\alpha $ and the probability of an\n",
    "error of the second kind by $ \\beta $.\n",
    "\n",
    "\n",
    "Let’s listen carefully to how Wald applies law of large numbers to\n",
    "interpret $ \\alpha $ and $ \\beta $:\n",
    "\n",
    "> The probabilities $ \\alpha $ and $ \\beta $ have the\n",
    "following important practical interpretation: Suppose that we draw a\n",
    "large number of samples of size $ n $. Let $ M $ be the\n",
    "number of such samples drawn. Suppose that for each of these\n",
    "$ M $ samples we reject $ H_0 $ if the sample is included in\n",
    "$ W $ and accept $ H_0 $ if the sample lies outside\n",
    "$ W $. In this way we make $ M $ statements of rejection or\n",
    "acceptance. Some of these statements will in general be wrong. If\n",
    "$ H_0 $ is true and if $ M $ is large, the probability is\n",
    "nearly $ 1 $ (i.e., it is practically certain) that the\n",
    "proportion of wrong statements (i.e., the number of wrong statements\n",
    "divided by $ M $) will be approximately $ \\alpha $. If\n",
    "$ H_1 $ is true, the probability is nearly $ 1 $ that the\n",
    "proportion of wrong statements will be approximately $ \\beta $.\n",
    "Thus, we can say that in the long run [ here Wald applies law of\n",
    "large numbers by driving $ M \\rightarrow \\infty $ (our comment,\n",
    "not Wald’s) ] the proportion of wrong statements will be\n",
    "$ \\alpha $ if $ H_0 $ is true and $ \\beta $ if\n",
    "$ H_1 $ is true.\n",
    "\n",
    "\n",
    "The quantity $ \\alpha $ is called the *size* of the critical region,\n",
    "and the quantity $ 1-\\beta $ is called the *power* of the critical\n",
    "region.\n",
    "\n",
    "Wald notes that\n",
    "\n",
    "> one critical region $ W $ is more desirable than another if it\n",
    "has smaller values of $ \\alpha $ and $ \\beta $. Although\n",
    "either $ \\alpha $ or $ \\beta $ can be made arbitrarily small\n",
    "by a proper choice of the critical region $ W $, it is impossible\n",
    "to make both $ \\alpha $ and $ \\beta $ arbitrarily small for a\n",
    "fixed value of $ n $, i.e., a fixed sample size.\n",
    "\n",
    "\n",
    "Wald summarizes Neyman and Pearson’s setup as follows:\n",
    "\n",
    "> Neyman and Pearson show that a region consisting of all samples\n",
    "$ (z_1, z_2, \\ldots, z_n) $ which satisfy the inequality\n",
    "\n",
    "$$\n",
    "\\frac{ f_1(z_1) \\cdots f_1(z_n)}{f_0(z_1) \\cdots f_0(z_n)} \\geq k\n",
    "$$\n",
    "\n",
    "is a most powerful critical region for testing the hypothesis\n",
    "$ H_0 $ against the alternative hypothesis $ H_1 $. The term\n",
    "$ k $ on the right side is a constant chosen so that the region\n",
    "will have the required size $ \\alpha $.\n",
    "\n",
    "\n",
    "Wald goes on to discuss Neyman and Pearson’s concept of *uniformly most\n",
    "powerful* test.\n",
    "\n",
    "Here is how Wald introduces the notion of a sequential test\n",
    "\n",
    "> A rule is given for making one of the following three decisions at any stage of\n",
    "the experiment (at the $ m $ th trial for each integral value of $ m $): (1) to\n",
    "accept the hypothesis $ H $, (2) to reject the hypothesis $ H $, (3) to\n",
    "continue the experiment by making an additional observation. Thus, such\n",
    "a test procedure is carried out sequentially. On the basis of the first\n",
    "observation, one of the aforementioned decision is made. If the first or\n",
    "second decision is made, the process is terminated. If the third\n",
    "decision is made, a second trial is performed. Again, on the basis of\n",
    "the first two observations, one of the three decision is made. If the\n",
    "third decision is made, a third trial is performed, and so on. The\n",
    "process is continued until either the first or the second decisions is\n",
    "made. The number $ n $ of observations required by such a test procedure is\n",
    "a random variable, since the value of $ n $ depends on the outcome of the\n",
    "observations."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5ca05a19",
   "metadata": {},
   "source": [
    "## Wald’s sequential formulation\n",
    "\n",
    "By way of contrast to Neyman and Pearson’s formulation of the problem, in Wald’s formulation\n",
    "\n",
    "- The sample size $ n $ is not fixed but rather  a random variable.  \n",
    "- Two  parameters $ A $ and $ B $ that are related to but distinct from Neyman and Pearson’s  $ \\alpha $ and  $ \\beta $;\n",
    "  $ A $ and $ B $  characterize cut-off   rules that Wald  uses to determine the random variable $ n $ as a function of random outcomes.  \n",
    "\n",
    "\n",
    "Here is how Wald sets up the problem.\n",
    "\n",
    "A decision-maker can observe a sequence of draws of a random variable $ z $.\n",
    "\n",
    "He (or she) wants to know which of two probability distributions $ f_0 $ or $ f_1 $ governs $ z $.\n",
    "\n",
    "We use beta distributions as examples.\n",
    "\n",
    "We will also work with Jensen-Shannon divergence introduced in [Statistical Divergence Measures](https://python.quantecon.org/divergence_measures.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "92040c9a",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "@vectorize\n",
    "def p(x, a, b):\n",
    "    \"\"\"Beta distribution density function.\"\"\"\n",
    "    r = gamma(a + b) / (gamma(a) * gamma(b))\n",
    "    return r * x** (a-1) * (1 - x) ** (b-1)\n",
    "\n",
    "def create_beta_density(a, b):\n",
    "    \"\"\"Create a beta density function with specified parameters.\"\"\"\n",
    "    return jit(lambda x: p(x, a, b))\n",
    "\n",
    "def compute_KL(f, g):\n",
    "    \"\"\"Compute KL divergence KL(f, g)\"\"\"\n",
    "    integrand = lambda w: f(w) * np.log(f(w) / g(w))\n",
    "    val, _ = quad(integrand, 1e-5, 1-1e-5)\n",
    "    return val\n",
    "\n",
    "def compute_JS(f, g):\n",
    "    \"\"\"Compute Jensen-Shannon divergence\"\"\"\n",
    "    def m(w):\n",
    "        return 0.5 * (f(w) + g(w))\n",
    "    \n",
    "    js_div = 0.5 * compute_KL(f, m) + 0.5 * compute_KL(g, m)\n",
    "    return js_div"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "904266db",
   "metadata": {},
   "source": [
    "The next figure shows two beta distributions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "82218cb2",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "f0 = create_beta_density(1, 1)\n",
    "f1 = create_beta_density(9, 9)\n",
    "grid = np.linspace(0, 1, 50)\n",
    "\n",
    "fig, ax = plt.subplots()\n",
    "ax.plot(grid, f0(grid), lw=2, label=\"$f_0$\")\n",
    "ax.plot(grid, f1(grid), lw=2, label=\"$f_1$\")\n",
    "ax.legend()\n",
    "ax.set(xlabel=\"$z$ values\", ylabel=\"probability of $z_k$\")\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "da3a7138",
   "metadata": {},
   "source": [
    "Conditional on knowing that successive observations are drawn from distribution $ f_0 $, the sequence of\n",
    "random variables is independently and identically distributed (IID).\n",
    "\n",
    "Conditional on knowing that successive observations are drawn from distribution $ f_1 $, the sequence of\n",
    "random variables is also independently and identically distributed (IID).\n",
    "\n",
    "But the observer does not know which of the two distributions generated the sequence.\n",
    "\n",
    "For reasons explained in  [Exchangeability and Bayesian Updating](https://python.quantecon.org/exchangeable.html), this means that the observer thinks that the sequence is not IID.\n",
    "\n",
    "Consequently, the observer has something to learn, namely, whether the observations are drawn from  $ f_0 $ or from $ f_1 $.\n",
    "\n",
    "The decision maker   wants  to decide which of the  two distributions is generating outcomes."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2d60d4d7",
   "metadata": {},
   "source": [
    "### Type I and type II errors\n",
    "\n",
    "If we regard  $ f=f_0 $ as a null hypothesis and $ f=f_1 $ as an alternative hypothesis,\n",
    "then\n",
    "\n",
    "- a type I error is an incorrect rejection of a true null hypothesis (a “false positive”)  \n",
    "- a type II error is a failure to reject a false null hypothesis (a “false negative”)  \n",
    "\n",
    "\n",
    "To repeat ourselves\n",
    "\n",
    "- $ \\alpha $ is the probability of a type I error  \n",
    "- $ \\beta $ is the probability of a type II error  "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5ba8c4c6",
   "metadata": {},
   "source": [
    "### Choices\n",
    "\n",
    "After observing $ z_k, z_{k-1}, \\ldots, z_1 $, the decision-maker\n",
    "chooses among three distinct actions:\n",
    "\n",
    "- He decides that $ f = f_0 $ and draws no more $ z $’s  \n",
    "- He decides that $ f = f_1 $ and draws no more $ z $’s  \n",
    "- He postpones deciding  and instead chooses to draw\n",
    "  $ z_{k+1} $  \n",
    "\n",
    "\n",
    "Wald  defines\n",
    "\n",
    "- $ p_{0m} = f_0(z_1) \\cdots f_0(z_m) $  \n",
    "- $ p_{1m} = f_1(z_1) \\cdots f_1(z_m) $  \n",
    "- $ L_{m} = \\frac{p_{1m}}{p_{0m}} $  \n",
    "\n",
    "\n",
    "Here $ \\{L_m\\}_{m=0}^\\infty $ is a **likelihood ratio process**.\n",
    "\n",
    "Wald’s sequential  decision rule is parameterized by  real numbers $ B < A $.\n",
    "\n",
    "For a given pair $ A, B $, the decision rule is\n",
    "\n",
    "$$\n",
    "\\begin{aligned}\n",
    "\\textrm { accept } f=f_1 \\textrm{ if } L_m \\geq A \\\\\n",
    "\\textrm { accept } f=f_0 \\textrm{ if } L_m \\leq B \\\\\n",
    "\\textrm { draw another }  z \\textrm{ if }  B < L_m < A\n",
    "\\end{aligned}\n",
    "$$\n",
    "\n",
    "The following figure illustrates aspects of Wald’s procedure.\n",
    "\n",
    "![https://python.quantecon.org/_static/lecture_specific/wald_friedman/wald_dec_rule.png](https://python.quantecon.org/_static/lecture_specific/wald_friedman/wald_dec_rule.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a5e1bcf7",
   "metadata": {},
   "source": [
    "## Links between $ A,B $ and $ \\alpha, \\beta $\n",
    "\n",
    "In chapter 3 of **Sequential Analysis** [[Wald, 1947](https://python.quantecon.org/zreferences.html#id125)]  Wald establishes the inequalities\n",
    "\n",
    "$$\n",
    "\\begin{aligned} \n",
    " \\frac{\\alpha}{1 -\\beta} & \\leq \\frac{1}{A} \\\\\n",
    " \\frac{\\beta}{1 - \\alpha} & \\leq B \n",
    "\\end{aligned}\n",
    "$$\n",
    "\n",
    "His analysis of these inequalities leads Wald to recommend the following approximations as rules for setting\n",
    "$ A $ and $ B $ that come close to attaining a decision maker’s target values for probabilities $ \\alpha $ of\n",
    "a  type I  and $ \\beta $ of a type II error:\n",
    "\n",
    "\n",
    "<a id='equation-eq-waldrule'></a>\n",
    "$$\n",
    "\\begin{aligned}\n",
    "A \\approx a(\\alpha,\\beta) & \\equiv \\frac{1-\\beta}{\\alpha} \\\\\n",
    "B \\approx b(\\alpha,\\beta)  & \\equiv \\frac{\\beta}{1-\\alpha} \n",
    "\\end{aligned} \\tag{26.1}\n",
    "$$\n",
    "\n",
    "For small values of $ \\alpha $ and $ \\beta $, Wald shows that approximation  [(26.1)](#equation-eq-waldrule) provides a  good way to set $ A $ and $ B $.\n",
    "\n",
    "In particular, Wald constructs a mathematical argument that leads him to conclude that the use of approximation\n",
    "[(26.1)](#equation-eq-waldrule) rather than the true functions $ A (\\alpha, \\beta), B(\\alpha,\\beta) $ for setting $ A $ and $ B $\n",
    "\n",
    "> $ \\ldots $ cannot result in any appreciable increase in the value of either $ \\alpha $ or $ \\beta $. In other words,\n",
    "for all practical purposes the test corresponding to $ A = a(\\alpha, \\beta), B = b(\\alpha,\\beta) $ provides as\n",
    "least the same protection against wrong decisions as the test corresponding to $ A = A(\\alpha, \\beta) $ and\n",
    "$ B = b(\\alpha, \\beta) $.\n",
    "\n",
    "\n",
    "> Thus, the only disadvantage that may arise from using $ a(\\alpha, \\beta),  b(\\alpha,\\beta) $ instead of\n",
    "$ A(\\alpha, \\beta),  B(\\alpha,\\beta) $, respectively, is that it may result in an appreciable increase in\n",
    "the  number of observations required by the test.\n",
    "\n",
    "\n",
    "We’ll write some Python code to help us illustrate Wald’s claims about how $ \\alpha $ and $ \\beta $ are related to the parameters $ A $ and $ B $\n",
    "that characterize his sequential probability ratio test."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f79af35b",
   "metadata": {},
   "source": [
    "## Simulations\n",
    "\n",
    "We experiment with different distributions $ f_0 $ and $ f_1 $ to examine how Wald’s test performs under various conditions.\n",
    "\n",
    "Our goal in conducting these simulations is to understand  trade-offs between decision speed and accuracy associated with Wald’s  **sequential probability ratio test**.\n",
    "\n",
    "Specifically, we will watch  how:\n",
    "\n",
    "- The decision thresholds $ A $ and $ B $ (or equivalently the target error rates $ \\alpha $ and $ \\beta $) affect the average stopping time  \n",
    "- The discrepancy  between distributions $ f_0 $ and $ f_1 $  affects  average stopping times  \n",
    "\n",
    "\n",
    "We will focus on the case where $ f_0 $ and $ f_1 $ are beta distributions since it is easy to control the overlapping regions of the two densities by adjusting their shape parameters.\n",
    "\n",
    "First, we define a namedtuple to store all the parameters we need for our simulation studies.\n",
    "\n",
    "We also compute Wald’s recommended thresholds $ A $ and $ B $ based on the target type I and type II errors $ \\alpha $ and $ \\beta $"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c4ed3d66",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "SPRTParams = namedtuple('SPRTParams', \n",
    "                ['α', 'β',  # Target type I and type II errors\n",
    "                'a0', 'b0', # Shape parameters for f_0\n",
    "                'a1', 'b1', # Shape parameters for f_1\n",
    "                'N',        # Number of simulations\n",
    "                'seed'])\n",
    "\n",
    "@njit\n",
    "def compute_wald_thresholds(α, β):\n",
    "    \"\"\"Compute Wald's recommended thresholds.\"\"\"\n",
    "    A = (1 - β) / α\n",
    "    B = β / (1 - α)\n",
    "    return A, B, np.log(A), np.log(B)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bf120b43",
   "metadata": {},
   "source": [
    "Now we can run the simulation following Wald’s recommendation.\n",
    "\n",
    "We’ll compare the log-likelihood ratio  to logarithms of the thresholds $ \\log(A) $ and $ \\log(B) $.\n",
    "\n",
    "The following algorithm underlies our simulations.\n",
    "\n",
    "1. Compute thresholds $ A = \\frac{1-\\beta}{\\alpha} $, $ B = \\frac{\\beta}{1-\\alpha} $ and work with $ \\log A $, $ \\log B $.  \n",
    "1. Given true distribution (either $ f_0 $ or $ f_1 $):  \n",
    "  - Initialize log-likelihood ratio $ \\log L_0 = 0 $  \n",
    "  - Repeat:  \n",
    "    - Draw observation $ z $ from the true distribution  \n",
    "    - Update: $ \\log L_{n+1} \\leftarrow \\log L_n + (\\log f_1(z) - \\log f_0(z)) $  \n",
    "    - If $ \\log L_{n+1} \\geq \\log A $: stop, reject $ H_0 $  \n",
    "    - If $ \\log L_{n+1} \\leq \\log B $: stop, accept $ H_0 $  \n",
    "1. Repeat step 2 for $ N $ replications with $ N/2 $ replications\n",
    "  for each distribution, compute the empirical type I error $ \\hat{\\alpha} $ and type II error $ \\hat{\\beta} $ with  \n",
    "\n",
    "\n",
    "$$\n",
    "\\hat{\\alpha} = \\frac{\\text{\\$\\#\\$ of times reject } H_0 \\text{ when } f_0 \\text{ is true}}{\\text{\\$\\#\\$ of replications with } f_0 \\text{ true}}\n",
    "$$\n",
    "\n",
    "$$\n",
    "\\hat{\\beta} = \\frac{\\text{\\$\\#\\$ of times accept } H_0 \\text{ when } f_1 \\text{ is true}}{\\text{\\$\\#\\$ of replications with } f_1 \\text{ true}}\n",
    "$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "65f38438",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "@njit\n",
    "def sprt_single_run(a0, b0, a1, b1, logA, logB, true_f0, seed):\n",
    "    \"\"\"Run a single SPRT until a decision is reached.\"\"\"\n",
    "    log_L = 0.0\n",
    "    n = 0\n",
    "    np.random.seed(seed)\n",
    "    \n",
    "    while True:\n",
    "        z = np.random.beta(a0, b0) if true_f0 else np.random.beta(a1, b1)\n",
    "        n += 1\n",
    "        \n",
    "        # Update log-likelihood ratio\n",
    "        log_L += np.log(p(z, a1, b1)) - np.log(p(z, a0, b0))\n",
    "        \n",
    "        # Check stopping conditions\n",
    "        if log_L >= logA:\n",
    "            return n, False  # Reject H0\n",
    "        elif log_L <= logB:\n",
    "            return n, True   # Accept H0\n",
    "\n",
    "@njit(parallel=True)\n",
    "def run_sprt_simulation(a0, b0, a1, b1, α, β, N, seed):\n",
    "    \"\"\"SPRT simulation.\"\"\"\n",
    "    A, B, logA, logB = compute_wald_thresholds(α, β)\n",
    "    \n",
    "    stopping_times = np.zeros(N, dtype=np.int64)\n",
    "    decisions_h0 = np.zeros(N, dtype=np.bool_)\n",
    "    truth_h0 = np.zeros(N, dtype=np.bool_)\n",
    "    \n",
    "    for i in prange(N):\n",
    "        true_f0 = (i % 2 == 0)\n",
    "        truth_h0[i] = true_f0\n",
    "        \n",
    "        n, accept_f0 = sprt_single_run(\n",
    "                        a0, b0, a1, b1, \n",
    "                        logA, logB, \n",
    "                        true_f0, seed + i)\n",
    "        stopping_times[i] = n\n",
    "        decisions_h0[i] = accept_f0\n",
    "    \n",
    "    return stopping_times, decisions_h0, truth_h0\n",
    "\n",
    "def run_sprt(params):\n",
    "    \"\"\"Run SPRT simulations with given parameters.\"\"\"\n",
    "    stopping_times, decisions_h0, truth_h0 = run_sprt_simulation(\n",
    "        params.a0, params.b0, params.a1, params.b1, \n",
    "        params.α, params.β, params.N, params.seed\n",
    "    )\n",
    "    \n",
    "    # Calculate error rates\n",
    "    truth_h0_bool = truth_h0.astype(bool)\n",
    "    decisions_h0_bool = decisions_h0.astype(bool)\n",
    "    \n",
    "    type_I = np.sum(truth_h0_bool & ~decisions_h0_bool) \\\n",
    "            / np.sum(truth_h0_bool)\n",
    "    type_II = np.sum(~truth_h0_bool & decisions_h0_bool) \\\n",
    "            / np.sum(~truth_h0_bool)\n",
    "    \n",
    "    return {\n",
    "        'stopping_times': stopping_times,\n",
    "        'decisions_h0': decisions_h0_bool,\n",
    "        'truth_h0': truth_h0_bool,\n",
    "        'type_I': type_I,\n",
    "        'type_II': type_II\n",
    "    }\n",
    "\n",
    "# Run simulation\n",
    "params = SPRTParams(α=0.05, β=0.10, a0=2, b0=5, a1=5, b1=2, N=20000, seed=1)\n",
    "results = run_sprt(params)\n",
    "\n",
    "print(f\"Average stopping time: {results['stopping_times'].mean():.2f}\")\n",
    "print(f\"Empirical type I  error: {results['type_I']:.3f} (target = {params.α})\")\n",
    "print(f\"Empirical type II error: {results['type_II']:.3f} (target = {params.β})\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "20928615",
   "metadata": {},
   "source": [
    "As anticipated in the passage above in which Wald discussed the quality of\n",
    "$ a(\\alpha, \\beta), b(\\alpha, \\beta) $ given in approximation [(26.1)](#equation-eq-waldrule),\n",
    "we find that the algorithm actually gives\n",
    "**lower** type I and type II error rates than the target values.\n",
    "\n",
    ">**Note**\n",
    ">\n",
    ">For recent work on the quality of approximation [(26.1)](#equation-eq-waldrule), see, e.g., [[Fischer and Ramdas, 2024](https://python.quantecon.org/zreferences.html#id276)].\n",
    "\n",
    "The following code creates a few graphs that illustrate the results of our simulation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6a4f4cd0",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "@njit\n",
    "def compute_wald_thresholds(α, β):\n",
    "    \"\"\"Compute Wald's recommended thresholds.\"\"\"\n",
    "    A = (1 - β) / α\n",
    "    B = β / (1 - α)\n",
    "    return A, B, np.log(A), np.log(B)\n",
    "\n",
    "def plot_sprt_results(results, params, title=\"\"):\n",
    "    \"\"\"Plot SPRT results.\"\"\"\n",
    "    fig, axes = plt.subplots(1, 3, figsize=(20, 6))\n",
    "    \n",
    "    # Distribution plots\n",
    "    z_grid = np.linspace(0, 1, 200)\n",
    "    f0 = create_beta_density(params.a0, params.b0)\n",
    "    f1 = create_beta_density(params.a1, params.b1)\n",
    "    \n",
    "    axes[0].plot(z_grid, f0(z_grid), 'b-', lw=2, \n",
    "                 label=f'$f_0 = \\\\text{{Beta}}({params.a0},{params.b0})$')\n",
    "    axes[0].plot(z_grid, f1(z_grid), 'r-', lw=2, \n",
    "                 label=f'$f_1 = \\\\text{{Beta}}({params.a1},{params.b1})$')\n",
    "    axes[0].fill_between(z_grid, 0, \n",
    "                        np.minimum(f0(z_grid), f1(z_grid)), \n",
    "                        alpha=0.3, color='purple', label='overlap')\n",
    "    if title:\n",
    "        axes[0].set_title(title, fontsize=20)\n",
    "    axes[0].set_xlabel('z', fontsize=16)\n",
    "    axes[0].set_ylabel('density', fontsize=16)\n",
    "    axes[0].legend(fontsize=14)\n",
    "    \n",
    "    # Stopping times\n",
    "    max_n = min(results['stopping_times'].max(), 101)\n",
    "    bins = np.arange(1, max_n) - 0.5\n",
    "    axes[1].hist(results['stopping_times'], bins=bins, \n",
    "                 color=\"steelblue\", alpha=0.8, edgecolor=\"black\")\n",
    "    axes[1].set_title(f'stopping times (μ={results[\"stopping_times\"].mean():.1f})', \n",
    "                      fontsize=16)\n",
    "    axes[1].set_xlabel('n', fontsize=16)\n",
    "    axes[1].set_ylabel('frequency', fontsize=16)\n",
    "    axes[1].set_xlim(0, 100)\n",
    "    \n",
    "    # Confusion matrix\n",
    "    plot_confusion_matrix(results, axes[2])\n",
    "    \n",
    "    plt.tight_layout()\n",
    "    plt.show()\n",
    "\n",
    "def plot_confusion_matrix(results, ax):\n",
    "    \"\"\"Plot confusion matrix for SPRT results.\"\"\"\n",
    "    f0_correct = np.sum(results['truth_h0'] & results['decisions_h0'])\n",
    "    f0_incorrect = np.sum(results['truth_h0'] & (~results['decisions_h0']))\n",
    "    f1_correct = np.sum((~results['truth_h0']) & (~results['decisions_h0']))\n",
    "    f1_incorrect = np.sum((~results['truth_h0']) & results['decisions_h0'])\n",
    "    \n",
    "    confusion_data = np.array([[f0_correct, f0_incorrect], \n",
    "                              [f1_incorrect, f1_correct]])\n",
    "    row_totals = confusion_data.sum(axis=1, keepdims=True)\n",
    "    \n",
    "    im = ax.imshow(confusion_data, cmap='Blues', aspect='equal')\n",
    "    ax.set_title(f'errors: I={results[\"type_I\"]:.3f} II={results[\"type_II\"]:.3f}', \n",
    "                 fontsize=16)\n",
    "    ax.set_xticks([0, 1])\n",
    "    ax.set_xticklabels(['accept $H_0$', 'reject $H_0$'], fontsize=14)\n",
    "    ax.set_yticks([0, 1])\n",
    "    ax.set_yticklabels(['true $f_0$', 'true $f_1$'], fontsize=14)\n",
    "    \n",
    "    for i in range(2):\n",
    "        for j in range(2):\n",
    "            percent = confusion_data[i, j] / row_totals[i, 0] \\\n",
    "                        if row_totals[i, 0] > 0 else 0\n",
    "            color = 'white' if confusion_data[i, j] > confusion_data.max() * 0.5 \\\n",
    "                    else 'black'\n",
    "            ax.text(j, i, f'{confusion_data[i, j]}\\n({percent:.1%})',\n",
    "                   ha=\"center\", va=\"center\", color=color, fontweight='bold', \n",
    "                   fontsize=14)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cb19d33b",
   "metadata": {},
   "source": [
    "Let’s plot the results of our simulation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9cabbaef",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "plot_sprt_results(results, params)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "914be82d",
   "metadata": {},
   "source": [
    "In this example, the stopping time stays below 10.\n",
    "\n",
    "We  can construct a $ 2 \\times 2 $  “confusion matrix” whose  diagonal elements\n",
    "count the number of times that Wald’s  decision rule  correctly  accepts and\n",
    "rejects the null hypothesis."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6b892336",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "print(\"Confusion Matrix data:\")\n",
    "print(f\"Type I error: {results['type_I']:.3f}\")\n",
    "print(f\"Type II error: {results['type_II']:.3f}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7189dd97",
   "metadata": {},
   "source": [
    "Next we use our code to study three different $ f_0, f_1 $ pairs having different discrepancies between distributions.\n",
    "\n",
    "We plot the same three graphs we used above for each pair of distributions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8a73732e",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "params_1 = SPRTParams(α=0.05, β=0.10, a0=2, b0=8, a1=8, b1=2, N=5000, seed=42)\n",
    "results_1 = run_sprt(params_1)\n",
    "\n",
    "params_2 = SPRTParams(α=0.05, β=0.10, a0=4, b0=5, a1=5, b1=4, N=5000, seed=42)\n",
    "results_2 = run_sprt(params_2)\n",
    "\n",
    "params_3 = SPRTParams(α=0.05, β=0.10, a0=0.5, b0=0.4, a1=0.4, \n",
    "                      b1=0.5, N=5000, seed=42)\n",
    "results_3 = run_sprt(params_3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "47623750",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "plot_sprt_results(results_1, params_1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f9be89c0",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "plot_sprt_results(results_2, params_2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "786bb4e7",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "plot_sprt_results(results_3, params_3)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "61d03d60",
   "metadata": {},
   "source": [
    "Notice that  the stopping times are less when the  two  distributions are farther apart.\n",
    "\n",
    "This makes sense.\n",
    "\n",
    "When two distributions are “far apart”, it should not take too long to decide which one is generating the data.\n",
    "\n",
    "When two distributions are “close”, it should  takes longer to decide which one is generating the data.\n",
    "\n",
    "It is tempting to link this pattern to our discussion of [Kullback–Leibler divergence](https://python.quantecon.org/divergence_measures.html#rel-entropy) in [Likelihood Ratio Processes](https://python.quantecon.org/likelihood_ratio_process.html).\n",
    "\n",
    "While, KL divergence is larger when two distributions differ more, KL divergence is not symmetric, meaning that the KL divergence of distribution $ f $ from distribution $ g $  is not necessarily equal to the KL\n",
    "divergence of $ g $ from $ f $.\n",
    "\n",
    "If we want a symmetric measure of divergence that actually a metric, we can instead use  [Jensen-Shannon distance](https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.jensenshannon.html).\n",
    "\n",
    "That is what we shall do now.\n",
    "\n",
    "We shall compute Jensen-Shannon distance  and plot it against the average stopping times."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "133c52c9",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "def js_dist(a0, b0, a1, b1):\n",
    "    \"\"\"Jensen–Shannon distance\"\"\"\n",
    "    f0 = create_beta_density(a0, b0)\n",
    "    f1 = create_beta_density(a1, b1)\n",
    "\n",
    "    # Mixture\n",
    "    m = lambda w: 0.5*(f0(w) + f1(w))\n",
    "    return np.sqrt(0.5*compute_KL(m, f0) + 0.5*compute_KL(m, f1))\n",
    "    \n",
    "def generate_β_pairs(N=100, T=10.0, d_min=0.5, d_max=9.5):\n",
    "    ds = np.linspace(d_min, d_max, N)\n",
    "    a0 = (T - ds) / 2\n",
    "    b0 = (T + ds) / 2\n",
    "    return list(zip(a0, b0, b0, a0))\n",
    "\n",
    "param_comb = generate_β_pairs()\n",
    "\n",
    "# Run simulations for each parameter combination\n",
    "js_dists = []\n",
    "mean_stopping_times = []\n",
    "param_list = []\n",
    "\n",
    "for a0, b0, a1, b1 in param_comb:\n",
    "    # Compute KL divergence\n",
    "    js_div = js_dist(a1, b1, a0, b0)\n",
    "    \n",
    "    # Run SPRT simulation with a fixed set of parameters d d\n",
    "    params = SPRTParams(α=0.05, β=0.10, a0=a0, b0=b0, \n",
    "                        a1=a1, b1=b1, N=5000, seed=42)\n",
    "    results = run_sprt(params)\n",
    "    \n",
    "    js_dists.append(js_div)\n",
    "    mean_stopping_times.append(results['stopping_times'].mean())\n",
    "    param_list.append((a0, b0, a1, b1))\n",
    "\n",
    "# Create the plot\n",
    "fig, ax = plt.subplots()\n",
    "\n",
    "scatter = ax.scatter(js_dists, mean_stopping_times, \n",
    "                    s=80, alpha=0.7, linewidth=0.5)\n",
    "\n",
    "ax.set_xlabel('Jensen–Shannon distance', fontsize=14)\n",
    "ax.set_ylabel('mean stopping time', fontsize=14)\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d95b63d0",
   "metadata": {},
   "source": [
    "The plot demonstrates a clear negative correlation between relative entropy and mean stopping time.\n",
    "\n",
    "As  Jensen-Shannon divergence increases (distributions become more separated), the mean stopping time decreases exponentially.\n",
    "\n",
    "Below are sampled examples from the experiments we have above"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4c718c1d",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "def plot_beta_distributions_grid(param_list, js_dists, mean_stopping_times, \n",
    "                                selected_indices=None):\n",
    "    \"\"\"Plot grid of beta distributions with JS distance and stopping times.\"\"\"\n",
    "    if selected_indices is None:\n",
    "        selected_indices = [0, len(param_list)//6, len(param_list)//3, \n",
    "                          len(param_list)//2, 2*len(param_list)//3, -1]\n",
    "    \n",
    "    fig, axes = plt.subplots(2, 3, figsize=(15, 8))\n",
    "    z_grid = np.linspace(0, 1, 200)\n",
    "    \n",
    "    for i, idx in enumerate(selected_indices):\n",
    "        row, col = i // 3, i % 3\n",
    "        a0, b0, a1, b1 = param_list[idx]\n",
    "        \n",
    "        f0 = create_beta_density(a0, b0)\n",
    "        f1 = create_beta_density(a1, b1)\n",
    "        \n",
    "        axes[row, col].plot(z_grid, f0(z_grid), 'b-', lw=2, label='$f_0$')\n",
    "        axes[row, col].plot(z_grid, f1(z_grid), 'r-', lw=2, label='$f_1$')\n",
    "        axes[row, col].fill_between(z_grid, 0, \n",
    "                                  np.minimum(f0(z_grid), f1(z_grid)), \n",
    "                                  alpha=0.3, color='purple')\n",
    "        \n",
    "        axes[row, col].set_title(f'JS dist: {js_dists[idx]:.3f}'\n",
    "                               f'\\nMean time: {mean_stopping_times[idx]:.1f}', \n",
    "                               fontsize=12)\n",
    "        axes[row, col].set_xlabel('z', fontsize=10)\n",
    "        if i == 0:\n",
    "            axes[row, col].set_ylabel('density', fontsize=10)\n",
    "            axes[row, col].legend(fontsize=10)\n",
    "\n",
    "    plt.tight_layout()\n",
    "    plt.show()\n",
    "\n",
    "plot_beta_distributions_grid(param_list, js_dists, mean_stopping_times)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d126901e",
   "metadata": {},
   "source": [
    "Again, we find that the stopping time is shorter when the distributions are more separated, as\n",
    "measured by Jensen-Shannon distance.\n",
    "\n",
    "Let’s visualize individual likelihood ratio processes to see how they evolve toward the decision boundaries."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "11827fcf",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "def plot_likelihood_paths(params, n_highlight=10, n_background=200):\n",
    "    \"\"\"visualize likelihood ratio paths.\"\"\"\n",
    "    A, B, logA, logB = compute_wald_thresholds(params.α, params.β)\n",
    "    f0, f1 = map(lambda ab: create_beta_density(*ab),\n",
    "             [(params.a0, params.b0), \n",
    "              (params.a1, params.b1)])\n",
    "    \n",
    "    fig, axes = plt.subplots(1, 2, figsize=(14, 7))\n",
    "    \n",
    "    for dist_idx, (true_f0, ax, title) in enumerate([\n",
    "        (True, axes[0], 'true distribution: $f_0$'),\n",
    "        (False, axes[1], 'true distribution: $f_1$')\n",
    "    ]):\n",
    "        rng = np.random.default_rng(seed=42 + dist_idx)\n",
    "        paths_data = []\n",
    "        \n",
    "        # Generate paths\n",
    "        for path in range(n_background + n_highlight):\n",
    "            log_L_path, log_L, n = [0.0], 0.0, 0\n",
    "            \n",
    "            while True:\n",
    "                z = rng.beta(params.a0, params.b0) if true_f0 \\\n",
    "                    else rng.beta(params.a1, params.b1)\n",
    "                n += 1\n",
    "                log_L += np.log(f1(z)) - np.log(f0(z))\n",
    "                log_L_path.append(log_L)\n",
    "                \n",
    "                if log_L >= logA or log_L <= logB:\n",
    "                    paths_data.append((log_L_path, n, log_L >= logA))\n",
    "                    break\n",
    "        \n",
    "        # Plot background paths\n",
    "        for path, _, decision in paths_data[:n_background]:\n",
    "            ax.plot(range(len(path)), path, color='C1' if decision else 'C0', \n",
    "                   alpha=0.2, linewidth=0.5)\n",
    "        \n",
    "        # Plot highlighted paths with labels\n",
    "        for i, (path, _, decision) in enumerate(paths_data[n_background:]):\n",
    "            ax.plot(range(len(path)), path, color='C1' if decision else 'C0', \n",
    "                   alpha=0.8, linewidth=1.5,\n",
    "                   label='reject $H_0$' if decision and i == 0 else (\n",
    "                         'accept $H_0$' if not decision and i == 0 else ''))\n",
    "        \n",
    "        # Add threshold lines and formatting\n",
    "        ax.axhline(y=logA, color='C1', linestyle='--', linewidth=2, \n",
    "                  label=f'$\\\\log A = {logA:.2f}$')\n",
    "        ax.axhline(y=logB, color='C0', linestyle='--', linewidth=2, \n",
    "                  label=f'$\\\\log B = {logB:.2f}$')\n",
    "        ax.axhline(y=0, color='black', linestyle='-', alpha=0.5, linewidth=1)\n",
    "        \n",
    "        ax.set_xlabel(r'$n$') \n",
    "        ax.set_ylabel(r'$\\log(L_n)$')\n",
    "        ax.set_title(title, fontsize=20)\n",
    "        ax.legend(fontsize=18, loc='center right')\n",
    "        \n",
    "        y_margin = max(abs(logA), abs(logB)) * 0.2\n",
    "        ax.set_ylim(logB - y_margin, logA + y_margin)\n",
    "    \n",
    "    plt.tight_layout()\n",
    "    plt.show()\n",
    "\n",
    "plot_likelihood_paths(params_3, n_highlight=10, n_background=100)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9cc0480e",
   "metadata": {},
   "source": [
    "Next, let’s adjust the decision thresholds $ A $ and $ B $ and examine how the mean stopping time and the type I and type II error rates change.\n",
    "\n",
    "In the code below, we adjust  Wald’s rule by adjusting the thresholds $ A $ and $ B $ using factors $ A_f $ and $ B_f $."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b88f8c3e",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "@njit(parallel=True)  \n",
    "def run_adjusted_thresholds(a0, b0, a1, b1, α, β, N, seed, A_f, B_f):\n",
    "    \"\"\"SPRT simulation with adjusted thresholds.\"\"\"\n",
    "    \n",
    "    # Calculate original thresholds  \n",
    "    A_original = (1 - β) / α\n",
    "    B_original = β / (1 - α)\n",
    "    \n",
    "    # Apply adjustment factors\n",
    "    A_adj = A_original * A_f\n",
    "    B_adj = B_original * B_f\n",
    "    logA = np.log(A_adj)\n",
    "    logB = np.log(B_adj)\n",
    "    \n",
    "    # Pre-allocate arrays\n",
    "    stopping_times = np.zeros(N, dtype=np.int64)\n",
    "    decisions_h0 = np.zeros(N, dtype=np.bool_)\n",
    "    truth_h0 = np.zeros(N, dtype=np.bool_)\n",
    "    \n",
    "    # Run simulations in parallel\n",
    "    for i in prange(N):\n",
    "        true_f0 = (i % 2 == 0)\n",
    "        truth_h0[i] = true_f0\n",
    "        \n",
    "        n, accept_f0 = sprt_single_run(a0, b0, a1, b1, \n",
    "                        logA, logB, true_f0, seed + i)\n",
    "        stopping_times[i] = n\n",
    "        decisions_h0[i] = accept_f0\n",
    "    \n",
    "    return stopping_times, decisions_h0, truth_h0, A_adj, B_adj\n",
    "\n",
    "def run_adjusted(params, A_f=1.0, B_f=1.0):\n",
    "    \"\"\"Wrapper to run SPRT with adjusted A and B thresholds.\"\"\"\n",
    "    \n",
    "    stopping_times, decisions_h0, truth_h0, A_adj, B_adj = run_adjusted_thresholds(\n",
    "        params.a0, params.b0, params.a1, params.b1, \n",
    "        params.α, params.β, params.N, params.seed, A_f, B_f\n",
    "    )\n",
    "    truth_h0_bool = truth_h0.astype(bool)\n",
    "    decisions_h0_bool = decisions_h0.astype(bool)\n",
    "    \n",
    "    # Calculate error rates\n",
    "    type_I = np.sum(truth_h0_bool \n",
    "                    & ~decisions_h0_bool) / np.sum(truth_h0_bool)\n",
    "    type_II = np.sum(~truth_h0_bool \n",
    "                    & decisions_h0_bool) / np.sum(~truth_h0_bool)\n",
    "    \n",
    "    return {\n",
    "        'stopping_times': stopping_times,\n",
    "        'type_I': type_I,\n",
    "        'type_II': type_II,\n",
    "        'A_used': A_adj,\n",
    "        'B_used': B_adj\n",
    "    }\n",
    "\n",
    "adjustments = [\n",
    "    (5.0, 0.5), \n",
    "    (1.0, 1.0),    \n",
    "    (0.3, 3.0),    \n",
    "    (0.2, 5.0),    \n",
    "    (0.15, 7.0),   \n",
    "]\n",
    "\n",
    "results_table = []\n",
    "for A_f, B_f in adjustments:\n",
    "    result = run_adjusted(params_2, A_f, B_f)\n",
    "    results_table.append([\n",
    "        A_f, B_f, \n",
    "        f\"{result['stopping_times'].mean():.1f}\",\n",
    "        f\"{result['type_I']:.3f}\",\n",
    "        f\"{result['type_II']:.3f}\"\n",
    "    ])\n",
    "\n",
    "df = pd.DataFrame(results_table, \n",
    "                 columns=[\"A_f\", \"B_f\", \"mean stop time\", \n",
    "                          \"Type I error\", \"Type II error\"])\n",
    "df = df.set_index([\"A_f\", \"B_f\"])\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fabd1ab5",
   "metadata": {},
   "source": [
    "Let’s pause and think about the table more carefully by referring back to [(26.1)](#equation-eq-waldrule).\n",
    "\n",
    "Recall that $ A = \\frac{1-\\beta}{\\alpha} $ and $ B = \\frac{\\beta}{1-\\alpha} $.\n",
    "\n",
    "When we multiply $ A $ by a factor less than 1 (making $ A $ smaller), we are effectively making it easier to reject the null hypothesis $ H_0 $.\n",
    "\n",
    "This increases the probability of Type I errors.\n",
    "\n",
    "When we multiply $ B $ by a factor greater than 1 (making $ B $ larger), we are making it easier to accept the null hypothesis $ H_0 $.\n",
    "\n",
    "This increases the probability of Type II errors.\n",
    "\n",
    "The table confirms this intuition: as $ A $ decreases and $ B $ increases from their optimal Wald values, both Type I and Type II error rates increase, while the mean stopping time decreases."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e1bc2835",
   "metadata": {},
   "source": [
    "## Related lectures\n",
    "\n",
    "We’ll dig deeper into some of the ideas used here in the following earlier and later lectures:\n",
    "\n",
    "- In [this sequel](https://python.quantecon.org/wald_friedman_2.html), we reformulate the problem from the perspective of a **Bayesian statistician** who views parameters as vectors of random variables that are jointly distributed with the observables they are concerned about.  \n",
    "- The concept of **exchangeability**, which underlies much of statistical learning, is explored in depth in our [lecture on exchangeable random variables](https://python.quantecon.org/exchangeable.html).  \n",
    "- For a deeper understanding of likelihood ratio processes and their role in frequentist and Bayesian statistical theories, see [Likelihood Ratio Processes](https://python.quantecon.org/likelihood_ratio_process.html).  \n",
    "- Building on that foundation, [Likelihood Ratio Processes and Bayesian Learning](https://python.quantecon.org/likelihood_bayes.html) examines the role of likelihood ratio processes in **Bayesian learning**.  \n",
    "- Finally, [this later lecture](https://python.quantecon.org/navy_captain.html) revisits the subject discussed here and examines whether the frequentist decision rule that the Navy ordered the captain to use would perform better or worse than Abraham Wald’s sequential decision rule.  "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "df27d298",
   "metadata": {},
   "source": [
    "## Exercises\n",
    "\n",
    "In the two exercises below, please try to rewrite the entire SPRT suite in this lecture."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7ac8e21f",
   "metadata": {},
   "source": [
    "## Exercise 26.1\n",
    "\n",
    "In the first exercise, we apply the sequential probability ratio test to distinguish two models generated by 3-state Markov chains\n",
    "\n",
    "(For a review on likelihood ratio processes for Markov chains, see [this section](https://python.quantecon.org/likelihood_ratio_process.html#lrp-markov).)\n",
    "\n",
    "Consider distinguishing between two 3-state Markov chain models using Wald’s sequential probability ratio test.\n",
    "\n",
    "You have competing hypotheses about the transition probabilities:\n",
    "\n",
    "- $ H_0 $: The chain follows transition matrix $ P^{(0)} $  \n",
    "- $ H_1 $: The chain follows transition matrix $ P^{(1)} $  \n",
    "\n",
    "\n",
    "Given transition matrices:\n",
    "\n",
    "$$\n",
    "P^{(0)} = \\begin{bmatrix}\n",
    "0.7 & 0.2 & 0.1 \\\\\n",
    "0.3 & 0.5 & 0.2 \\\\\n",
    "0.1 & 0.3 & 0.6\n",
    "\\end{bmatrix}, \\quad\n",
    "P^{(1)} = \\begin{bmatrix}\n",
    "0.5 & 0.3 & 0.2 \\\\\n",
    "0.2 & 0.6 & 0.2 \\\\\n",
    "0.2 & 0.2 & 0.6\n",
    "\\end{bmatrix}\n",
    "$$\n",
    "\n",
    "For a sequence of observations $ (x_0, x_1, \\ldots, x_t) $, the likelihood ratio is:\n",
    "\n",
    "$$\n",
    "\\Lambda_t = \\frac{\\pi_{x_0}^{(1)}}{\\pi_{x_0}^{(0)}} \\prod_{s=1}^t \\frac{P_{x_{s-1},x_s}^{(1)}}{P_{x_{s-1},x_s}^{(0)}}\n",
    "$$\n",
    "\n",
    "where $ \\pi^{(i)} $ is the stationary distribution under hypothesis $ i $.\n",
    "\n",
    "Tasks:\n",
    "\n",
    "1. Implement the likelihood ratio computation for Markov chains  \n",
    "1. Implement Wald’s sequential test with Type I error $ \\alpha = 0.05 $ and Type II error $ \\beta = 0.10 $  \n",
    "1. Run 1000 simulations under each hypothesis and compute empirical error rates  \n",
    "1. Analyze the distribution of stopping times  \n",
    "\n",
    "\n",
    "The test stops when:\n",
    "\n",
    "- $ \\Lambda_t \\geq A = \\frac{1-\\beta}{\\alpha} = 18 $: Reject $ H_0 $  \n",
    "- $ \\Lambda_t \\leq B = \\frac{\\beta}{1-\\alpha} = 0.105 $: Accept $ H_0 $  "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "db7fbbad",
   "metadata": {},
   "source": [
    "## Solution\n",
    "\n",
    "Below is one solution to the exercise.\n",
    "\n",
    "In the lecture, we write the code more verbosely to illustrate the concepts clearly.\n",
    "\n",
    "In the code below, we simplified some of the code structure for a shorter presentation.\n",
    "\n",
    "First we define the parameters for the Markov chain SPRT"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e7c81921",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "MarkovSPRTParams = namedtuple('MarkovSPRTParams', \n",
    "            ['α', 'β', 'P_0', 'P_1', 'N', 'seed'])\n",
    "\n",
    "def compute_stationary_distribution(P):\n",
    "    \"\"\"Compute stationary distribution of transition matrix P.\"\"\"\n",
    "    eigenvalues, eigenvectors = np.linalg.eig(P.T)\n",
    "    idx = np.argmin(np.abs(eigenvalues - 1))\n",
    "    pi = np.real(eigenvectors[:, idx])\n",
    "    return pi / pi.sum()\n",
    "\n",
    "@njit\n",
    "def simulate_markov_chain(P, pi_0, T, seed):\n",
    "    \"\"\"Simulate a Markov chain path.\"\"\"\n",
    "    np.random.seed(seed)\n",
    "    path = np.zeros(T, dtype=np.int32)\n",
    "    \n",
    "    cumsum_pi = np.cumsum(pi_0)\n",
    "    path[0] = np.searchsorted(cumsum_pi, np.random.uniform())\n",
    "    \n",
    "    for t in range(1, T):\n",
    "        cumsum_row = np.cumsum(P[path[t-1]])\n",
    "        path[t] = np.searchsorted(cumsum_row, np.random.uniform())\n",
    "    \n",
    "    return path"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ebabeeb0",
   "metadata": {},
   "source": [
    "Here we define the function that runs SPRT for Markov chains"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fcc99db5",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "@njit\n",
    "def markov_sprt_single_run(P_0, P_1, π_0, π_1, \n",
    "                logA, logB, true_P, true_π, seed):\n",
    "    \"\"\"Run single SPRT for Markov chains.\"\"\"\n",
    "    max_n = 10000\n",
    "    path = simulate_markov_chain(true_P, true_π, max_n, seed)\n",
    "    \n",
    "    log_L = np.log(π_1[path[0]] / π_0[path[0]])\n",
    "    if log_L >= logA: return 1, False\n",
    "    if log_L <= logB: return 1, True\n",
    "    \n",
    "    for t in range(1, max_n):\n",
    "        prev_state, curr_state = path[t-1], path[t]\n",
    "        p_1, p_0 = P_1[prev_state, curr_state], P_0[prev_state, curr_state]\n",
    "        \n",
    "        if p_0 > 0:\n",
    "            log_L += np.log(p_1 / p_0)\n",
    "        elif p_1 > 0:\n",
    "            log_L = np.inf\n",
    "            \n",
    "        if log_L >= logA: return t+1, False\n",
    "        if log_L <= logB: return t+1, True\n",
    "    \n",
    "    return max_n, log_L < 0\n",
    "\n",
    "def run_markov_sprt(params):\n",
    "    \"\"\"Run SPRT for Markov chains.\"\"\"\n",
    "    π_0 = compute_stationary_distribution(params.P_0)\n",
    "    π_1 = compute_stationary_distribution(params.P_1)\n",
    "    A, B, logA, logB = compute_wald_thresholds(params.α, params.β)\n",
    "    \n",
    "    stopping_times = np.zeros(params.N, dtype=np.int64)\n",
    "    decisions_h0 = np.zeros(params.N, dtype=bool)\n",
    "    truth_h0 = np.zeros(params.N, dtype=bool)\n",
    "    \n",
    "    for i in range(params.N):\n",
    "        true_P, true_π = (params.P_0, π_0) if i % 2 == 0 else (params.P_1, π_1)\n",
    "        truth_h0[i] = i % 2 == 0\n",
    "        \n",
    "        n, accept_h0 = markov_sprt_single_run(\n",
    "            params.P_0, params.P_1, π_0, π_1, logA, logB, \n",
    "            true_P, true_π, params.seed + i)\n",
    "        \n",
    "        stopping_times[i] = n\n",
    "        decisions_h0[i] = accept_h0\n",
    "    \n",
    "    type_I = np.sum(truth_h0 & ~decisions_h0) / np.sum(truth_h0)\n",
    "    type_II = np.sum(~truth_h0 & decisions_h0) / np.sum(~truth_h0)\n",
    "    \n",
    "    return {\n",
    "        'stopping_times': stopping_times, 'decisions_h0': decisions_h0,\n",
    "        'truth_h0': truth_h0, 'type_I': type_I, 'type_II': type_II\n",
    "    }"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "55f97762",
   "metadata": {},
   "source": [
    "Now we can run the SPRT for the Markov chain models and visualize the results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5569987a",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "# Run Markov chain SPRT\n",
    "P_0 = np.array([[0.7, 0.2, 0.1], \n",
    "                [0.3, 0.5, 0.2], \n",
    "                [0.1, 0.3, 0.6]])\n",
    "\n",
    "P_1 = np.array([[0.5, 0.3, 0.2], \n",
    "                [0.2, 0.6, 0.2], \n",
    "                [0.2, 0.2, 0.6]])\n",
    "\n",
    "params_markov = MarkovSPRTParams(α=0.05, β=0.10, \n",
    "                        P_0=P_0, P_1=P_1, N=1000, seed=42)\n",
    "results_markov = run_markov_sprt(params_markov)\n",
    "\n",
    "\n",
    "fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5))\n",
    "\n",
    "ax1.hist(results_markov['stopping_times'], \n",
    "            bins=50, color=\"steelblue\", alpha=0.8)\n",
    "ax1.set_title(\"stopping times\")\n",
    "ax1.set_xlabel(\"n\")\n",
    "ax1.set_ylabel(\"frequency\")\n",
    "\n",
    "plot_confusion_matrix(results_markov, ax2)\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b54173fc",
   "metadata": {},
   "source": [
    "## Exercise 26.2\n",
    "\n",
    "In this exercise, apply Wald’s sequential test to distinguish between two VAR(1) models with different dynamics and noise structures.\n",
    "\n",
    "For a review of the likelihood ratio process with VAR models, see [Likelihood Processes For VAR Models](https://python.quantecon.org/likelihood_var.html).\n",
    "\n",
    "Given VAR models under each hypothesis:\n",
    "\n",
    "- $ H_0 $: $ x_{t+1} = A^{(0)} x_t + C^{(0)} w_{t+1} $  \n",
    "- $ H_1 $: $ x_{t+1} = A^{(1)} x_t + C^{(1)} w_{t+1} $  \n",
    "\n",
    "\n",
    "where $ w_t \\sim \\mathcal{N}(0, I) $ and:\n",
    "\n",
    "$$\n",
    "A^{(0)} = \\begin{bmatrix} 0.8 & 0.1 \\\\ 0.2 & 0.7 \\end{bmatrix}, \\quad\n",
    "C^{(0)} = \\begin{bmatrix} 0.3 & 0.1 \\\\ 0.1 & 0.3 \\end{bmatrix}\n",
    "$$\n",
    "\n",
    "$$\n",
    "A^{(1)} = \\begin{bmatrix} 0.6 & 0.2 \\\\ 0.3 & 0.5 \\end{bmatrix}, \\quad\n",
    "C^{(1)} = \\begin{bmatrix} 0.4 & 0 \\\\ 0 & 0.4 \\end{bmatrix}\n",
    "$$\n",
    "\n",
    "Tasks:\n",
    "\n",
    "1. Implement the VAR likelihood ratio using the functions from the VAR lecture  \n",
    "1. Implement Wald’s sequential test with $ \\alpha = 0.05 $ and $ \\beta = 0.10 $  \n",
    "1. Analyze performance under both hypotheses and with model misspecification  \n",
    "1. Compare with the Markov chain case in terms of stopping times and accuracy  "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9dbf35a2",
   "metadata": {},
   "source": [
    "## Solution\n",
    "\n",
    "Below is one solution to the exercise.\n",
    "\n",
    "First we define the parameters for the VAR models and simulator"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "91806f92",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "VARSPRTParams = namedtuple('VARSPRTParams', \n",
    "            ['α', 'β', 'A_0', 'C_0', 'A_1', 'C_1', 'N', 'seed'])\n",
    "\n",
    "def create_var_model(A, C):\n",
    "    \"\"\"Create VAR model.\"\"\"\n",
    "    μ_0 = np.zeros(A.shape[0])\n",
    "    CC = C @ C.T\n",
    "    Σ_0 = sp.linalg.solve_discrete_lyapunov(A, CC)\n",
    "    \n",
    "    CC_inv = np.linalg.inv(CC + 1e-10 * np.eye(CC.shape[0]))\n",
    "    Σ_0_inv = np.linalg.inv(Σ_0 + 1e-10 * np.eye(Σ_0.shape[0]))\n",
    "    \n",
    "    return {\n",
    "        'A': A, 'C': C, 'μ_0': μ_0, 'Σ_0': Σ_0,\n",
    "        'CC_inv': CC_inv, 'Σ_0_inv': Σ_0_inv,\n",
    "        'log_det_CC': np.log(\n",
    "            np.linalg.det(CC + 1e-10 * np.eye(CC.shape[0]))),\n",
    "        'log_det_Σ_0': np.log(\n",
    "            np.linalg.det(Σ_0 + 1e-10 * np.eye(Σ_0.shape[0])))\n",
    "    }"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e3442313",
   "metadata": {},
   "source": [
    "Now we define the likelihood ratio for the VAR models and the SPRT function similar to the\n",
    "Markov chain case"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0262d3d5",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "def var_log_likelihood(x_curr, x_prev, model, initial=False):\n",
    "    \"\"\"Compute VAR log-likelihood.\"\"\"\n",
    "    n = len(x_curr)\n",
    "    if initial:\n",
    "        diff = x_curr - model['μ_0']\n",
    "        return -0.5 * (n * np.log(2 * np.pi) + model['log_det_Σ_0'] + \n",
    "                      diff @ model['Σ_0_inv'] @ diff)\n",
    "    else:\n",
    "        diff = x_curr - model['A'] @ x_prev\n",
    "        return -0.5 * (n * np.log(2 * np.pi) + model['log_det_CC'] + \n",
    "                      diff @ model['CC_inv'] @ diff)\n",
    "\n",
    "def var_sprt_single_run(model_0, model_1, model_true, \n",
    "                        logA, logB, seed):\n",
    "    \"\"\"Single VAR SPRT run.\"\"\"\n",
    "    np.random.seed(seed)\n",
    "    max_T = 500\n",
    "    \n",
    "    # Generate VAR path\n",
    "    Σ_chol = np.linalg.cholesky(model_true['Σ_0'])\n",
    "    x = model_true['μ_0'] + Σ_chol @ np.random.randn(\n",
    "                len(model_true['μ_0']))\n",
    "    \n",
    "    # Initial likelihood ratio\n",
    "    log_L = (var_log_likelihood(x, None, model_1, True) - \n",
    "             var_log_likelihood(x, None, model_0, True))\n",
    "    \n",
    "    if log_L >= logA: return 1, False\n",
    "    if log_L <= logB: return 1, True\n",
    "    \n",
    "    # Sequential updates\n",
    "    for t in range(1, max_T):\n",
    "        x_prev = x.copy()\n",
    "        w = np.random.randn(model_true['C'].shape[1])\n",
    "        x = model_true['A'] @ x + model_true['C'] @ w\n",
    "        \n",
    "        log_L += (var_log_likelihood(x, x_prev, model_1) - \n",
    "                 var_log_likelihood(x, x_prev, model_0))\n",
    "        \n",
    "        if log_L >= logA: return t+1, False\n",
    "        if log_L <= logB: return t+1, True\n",
    "    \n",
    "    return max_T, log_L < 0\n",
    "\n",
    "def run_var_sprt(params):\n",
    "    \"\"\"Run VAR SPRT.\"\"\"\n",
    "\n",
    "    model_0 = create_var_model(params.A_0, params.C_0)\n",
    "    model_1 = create_var_model(params.A_1, params.C_1)\n",
    "    A, B, logA, logB = compute_wald_thresholds(params.α, params.β)\n",
    "    \n",
    "    stopping_times = np.zeros(params.N)\n",
    "    decisions_h0 = np.zeros(params.N, dtype=bool)\n",
    "    truth_h0 = np.zeros(params.N, dtype=bool)\n",
    "    \n",
    "    for i in range(params.N):\n",
    "        model_true = model_0 if i % 2 == 0 else model_1\n",
    "        truth_h0[i] = i % 2 == 0\n",
    "        \n",
    "        n, accept_h0 = var_sprt_single_run(model_0, model_1, model_true, \n",
    "                                          logA, logB, params.seed + i)\n",
    "        stopping_times[i] = n\n",
    "        decisions_h0[i] = accept_h0\n",
    "    \n",
    "    type_I = np.sum(truth_h0 & ~decisions_h0) / np.sum(truth_h0)\n",
    "    type_II = np.sum(~truth_h0 & decisions_h0) / np.sum(~truth_h0)\n",
    "    \n",
    "    return {'stopping_times': stopping_times, \n",
    "            'decisions_h0': decisions_h0,\n",
    "            'truth_h0': truth_h0, \n",
    "            'type_I': type_I, 'type_II': type_II}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3d06ca04",
   "metadata": {},
   "source": [
    "Let’s run SPRT and visualize the results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "43d6cfeb",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "# Run VAR SPRT\n",
    "A_0 = np.array([[0.8, 0.1], \n",
    "                [0.2, 0.7]])\n",
    "C_0 = np.array([[0.3, 0.1], \n",
    "                [0.1, 0.3]])\n",
    "A_1 = np.array([[0.6, 0.2], \n",
    "                [0.3, 0.5]])\n",
    "C_1 = np.array([[0.4, 0.0], \n",
    "                [0.0, 0.4]])\n",
    "\n",
    "params_var = VARSPRTParams(α=0.05, β=0.10, \n",
    "                A_0=A_0, C_0=C_0, A_1=A_1, C_1=C_1, \n",
    "                N=1000, seed=42)\n",
    "results_var = run_var_sprt(params_var)\n",
    "\n",
    "fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))\n",
    "\n",
    "ax1.boxplot([results_markov['stopping_times'], \n",
    "             results_var['stopping_times']], \n",
    "           tick_labels=['Markov Chain', 'VAR(1)'])\n",
    "ax1.set_ylabel('stopping time')\n",
    "\n",
    "x = np.arange(2)\n",
    "ax2.bar(x - 0.2, [results_markov['type_I'], results_var['type_I']], \n",
    "        0.4, label='Type I', alpha=0.7)\n",
    "ax2.bar(x + 0.2, [results_markov['type_II'], results_var['type_II']], \n",
    "        0.4, label='Type II', alpha=0.7)\n",
    "ax2.axhline(y=0.05, linestyle='--', alpha=0.5, color='C0')\n",
    "ax2.axhline(y=0.10, linestyle='--', alpha=0.5, color='C1')\n",
    "ax2.set_xticks(x), ax2.set_xticklabels(['Markov', 'VAR'])\n",
    "ax2.legend() \n",
    "plt.tight_layout() \n",
    "plt.show()"
   ]
  }
 ],
 "metadata": {
  "date": 1770028426.052852,
  "filename": "wald_friedman.md",
  "kernelspec": {
   "display_name": "Python",
   "language": "python3",
   "name": "python3"
  },
  "title": "A Problem that Stumped Milton Friedman"
 },
 "nbformat": 4,
 "nbformat_minor": 5
}