{ "cells": [ { "cell_type": "markdown", "id": "f69a5e0b", "metadata": {}, "source": [ "$$\n", "\\newcommand{\\argmax}{arg\\,max}\n", "\\newcommand{\\argmin}{arg\\,min}\n", "$$" ] }, { "cell_type": "markdown", "id": "8ad54f1e", "metadata": {}, "source": [ "\n", "\n", "
\n", " \n", " \"QuantEcon\"\n", " \n", "
" ] }, { "cell_type": "markdown", "id": "ac0be0ee", "metadata": {}, "source": [ "# Likelihood Ratio Processes" ] }, { "cell_type": "markdown", "id": "e1e81897", "metadata": {}, "source": [ "## Contents\n", "\n", "- [Likelihood Ratio Processes](#Likelihood-Ratio-Processes) \n", " - [Overview](#Overview) \n", " - [Likelihood Ratio Process](#Likelihood-Ratio-Process) \n", " - [Nature permanently draws from density g](#Nature-permanently-draws-from-density-g) \n", " - [Peculiar property](#Peculiar-property) \n", " - [Nature permanently draws from density f](#Nature-permanently-draws-from-density-f) \n", " - [Likelihood ratio test](#Likelihood-ratio-test) \n", " - [Hypothesis testing and classification](#Hypothesis-testing-and-classification) \n", " - [Markov chains](#Markov-chains) \n", " - [Related lectures](#Related-lectures) \n", " - [Exercises](#Exercises) " ] }, { "cell_type": "markdown", "id": "09e63db6", "metadata": {}, "source": [ "In addition to what’s in Anaconda, this lecture will need the following libraries:" ] }, { "cell_type": "code", "execution_count": null, "id": "ce2e1a8c", "metadata": { "hide-output": false }, "outputs": [], "source": [ "!pip install --upgrade quantecon" ] }, { "cell_type": "markdown", "id": "bbaf453b", "metadata": {}, "source": [ "## Overview\n", "\n", "This lecture describes likelihood ratio processes and some of their uses.\n", "\n", "We’ll study the same setting that is also used in [this lecture on exchangeability](https://python.quantecon.org/exchangeable.html).\n", "\n", "Among the things that we’ll learn are\n", "\n", "- How a likelihood ratio process is a key ingredient in frequentist hypothesis testing \n", "- How a **receiver operator characteristic curve** summarizes information about a false alarm probability and power in frequentist hypothesis testing \n", "- How a statistician can combine frequentist probabilities of type I and type II errors to form posterior probabilities of mistakes in a model selection or in an individual-classification problem \n", "- How to use a Kullback-Leibler divergence to quantify the difference between two probability distributions with the same support \n", "- How during World War II the United States Navy devised a decision rule for doing quality control on lots of ammunition, a topic that sets the stage for [this lecture](https://python.quantecon.org/wald_friedman.html) \n", "- A peculiar property of likelihood ratio processes \n", "\n", "\n", "Let’s start by importing some Python tools." 
] }, { "cell_type": "code", "execution_count": null, "id": "0b7722c0", "metadata": { "hide-output": false }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import numpy as np\n", "from numba import vectorize, jit\n", "from math import gamma\n", "from scipy.integrate import quad\n", "from scipy.optimize import brentq, minimize_scalar\n", "from scipy.stats import beta as beta_dist\n", "import pandas as pd\n", "from IPython.display import display, Math\n", "import quantecon as qe" ] }, { "cell_type": "markdown", "id": "07fe67ef", "metadata": {}, "source": [ "## Likelihood Ratio Process\n", "\n", "A nonnegative random variable $ W $ has one of two probability density functions, either\n", "$ f $ or $ g $.\n", "\n", "Before the beginning of time, nature once and for all decides whether she will draw a sequence of IID draws from either\n", "$ f $ or $ g $.\n", "\n", "We will sometimes let $ q $ be the density that nature chose once and for all, so\n", "that $ q $ is either $ f $ or $ g $, permanently.\n", "\n", "Nature knows which density it permanently draws from, but we the observers do not.\n", "\n", "We know both $ f $ and $ g $ but we don’t know which density nature\n", "chose.\n", "\n", "But we want to know.\n", "\n", "To do that, we use observations.\n", "\n", "We observe a sequence $ \\{w_t\\}_{t=1}^T $ of $ T $ IID draws that we know came from either $ f $ or $ g $.\n", "\n", "We want to use these observations to infer whether nature chose $ f $ or $ g $.\n", "\n", "A **likelihood ratio process** is a useful tool for this task.\n", "\n", "To begin, we define a key component of a likelihood ratio process, namely, the time $ t $ likelihood ratio as the random variable\n", "\n", "$$\n", "\\ell (w_t)=\\frac{f\\left(w_t\\right)}{g\\left(w_t\\right)},\\quad t\\geq1.\n", "$$\n", "\n", "We assume that $ f $ and $ g $ both put positive probabilities on the\n", "same intervals of possible realizations of the random variable $ W $.\n", "\n", "That means that under the $ g $ density, $ \\ell (w_t)=\n", "\\frac{f\\left(w_{t}\\right)}{g\\left(w_{t}\\right)} $\n", "is a nonnegative random variable with mean $ 1 $.\n", "\n", "A **likelihood ratio process** for sequence\n", "$ \\left\\{ w_{t}\\right\\} _{t=1}^{\\infty} $ is defined as\n", "\n", "$$\n", "L\\left(w^{t}\\right)=\\prod_{i=1}^{t} \\ell (w_i),\n", "$$\n", "\n", "where $ w^t=\\{ w_1,\\dots,w_t\\} $ is a history of\n", "observations up to and including time $ t $.\n", "\n", "Sometimes for shorthand we’ll write $ L_t = L(w^t) $.\n", "\n", "Notice that the likelihood process satisfies the *recursion*\n", "\n", "$$\n", "L(w^t) = \\ell (w_t) L (w^{t-1}) .\n", "$$\n", "\n", "The likelihood ratio and its logarithm are key tools for making\n", "inferences using a classic frequentist approach due to Neyman and\n", "Pearson [[Neyman and Pearson, 1933](https://python.quantecon.org/zreferences.html#id269)].\n", "\n", "To help us appreciate how things work, the following Python code evaluates $ f $ and $ g $ as two different\n", "Beta distributions, then computes and simulates an associated likelihood\n", "ratio process by generating a sequence $ w^t $ from one of the two\n", "probability distributions, for example, a sequence of IID draws from $ g $." 
] }, { "cell_type": "code", "execution_count": null, "id": "c6302d72", "metadata": { "hide-output": false }, "outputs": [], "source": [ "# Parameters for the two Beta distributions\n", "F_a, F_b = 1, 1\n", "G_a, G_b = 3, 1.2\n", "\n", "@vectorize\n", "def p(x, a, b):\n", " \"\"\"Beta distribution density function.\"\"\"\n", " r = gamma(a + b) / (gamma(a) * gamma(b))\n", " return r * x** (a-1) * (1 - x) ** (b-1)\n", "\n", "f = jit(lambda x: p(x, F_a, F_b))\n", "g = jit(lambda x: p(x, G_a, G_b))\n", "\n", "def create_beta_density(a, b):\n", " \"\"\"Create a beta density function with specified parameters.\"\"\"\n", " return jit(lambda x: p(x, a, b))\n", "\n", "def likelihood_ratio(w, f_func, g_func):\n", " \"\"\"Compute likelihood ratio for observation(s) w.\"\"\"\n", " return f_func(w) / g_func(w)\n", "\n", "@jit\n", "def simulate_likelihood_ratios(a, b, f_func, g_func, T=50, N=500):\n", " \"\"\"\n", " Generate N sets of T observations of the likelihood ratio.\n", " \"\"\"\n", " l_arr = np.empty((N, T))\n", " for i in range(N):\n", " for j in range(T):\n", " w = np.random.beta(a, b)\n", " l_arr[i, j] = f_func(w) / g_func(w)\n", " return l_arr\n", "\n", "def simulate_sequences(distribution, f_func, g_func, \n", " F_params=(1, 1), G_params=(3, 1.2), T=50, N=500):\n", " \"\"\"\n", " Generate N sequences of T observations from specified distribution.\n", " \"\"\"\n", " if distribution == 'f':\n", " a, b = F_params\n", " elif distribution == 'g':\n", " a, b = G_params\n", " else:\n", " raise ValueError(\"distribution must be 'f' or 'g'\")\n", " \n", " l_arr = simulate_likelihood_ratios(a, b, f_func, g_func, T, N)\n", " l_seq = np.cumprod(l_arr, axis=1)\n", " return l_arr, l_seq\n", "\n", "def plot_likelihood_paths(l_seq, title=\"Likelihood ratio paths\", \n", " ylim=None, n_paths=None):\n", " \"\"\"Plot likelihood ratio paths.\"\"\"\n", " N, T = l_seq.shape\n", " n_show = n_paths or min(N, 100)\n", " \n", " plt.figure(figsize=(10, 6))\n", " for i in range(n_show):\n", " plt.plot(range(T), l_seq[i, :], color='b', lw=0.8, alpha=0.5)\n", " \n", " if ylim:\n", " plt.ylim(ylim)\n", " plt.title(title)\n", " plt.xlabel('t')\n", " plt.ylabel('$L(w^t)$')\n", " plt.show()" ] }, { "cell_type": "markdown", "id": "3bd2f322", "metadata": {}, "source": [ "\n", "" ] }, { "cell_type": "markdown", "id": "be899f6e", "metadata": {}, "source": [ "## Nature permanently draws from density g\n", "\n", "We first simulate the likelihood ratio process when nature permanently\n", "draws from $ g $." ] }, { "cell_type": "code", "execution_count": null, "id": "8eca6618", "metadata": { "hide-output": false }, "outputs": [], "source": [ "# Simulate when nature draws from g\n", "l_arr_g, l_seq_g = simulate_sequences('g', f, g, (F_a, F_b), (G_a, G_b))\n", "plot_likelihood_paths(l_seq_g, \n", " title=\"$L(w^{t})$ paths when nature draws from g\",\n", " ylim=[0, 3])" ] }, { "cell_type": "markdown", "id": "14c63956", "metadata": {}, "source": [ "Evidently, as sample length $ T $ grows, most probability mass\n", "shifts toward zero\n", "\n", "To see this more clearly, we plot over time the fraction of\n", "paths $ L\\left(w^{t}\\right) $ that fall in the interval\n", "$ \\left[0, 0.01\\right] $." 
] }, { "cell_type": "code", "execution_count": null, "id": "0c24399a", "metadata": { "hide-output": false }, "outputs": [], "source": [ "N, T = l_arr_g.shape\n", "plt.plot(range(T), np.sum(l_seq_g <= 0.01, axis=0) / N)\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "68e1f9fb", "metadata": {}, "source": [ "Despite the evident convergence of most probability mass to a\n", "very small interval near $ 0 $, the unconditional mean of\n", "$ L\\left(w^t\\right) $ under probability density $ g $ is\n", "identically $ 1 $ for all $ t $.\n", "\n", "To verify this assertion, first notice that as mentioned earlier the unconditional mean\n", "$ E\\left[\\ell \\left(w_{t}\\right)\\bigm|q=g\\right] $ is $ 1 $ for\n", "all $ t $:\n", "\n", "$$\n", "\\begin{aligned}\n", "E\\left[\\ell \\left(w_{t}\\right)\\bigm|q=g\\right] &=\\int\\frac{f\\left(w_{t}\\right)}{g\\left(w_{t}\\right)}g\\left(w_{t}\\right)dw_{t} \\\\\n", " &=\\int f\\left(w_{t}\\right)dw_{t} \\\\\n", " &=1,\n", "\\end{aligned}\n", "$$\n", "\n", "which immediately implies\n", "\n", "$$\n", "\\begin{aligned}\n", "E\\left[L\\left(w^{1}\\right)\\bigm|q=g\\right] &=E\\left[\\ell \\left(w_{1}\\right)\\bigm|q=g\\right]\\\\\n", " &=1.\\\\\n", "\\end{aligned}\n", "$$\n", "\n", "Because $ L(w^t) = \\ell(w_t) L(w^{t-1}) $ and\n", "$ \\{w_t\\}_{t=1}^t $ is an IID sequence, we have\n", "\n", "$$\n", "\\begin{aligned}\n", "E\\left[L\\left(w^{t}\\right)\\bigm|q=g\\right] &=E\\left[L\\left(w^{t-1}\\right)\\ell \\left(w_{t}\\right)\\bigm|q=g\\right] \\\\\n", " &=E\\left[L\\left(w^{t-1}\\right)E\\left[\\ell \\left(w_{t}\\right)\\bigm|q=g,w^{t-1}\\right]\\bigm|q=g\\right] \\\\\n", " &=E\\left[L\\left(w^{t-1}\\right)E\\left[\\ell \\left(w_{t}\\right)\\bigm|q=g\\right]\\bigm|q=g\\right] \\\\\n", " &=E\\left[L\\left(w^{t-1}\\right)\\bigm|q=g\\right] \\\\\n", "\\end{aligned}\n", "$$\n", "\n", "for any $ t \\geq 1 $.\n", "\n", "Mathematical induction implies\n", "$ E\\left[L\\left(w^{t}\\right)\\bigm|q=g\\right]=1 $ for all\n", "$ t \\geq 1 $." ] }, { "cell_type": "markdown", "id": "8583654b", "metadata": {}, "source": [ "## Peculiar property\n", "\n", "How can $ E\\left[L\\left(w^{t}\\right)\\bigm|q=g\\right]=1 $ possibly be true when most probability mass of the likelihood\n", "ratio process is piling up near $ 0 $ as\n", "$ t \\rightarrow + \\infty $?\n", "\n", "The answer is that as $ t \\rightarrow + \\infty $, the\n", "distribution of $ L_t $ becomes more and more fat-tailed:\n", "enough mass shifts to larger and larger values of $ L_t $ to make\n", "the mean of $ L_t $ continue to be one despite most of the probability mass piling up\n", "near $ 0 $.\n", "\n", "To illustrate this peculiar property, we simulate many paths and\n", "calculate the unconditional mean of $ L\\left(w^t\\right) $ by\n", "averaging across these many paths at each $ t $." 
] }, { "cell_type": "code", "execution_count": null, "id": "954ad086", "metadata": { "hide-output": false }, "outputs": [], "source": [ "l_arr_g, l_seq_g = simulate_sequences('g', \n", " f, g, (F_a, F_b), (G_a, G_b), N=50000)" ] }, { "cell_type": "markdown", "id": "7f5fa52b", "metadata": {}, "source": [ "It would be useful to use simulations to verify that unconditional means\n", "$ E\\left[L\\left(w^{t}\\right)\\right] $ equal unity by averaging across sample\n", "paths.\n", "\n", "But it would be too computer-time-consuming for us to do that here simply by applying a standard Monte Carlo simulation approach.\n", "\n", "The reason is that the distribution of $ L\\left(w^{t}\\right) $ is extremely skewed for large values of $ t $.\n", "\n", "Because the probability density in the right tail is close to $ 0 $, it just takes too much computer time to sample enough points from the right tail.\n", "\n", "We explain the problem in more detail in [this lecture](https://python.quantecon.org/imp_sample.html).\n", "\n", "There we describe an alternative way to compute the mean of a likelihood ratio by computing the mean of a *different* random variable by sampling from a *different* probability distribution." ] }, { "cell_type": "markdown", "id": "457287ec", "metadata": {}, "source": [ "## Nature permanently draws from density f\n", "\n", "Now suppose that before time $ 0 $ nature permanently decided to draw repeatedly from density $ f $.\n", "\n", "While the mean of the likelihood ratio $ \\ell \\left(w_{t}\\right) $ under density\n", "$ g $ is $ 1 $, its mean under the density $ f $ exceeds one.\n", "\n", "To see this, we compute\n", "\n", "$$\n", "\\begin{aligned}\n", "E\\left[\\ell \\left(w_{t}\\right)\\bigm|q=f\\right] &=\\int\\frac{f\\left(w_{t}\\right)}{g\\left(w_{t}\\right)}f\\left(w_{t}\\right)dw_{t} \\\\\n", " &=\\int\\frac{f\\left(w_{t}\\right)}{g\\left(w_{t}\\right)}\\frac{f\\left(w_{t}\\right)}{g\\left(w_{t}\\right)}g\\left(w_{t}\\right)dw_{t} \\\\\n", " &=\\int \\ell \\left(w_{t}\\right)^{2}g\\left(w_{t}\\right)dw_{t} \\\\\n", " &=E\\left[\\ell \\left(w_{t}\\right)^{2}\\mid q=g\\right] \\\\\n", " &=E\\left[\\ell \\left(w_{t}\\right)\\mid q=g\\right]^{2}+Var\\left(\\ell \\left(w_{t}\\right)\\mid q=g\\right) \\\\\n", " &>E\\left[\\ell \\left(w_{t}\\right)\\mid q=g\\right]^{2} = 1 \\\\\n", " \\end{aligned}\n", "$$\n", "\n", "This in turn implies that the unconditional mean of the likelihood ratio process $ L(w^t) $\n", "diverges toward $ + \\infty $.\n", "\n", "Simulations below confirm this conclusion.\n", "\n", "Please note the scale of the $ y $ axis." ] }, { "cell_type": "code", "execution_count": null, "id": "a06b8c1a", "metadata": { "hide-output": false }, "outputs": [], "source": [ "# Simulate when nature draws from f\n", "l_arr_f, l_seq_f = simulate_sequences('f', f, g, \n", " (F_a, F_b), (G_a, G_b), N=50000)" ] }, { "cell_type": "code", "execution_count": null, "id": "e3c5f315", "metadata": { "hide-output": false }, "outputs": [], "source": [ "N, T = l_arr_f.shape\n", "plt.plot(range(T), np.mean(l_seq_f, axis=0))\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "1e73380b", "metadata": {}, "source": [ "We also plot the probability that $ L\\left(w^t\\right) $ falls into\n", "the interval $ [10000, \\infty) $ as a function of time and watch how\n", "fast probability mass diverges to $ +\\infty $." 
] }, { "cell_type": "code", "execution_count": null, "id": "6bc8471f", "metadata": { "hide-output": false }, "outputs": [], "source": [ "plt.plot(range(T), np.sum(l_seq_f > 10000, axis=0) / N)\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "5cc9a3b5", "metadata": {}, "source": [ "## Likelihood ratio test\n", "\n", "We now describe how to employ the machinery\n", "of Neyman and Pearson [[Neyman and Pearson, 1933](https://python.quantecon.org/zreferences.html#id269)] to test the hypothesis that history $ w^t $ is generated by repeated\n", "IID draws from density $ f $.\n", "\n", "Denote $ q $ as the data generating process, so that\n", "$ q=f \\text{ or } g $.\n", "\n", "Upon observing a sample $ \\{W_i\\}_{i=1}^t $, we want to decide\n", "whether nature is drawing from $ g $ or from $ f $ by performing a (frequentist)\n", "hypothesis test.\n", "\n", "We specify\n", "\n", "- Null hypothesis $ H_0 $: $ q=f $, \n", "- Alternative hypothesis $ H_1 $: $ q=g $. \n", "\n", "\n", "Neyman and Pearson proved that the best way to test this hypothesis is to use a **likelihood ratio test** that takes the\n", "form:\n", "\n", "- accept $ H_0 $ if $ L(W^t) > c $, \n", "- reject $ H_0 $ if $ L(W^t) < c $, \n", "\n", "\n", "where $ c $ is a given discrimination threshold.\n", "\n", "Setting $ c =1 $ is a common choice.\n", "\n", "We’ll discuss consequences of other choices of $ c $ below.\n", "\n", "This test is *best* in the sense that it is **uniformly most powerful**.\n", "\n", "To understand what this means, we have to define probabilities of two important events that\n", "allow us to characterize a test associated with a given\n", "threshold $ c $.\n", "\n", "The two probabilities are:\n", "\n", "- Probability of a Type I error in which we reject $ H_0 $ when it is true: \n", " $$\n", " \\alpha \\equiv \\Pr\\left\\{ L\\left(w^{t}\\right)c\\mid q=g\\right\\}\n", " $$\n", "\n", "\n", "These two probabilities underlie the following two concepts:\n", "\n", "- Probability of false alarm (= significance level = probability of\n", " Type I error): \n", " $$\n", " \\alpha \\equiv \\Pr\\left\\{ L\\left(w^{t}\\right) np.log(c)\n", " axs[nr, nc].fill_between(x[1:][ind], hist[ind], \n", " alpha=0.5, label=label)\n", " \n", " axs[nr, nc].legend()\n", " axs[nr, nc].set_title(f\"t={t}\")\n", " \n", " plt.show()\n", "\n", "plot_log_histograms(l_seq_f, l_seq_g, c=c)" ] }, { "cell_type": "markdown", "id": "c268cc13", "metadata": {}, "source": [ "In the above graphs,\n", "\n", "- the blue areas are related to but not equal to probabilities $ \\alpha $ of a type I error because\n", " they are integrals of $ \\log L_t $, not integrals of $ L_t $, over rejection region $ L_t < 1 $ \n", "- the orange areas are related to but not equal to probabilities $ \\beta $ of a type II error because\n", " they are integrals of $ \\log L_t $, not integrals of $ L_t $, over acceptance region $ L_t > 1 $ \n", "\n", "\n", "When we hold $ c $ fixed at $ c=1 $, the following graph shows that\n", "\n", "- the probability of detection monotonically increases with increases in\n", " $ t $ \n", "- the probability of a false alarm monotonically decreases with increases in $ t $. 
" ] }, { "cell_type": "code", "execution_count": null, "id": "bff9ff52", "metadata": { "hide-output": false }, "outputs": [], "source": [ "def compute_error_probabilities(l_seq_f, l_seq_g, c=1):\n", " \"\"\"\n", " Compute Type I and Type II error probabilities.\n", " \"\"\"\n", " N, T = l_seq_f.shape\n", " \n", " # Type I error (false alarm) - reject H0 when true\n", " PFA = np.array([np.sum(l_seq_f[:, t] < c) / N for t in range(T)])\n", " \n", " # Type II error - accept H0 when false\n", " beta = np.array([np.sum(l_seq_g[:, t] >= c) / N for t in range(T)])\n", " \n", " # Probability of detection (power)\n", " PD = np.array([np.sum(l_seq_g[:, t] < c) / N for t in range(T)])\n", " \n", " return {\n", " 'alpha': PFA,\n", " 'beta': beta, \n", " 'PD': PD,\n", " 'PFA': PFA\n", " }\n", "\n", "def plot_error_probabilities(error_dict, T, c=1, title_suffix=\"\"):\n", " \"\"\"Plot error probabilities over time.\"\"\"\n", " plt.figure(figsize=(10, 6))\n", " plt.plot(range(T), error_dict['PD'], label=\"Probability of detection\")\n", " plt.plot(range(T), error_dict['PFA'], label=\"Probability of false alarm\")\n", " plt.xlabel(\"t\")\n", " plt.ylabel(\"Probability\")\n", " plt.title(f\"Error Probabilities (c={c}){title_suffix}\")\n", " plt.legend()\n", " plt.show()\n", "\n", "error_probs = compute_error_probabilities(l_seq_f, l_seq_g, c=c)\n", "N, T = l_seq_f.shape\n", "plot_error_probabilities(error_probs, T, c)" ] }, { "cell_type": "markdown", "id": "ea1d8340", "metadata": {}, "source": [ "For a given sample size $ t $, the threshold $ c $ uniquely pins down probabilities\n", "of both types of error.\n", "\n", "If for a fixed $ t $ we now free up and move $ c $, we will sweep out the probability\n", "of detection as a function of the probability of false alarm.\n", "\n", "This produces a [receiver operating characteristic\n", "curve (ROC curve)](https://en.wikipedia.org/wiki/Receiver_operating_characteristic).\n", "\n", "Below, we plot receiver operating characteristic curves for different\n", "sample sizes $ t $." 
] }, { "cell_type": "code", "execution_count": null, "id": "80f073e5", "metadata": { "hide-output": false }, "outputs": [], "source": [ "def plot_roc_curves(l_seq_f, l_seq_g, t_values=[1, 5, 9, 13], N=None):\n", " \"\"\"Plot ROC curves for different sample sizes.\"\"\"\n", " if N is None:\n", " N = l_seq_f.shape[0]\n", " \n", " PFA = np.arange(0, 100, 1)\n", " \n", " plt.figure(figsize=(10, 6))\n", " for t in t_values:\n", " percentile = np.percentile(l_seq_f[:, t], PFA)\n", " PD = [np.sum(l_seq_g[:, t] < p) / N for p in percentile]\n", " plt.plot(PFA / 100, PD, label=f\"t={t}\")\n", " \n", " plt.scatter(0, 1, label=\"perfect detection\")\n", " plt.plot([0, 1], [0, 1], color='k', ls='--', label=\"random detection\")\n", " \n", " plt.arrow(0.5, 0.5, -0.15, 0.15, head_width=0.03)\n", " plt.text(0.35, 0.7, \"better\")\n", " plt.xlabel(\"Probability of false alarm\")\n", " plt.ylabel(\"Probability of detection\")\n", " plt.legend()\n", " plt.title(\"ROC Curve\")\n", " plt.show()\n", "\n", "\n", "plot_roc_curves(l_seq_f, l_seq_g, t_values=range(1, 15, 4), N=N)" ] }, { "cell_type": "markdown", "id": "a8ed50fd", "metadata": {}, "source": [ "Notice that as $ t $ increases, we are assured a larger probability\n", "of detection and a smaller probability of false alarm associated with\n", "a given discrimination threshold $ c $.\n", "\n", "For a given sample size $ t $, both $ \\alpha $ and $ \\beta $ change as we vary $ c $.\n", "\n", "As we increase $ c $\n", "\n", "- $ \\alpha \\equiv \\Pr\\left\\{ L\\left(w^{t}\\right)c\\mid q=g\\right\\} $ decreases \n", "\n", "\n", "As $ t \\rightarrow + \\infty $, we approach the perfect detection\n", "curve that is indicated by a right angle hinging on the blue dot.\n", "\n", "For a given sample size $ t $, the discrimination threshold $ c $ determines a point on the receiver operating\n", "characteristic curve.\n", "\n", "It is up to the test designer to trade off probabilities of\n", "making the two types of errors.\n", "\n", "But we know how to choose the smallest sample size to achieve given targets for\n", "the probabilities.\n", "\n", "Typically, frequentists aim for a high probability of detection that\n", "respects an upper bound on the probability of false alarm.\n", "\n", "Below we show an example in which we fix the probability of false alarm at\n", "$ 0.05 $.\n", "\n", "The required sample size for making a decision is then determined by a\n", "target probability of detection, for example, $ 0.9 $, as depicted in the following graph." 
] }, { "cell_type": "code", "execution_count": null, "id": "372ccbfa", "metadata": { "hide-output": false }, "outputs": [], "source": [ "PFA = 0.05\n", "PD = np.empty(T)\n", "\n", "for t in range(T):\n", "\n", " c = np.percentile(l_seq_f[:, t], PFA * 100)\n", " PD[t] = np.sum(l_seq_g[:, t] < c) / N\n", "\n", "plt.plot(range(T), PD)\n", "plt.axhline(0.9, color=\"k\", ls=\"--\")\n", "\n", "plt.xlabel(\"t\")\n", "plt.ylabel(\"Probability of detection\")\n", "plt.title(f\"Probability of false alarm={PFA}\")\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "ebed11f3", "metadata": {}, "source": [ "The United States Navy evidently used a procedure like this to select a sample size $ t $ for doing quality\n", "control tests during World War II.\n", "\n", "A Navy Captain who had been ordered to perform tests of this kind had doubts about it that he\n", "presented to Milton Friedman, as we describe in [this lecture](https://python.quantecon.org/wald_friedman.html).\n", "\n", "\n", "" ] }, { "cell_type": "markdown", "id": "9cb2b84f", "metadata": {}, "source": [ "### A third distribution $ h $\n", "\n", "Now let’s consider a case in which neither $ g $ nor $ f $\n", "generates the data.\n", "\n", "Instead, a third distribution $ h $ does.\n", "\n", "Let’s study how accumulated likelihood ratios $ L $ behave\n", "when $ h $ governs the data.\n", "\n", "A key tool here is called **Kullback–Leibler divergence** we studied in [Statistical Divergence Measures](https://python.quantecon.org/divergence_measures.html).\n", "\n", "In our application, we want to measure how much $ f $ or $ g $\n", "diverges from $ h $\n", "\n", "Two Kullback–Leibler divergences pertinent for us are $ K_f $\n", "and $ K_g $ defined as\n", "\n", "$$\n", "\\begin{aligned}\n", "K_{f} = D_{KL}\\bigl(h\\|f\\bigr) = KL(h, f)\n", " &= E_{h}\\left[\\log\\frac{h(w)}{f(w)}\\right] \\\\\n", " &= \\int \\log\\left(\\frac{h(w)}{f(w)}\\right)h(w)dw .\n", "\\end{aligned}\n", "$$\n", "\n", "$$\n", "\\begin{aligned}\n", "K_{g} = D_{KL}\\bigl(h\\|g\\bigr) = KL(h, g)\n", " &= E_{h}\\left[\\log\\frac{h(w)}{g(w)}\\right] \\\\\n", " &= \\int \\log\\left(\\frac{h(w)}{g(w)}\\right)h(w)dw .\n", "\\end{aligned}\n", "$$\n", "\n", "Let’s compute the Kullback–Leibler discrepancies using the same code in [Statistical Divergence Measures](https://python.quantecon.org/divergence_measures.html)." 
] }, { "cell_type": "code", "execution_count": null, "id": "0c4c4507", "metadata": { "hide-output": false }, "outputs": [], "source": [ "def compute_KL(f, g):\n", " \"\"\"\n", " Compute KL divergence KL(f, g)\n", " \"\"\"\n", " integrand = lambda w: f(w) * np.log(f(w) / g(w))\n", " val, _ = quad(integrand, 1e-5, 1-1e-5)\n", " return val\n", "\n", "def compute_KL_h(h, f, g):\n", " \"\"\"\n", " Compute KL divergences with respect to reference distribution h\n", " \"\"\"\n", " Kf = compute_KL(h, f)\n", " Kg = compute_KL(h, g)\n", " return Kf, Kg" ] }, { "cell_type": "markdown", "id": "8acb9cbe", "metadata": {}, "source": [ "\n", "" ] }, { "cell_type": "markdown", "id": "57442f46", "metadata": {}, "source": [ "### A helpful formula\n", "\n", "There is a mathematical relationship between likelihood ratios and KL divergence.\n", "\n", "When data is generated by distribution $ h $, the expected log likelihood ratio is:\n", "\n", "\n", "\n", "$$\n", "\\frac{1}{t} E_{h}\\!\\bigl[\\log L_t\\bigr] = K_g - K_f \\tag{22.1}\n", "$$\n", "\n", "where $ L_t=\\prod_{j=1}^{t}\\frac{f(w_j)}{g(w_j)} $ is the likelihood ratio process.\n", "\n", "Equation [(22.1)](#equation-eq-kl-likelihood-link) tells us that:\n", "\n", "- When $ K_g < K_f $ (i.e., $ g $ is closer to $ h $ than $ f $ is), the expected log likelihood ratio is negative, so $ L\\left(w^t\\right) \\rightarrow 0 $. \n", "- When $ K_g > K_f $ (i.e., $ f $ is closer to $ h $ than $ g $ is), the expected log likelihood ratio is positive, so $ L\\left(w^t\\right) \\rightarrow + \\infty $. \n", "\n", "\n", "Let’s verify this using simulation.\n", "\n", "In the simulation, we generate multiple paths using Beta distributions $ f $, $ g $, and $ h $, and compute the paths of $ \\log(L(w^t)) $.\n", "\n", "First, we write a function to compute the likelihood ratio process" ] }, { "cell_type": "code", "execution_count": null, "id": "c8bc89e3", "metadata": { "hide-output": false }, "outputs": [], "source": [ "def compute_likelihood_ratios(sequences, f, g):\n", " \"\"\"Compute likelihood ratios and cumulative products.\"\"\"\n", " l_ratios = f(sequences) / g(sequences)\n", " L_cumulative = np.cumprod(l_ratios, axis=1)\n", " return l_ratios, L_cumulative" ] }, { "cell_type": "markdown", "id": "863642a1", "metadata": {}, "source": [ "We consider three cases: (1) $ h $ is closer to $ f $, (2) $ f $ and $ g $ are approximately equidistant from $ h $, and (3) $ h $ is closer to $ g $." 
] }, { "cell_type": "code", "execution_count": null, "id": "23ab207f", "metadata": { "hide-output": false }, "outputs": [], "source": [ "# Define test scenarios\n", "scenarios = [\n", " {\n", " \"name\": \"KL(h,g) > KL(h,f)\",\n", " \"h_params\": (1.2, 1.1),\n", " \"expected\": r\"$L_t \\to \\infty$\"\n", " },\n", " {\n", " \"name\": \"KL(h,g) ≈ KL(h,f)\",\n", " \"h_params\": (2, 1.35),\n", " \"expected\": \"$L_t$ fluctuates\"\n", " },\n", " {\n", " \"name\": \"KL(h,g) < KL(h,f)\", \n", " \"h_params\": (3.5, 1.5),\n", " \"expected\": r\"$L_t \\to 0$\"\n", " }\n", "]\n", "\n", "fig, axes = plt.subplots(2, 3, figsize=(15, 12))\n", "\n", "for i, scenario in enumerate(scenarios):\n", " # Define h\n", " h = lambda x: p(x, scenario[\"h_params\"][0], \n", " scenario[\"h_params\"][1])\n", " \n", " # Compute KL divergences\n", " Kf, Kg = compute_KL_h(h, f, g)\n", " kl_diff = Kg - Kf\n", " \n", " # Simulate paths\n", " N_paths = 100\n", " T = 150\n", "\n", " # Generate data from h\n", " h_data = np.random.beta(scenario[\"h_params\"][0], \n", " scenario[\"h_params\"][1], (N_paths, T))\n", " l_ratios, l_cumulative = compute_likelihood_ratios(h_data, f, g)\n", " log_l_cumulative = np.log(l_cumulative)\n", " \n", " # Plot distributions\n", " ax = axes[0, i]\n", " x_range = np.linspace(0.001, 0.999, 200)\n", " ax.plot(x_range, [f(x) for x in x_range], \n", " 'b-', label='f', linewidth=2)\n", " ax.plot(x_range, [g(x) for x in x_range], \n", " 'r-', label='g', linewidth=2)\n", " ax.plot(x_range, [h(x) for x in x_range], \n", " 'g--', label='h (data)', linewidth=2)\n", " ax.set_xlabel('w')\n", " ax.set_ylabel('density')\n", " ax.set_title(scenario[\"name\"], fontsize=16)\n", " ax.legend()\n", " \n", " # Plot log likelihood ratio paths\n", " ax = axes[1, i]\n", " for j in range(min(20, N_paths)):\n", " ax.plot(log_l_cumulative[j, :], alpha=0.3, color='purple')\n", " \n", " # Plot theoretical expectation\n", " theory_line = kl_diff * np.arange(1, T+1)\n", " ax.plot(theory_line, 'k--', linewidth=2, label=r'$t \\times (K_g - K_f)$')\n", " \n", " ax.set_xlabel('t')\n", " ax.set_ylabel('$log L_t$')\n", " ax.set_title(f'KL(h,f)={Kf:.3f}, KL(h,g)={Kg:.3f}\\n{scenario[\"expected\"]}', \n", " fontsize=16)\n", " ax.legend(fontsize=16)\n", "\n", "plt.tight_layout()\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "bac7e77a", "metadata": {}, "source": [ "Note that\n", "\n", "- In the first figure, $ \\log L(w^t) $ diverges to $ \\infty $ because $ K_g > K_f $. \n", "- In the second figure, we still have $ K_g > K_f $, but the difference is smaller, so $ L(w^t) $ diverges to infinity at a slower pace. \n", "- In the last figure, $ \\log L(w^t) $ diverges to $ -\\infty $ because $ K_g < K_f $. \n", "- The black dotted line, $ t \\left(D_{KL}(h\\|g) - D_{KL}(h\\|f)\\right) $, closely fits the paths verifying [(22.1)](#equation-eq-kl-likelihood-link). \n", "\n", "\n", "These observations align with the theory.\n", "\n", "In [Heterogeneous Beliefs and Financial Markets](https://python.quantecon.org/likelihood_ratio_process_2.html), we will see an application of these ideas." 
] }, { "cell_type": "markdown", "id": "458281b6", "metadata": {}, "source": [ "## Hypothesis testing and classification\n", "\n", "This section discusses another application of likelihood ratio processes.\n", "\n", "We describe how a statistician can combine frequentist probabilities of type I and type II errors in order to\n", "\n", "- compute an anticipated frequency of selecting a wrong model based on a sample length $ T $ \n", "- compute an anticipated error rate in a classification problem \n", "\n", "\n", "We consider a situation in which nature generates data by mixing known densities $ f $ and $ g $ with known mixing\n", "parameter $ \\pi_{-1} \\in (0,1) $ so that the random variable $ w $ is drawn from the density\n", "\n", "$$\n", "h (w) = \\pi_{-1} f(w) + (1-\\pi_{-1}) g(w)\n", "$$\n", "\n", "We assume that the statistician knows the densities $ f $ and $ g $ and also the mixing parameter $ \\pi_{-1} $.\n", "\n", "Below, we’ll set $ \\pi_{-1} = .5 $, although much of the analysis would follow through with other settings of $ \\pi_{-1} \\in (0,1) $.\n", "\n", "We assume that $ f $ and $ g $ both put positive probabilities on the same intervals of possible realizations of the random variable $ W $.\n", "\n", "In the simulations below, we specify that $ f $ is a $ \\text{Beta}(1, 1) $ distribution and that $ g $ is $ \\text{Beta}(3, 1.2) $ distribution.\n", "\n", "We consider two alternative timing protocols.\n", "\n", "- Timing protocol 1 is for the model selection problem \n", "- Timing protocol 2 is for the individual classification problem \n", "\n", "\n", "**Timing Protocol 1:** Nature flips a coin only **once** at time $ t=-1 $ and with probability $ \\pi_{-1} $ generates a sequence $ \\{w_t\\}_{t=1}^T $\n", "of IID draws from $ f $ and with probability $ 1-\\pi_{-1} $ generates a sequence $ \\{w_t\\}_{t=1}^T $\n", "of IID draws from $ g $.\n", "\n", "**Timing Protocol 2.** Nature flips a coin **often**. 
At each time $ t \\geq 0 $, nature flips a coin and with probability $ \\pi_{-1} $ draws $ w_t $ from $ f $ and with probability $ 1-\\pi_{-1} $ draws $ w_t $ from $ g $.\n", "\n", "Here is Python code that we’ll use to implement timing protocol 1 and 2" ] }, { "cell_type": "code", "execution_count": null, "id": "6e9b5961", "metadata": { "hide-output": false }, "outputs": [], "source": [ "def protocol_1(π_minus_1, T, N=1000, F_params=(1, 1), G_params=(3, 1.2)):\n", " \"\"\"\n", " Simulate Protocol 1: Nature decides once at t=-1 which model to use.\n", " \"\"\"\n", " F_a, F_b = F_params\n", " G_a, G_b = G_params\n", " \n", " # Single coin flip for the true model\n", " true_models_F = np.random.rand(N) < π_minus_1\n", " sequences = np.empty((N, T))\n", " \n", " n_f = np.sum(true_models_F)\n", " n_g = N - n_f\n", " \n", " if n_f > 0:\n", " sequences[true_models_F, :] = np.random.beta(F_a, F_b, (n_f, T))\n", " if n_g > 0:\n", " sequences[~true_models_F, :] = np.random.beta(G_a, G_b, (n_g, T))\n", " \n", " return sequences, true_models_F\n", "\n", "def protocol_2(π_minus_1, T, N=1000, F_params=(1, 1), G_params=(3, 1.2)):\n", " \"\"\"\n", " Simulate Protocol 2: Nature decides at each time step which model to use.\n", " \"\"\"\n", " F_a, F_b = F_params\n", " G_a, G_b = G_params\n", " \n", " # Coin flips for each time step\n", " true_models_F = np.random.rand(N, T) < π_minus_1\n", " sequences = np.empty((N, T))\n", " \n", " n_f = np.sum(true_models_F)\n", " n_g = N * T - n_f\n", " \n", " if n_f > 0:\n", " sequences[true_models_F] = np.random.beta(F_a, F_b, n_f)\n", " if n_g > 0:\n", " sequences[~true_models_F] = np.random.beta(G_a, G_b, n_g)\n", " \n", " return sequences, true_models_F" ] }, { "cell_type": "markdown", "id": "9fd510a5", "metadata": {}, "source": [ "**Remark:** Under timing protocol 2, the $ \\{w_t\\}_{t=1}^T $ is a sequence of IID draws from $ h(w) $. Under timing protocol 1, the $ \\{w_t\\}_{t=1}^T $ is\n", "not IID. It is **conditionally IID** – meaning that with probability $ \\pi_{-1} $ it is a sequence of IID draws from $ f(w) $ and with probability $ 1-\\pi_{-1} $ it is a sequence of IID draws from $ g(w) $. For more about this, see [this lecture about exchangeability](https://python.quantecon.org/exchangeable.html).\n", "\n", "We again deploy a **likelihood ratio process** with time $ t $ component being the likelihood ratio\n", "\n", "$$\n", "\\ell (w_t)=\\frac{f\\left(w_t\\right)}{g\\left(w_t\\right)},\\quad t\\geq1.\n", "$$\n", "\n", "The **likelihood ratio process** for sequence $ \\left\\{ w_{t}\\right\\} _{t=1}^{\\infty} $ is\n", "\n", "$$\n", "L\\left(w^{t}\\right)=\\prod_{i=1}^{t} \\ell (w_i),\n", "$$\n", "\n", "For shorthand we’ll write $ L_t = L(w^t) $." 
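] }, { "cell_type": "markdown", "id": "4d5e6f7a", "metadata": {}, "source": [ "To make the remark above concrete, here is a quick sanity check (a small sketch that only uses the `protocol_1` and `protocol_2` functions just defined, with $ \\pi_{-1}=0.5 $): protocol 1 flips one coin per sequence, while protocol 2 flips one coin per draw." ] },
{ "cell_type": "code", "execution_count": null, "id": "5e6f7a8b", "metadata": { "hide-output": false }, "outputs": [], "source": [ "seq1, flags1 = protocol_1(0.5, T=5, N=2000)\n", "seq2, flags2 = protocol_2(0.5, T=5, N=2000)\n", "\n", "# Protocol 1: one flag per sequence -- about half of the sequences are drawn from f\n", "print(flags1.shape, flags1.mean().round(2))\n", "\n", "# Protocol 2: one flag per draw -- about half of the draws come from f in every period\n", "print(flags2.shape, flags2.mean(axis=0).round(2))"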
] }, { "cell_type": "markdown", "id": "63a1484b", "metadata": {}, "source": [ "### Model selection mistake probability\n", "\n", "We first study a problem that assumes timing protocol 1.\n", "\n", "Consider a decision maker who wants to know whether model $ f $ or model $ g $ governs a data set of length $ T $ observations.\n", "\n", "The decision makers has observed a sequence $ \\{w_t\\}_{t=1}^T $.\n", "\n", "On the basis of that observed sequence, a likelihood ratio test selects model $ f $ when\n", "$ L_T \\geq 1 $ and model $ g $ when $ L_T < 1 $.\n", "\n", "When model $ f $ generates the data, the probability that the likelihood ratio test selects the wrong model is\n", "\n", "$$\n", "p_f = {\\rm Prob}\\left(L_T < 1\\Big| f\\right) = \\alpha_T .\n", "$$\n", "\n", "When model $ g $ generates the data, the probability that the likelihood ratio test selects the wrong model is\n", "\n", "$$\n", "p_g = {\\rm Prob}\\left(L_T \\geq 1 \\Big|g \\right) = \\beta_T.\n", "$$\n", "\n", "We can construct a probability that the likelihood ratio selects the wrong model by assigning a Bayesian prior probability of $ \\pi_{-1} = .5 $ that nature selects model $ f $ then averaging $ p_f $ and $ p_g $ to form the Bayesian posterior probability of a detection error equal to\n", "\n", "\n", "\n", "$$\n", "p(\\textrm{wrong decision}) = {1 \\over 2} (\\alpha_T + \\beta_T) . \\tag{22.2}\n", "$$\n", "\n", "Now let’s simulate timing protocol 1 and compute the error probabilities" ] }, { "cell_type": "code", "execution_count": null, "id": "3c32f7f8", "metadata": { "hide-output": false }, "outputs": [], "source": [ "def compute_protocol_1_errors(π_minus_1, T_max, N_simulations, f_func, g_func, \n", " F_params=(1, 1), G_params=(3, 1.2)):\n", " \"\"\"\n", " Compute error probabilities for Protocol 1.\n", " \"\"\"\n", " sequences, true_models = protocol_1(\n", " π_minus_1, T_max, N_simulations, F_params, G_params)\n", " l_ratios, L_cumulative = compute_likelihood_ratios(sequences, \n", " f_func, g_func)\n", " \n", " T_range = np.arange(1, T_max + 1)\n", " \n", " mask_f = true_models\n", " mask_g = ~true_models\n", " \n", " L_f = L_cumulative[mask_f, :]\n", " L_g = L_cumulative[mask_g, :]\n", " \n", " α_T = np.mean(L_f < 1, axis=0)\n", " β_T = np.mean(L_g >= 1, axis=0)\n", " error_prob = 0.5 * (α_T + β_T)\n", " \n", " return {\n", " 'T_range': T_range,\n", " 'alpha': α_T,\n", " 'beta': β_T, \n", " 'error_prob': error_prob,\n", " 'L_cumulative': L_cumulative,\n", " 'true_models': true_models\n", " }" ] }, { "cell_type": "markdown", "id": "d11620d3", "metadata": {}, "source": [ "The following code visualizes the error probabilities for timing protocol 1" ] }, { "cell_type": "code", "execution_count": null, "id": "df6555b5", "metadata": { "hide-output": false }, "outputs": [], "source": [ "def analyze_protocol_1(π_minus_1, T_max, N_simulations, f_func, g_func, \n", " F_params=(1, 1), G_params=(3, 1.2)):\n", " \"\"\"Analyze Protocol 1\"\"\"\n", " result = compute_protocol_1_errors(π_minus_1, T_max, N_simulations, \n", " f_func, g_func, F_params, G_params)\n", " \n", " # Plot results\n", " fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))\n", " \n", " ax1.plot(result['T_range'], result['alpha'], 'b-', \n", " label=r'$\\alpha_T$', linewidth=2)\n", " ax1.plot(result['T_range'], result['beta'], 'r-', \n", " label=r'$\\beta_T$', linewidth=2)\n", " ax1.set_xlabel('$T$')\n", " ax1.set_ylabel('error probability')\n", " ax1.legend()\n", " \n", " ax2.plot(result['T_range'], result['error_prob'], 'g-', \n", " 
label=r'$\\frac{1}{2}(\\alpha_T+\\beta_T)$', linewidth=2)\n", " ax2.set_xlabel('$T$')\n", " ax2.set_ylabel('error probability')\n", " ax2.legend()\n", " \n", " plt.tight_layout()\n", " plt.show()\n", " \n", " # Print summary\n", " print(f\"At T={T_max}:\")\n", " print(f\"α_{T_max} = {result['alpha'][-1]:.4f}\")\n", " print(f\"β_{T_max} = {result['beta'][-1]:.4f}\")\n", " print(f\"Model selection error probability = {result['error_prob'][-1]:.4f}\")\n", " \n", " return result\n", "\n", "# Analyze Protocol 1\n", "π_minus_1 = 0.5\n", "T_max = 30\n", "N_simulations = 10_000\n", "\n", "result_p1 = analyze_protocol_1(π_minus_1, T_max, N_simulations, \n", " f, g, (F_a, F_b), (G_a, G_b))" ] }, { "cell_type": "markdown", "id": "aede765a", "metadata": {}, "source": [ "Notice how the model selection error probability approaches zero as $ T $ grows." ] }, { "cell_type": "markdown", "id": "c412b3a7", "metadata": {}, "source": [ "### Classification\n", "\n", "We now consider a problem that assumes timing protocol 2.\n", "\n", "A decision maker wants to classify components of an observed sequence $ \\{w_t\\}_{t=1}^T $ as having been drawn from either $ f $ or $ g $.\n", "\n", "The decision maker uses the following classification rule:\n", "\n", "$$\n", "\\begin{aligned}\n", "w_t & \\ {\\rm is \\ from \\ } f \\ {\\rm if \\ } l_t > 1 \\\\\n", "w_t & \\ {\\rm is \\ from \\ } g \\ {\\rm if \\ } l_t \\leq 1 . \n", "\\end{aligned}\n", "$$\n", "\n", "Under this rule, the expected misclassification rate is\n", "\n", "\n", "\n", "$$\n", "p(\\textrm{misclassification}) = {1 \\over 2} (\\tilde \\alpha_t + \\tilde \\beta_t) \\tag{22.3}\n", "$$\n", "\n", "where $ \\tilde \\alpha_t = {\\rm Prob}(l_t < 1 \\mid f) $ and $ \\tilde \\beta_t = {\\rm Prob}(l_t \\geq 1 \\mid g) $.\n", "\n", "Now let’s write some code to simulate it" ] }, { "cell_type": "code", "execution_count": null, "id": "1fab573b", "metadata": { "hide-output": false }, "outputs": [], "source": [ "def compute_protocol_2_errors(π_minus_1, T_max, N_simulations, f_func, g_func,\n", " F_params=(1, 1), G_params=(3, 1.2)):\n", " \"\"\"\n", " Compute error probabilities for Protocol 2.\n", " \"\"\"\n", " sequences, true_models = protocol_2(π_minus_1, \n", " T_max, N_simulations, F_params, G_params)\n", " l_ratios, _ = compute_likelihood_ratios(sequences, f_func, g_func)\n", " \n", " T_range = np.arange(1, T_max + 1)\n", " \n", " accuracy = np.empty(T_max)\n", " for t in range(T_max):\n", " predictions = (l_ratios[:, t] >= 1)\n", " actual = true_models[:, t]\n", " accuracy[t] = np.mean(predictions == actual)\n", " \n", " return {\n", " 'T_range': T_range,\n", " 'accuracy': accuracy,\n", " 'l_ratios': l_ratios,\n", " 'true_models': true_models\n", " }" ] }, { "cell_type": "markdown", "id": "4b99f003", "metadata": {}, "source": [ "Since for each $ t $, the decision boundary is the same, the decision boundary can be computed as" ] }, { "cell_type": "code", "execution_count": null, "id": "1680fca5", "metadata": { "hide-output": false }, "outputs": [], "source": [ "root = brentq(lambda w: f(w) / g(w) - 1, 0.001, 0.999)" ] }, { "cell_type": "markdown", "id": "d25a6c6d", "metadata": {}, "source": [ "we can plot the distributions of $ f $ and $ g $ and the decision boundary" ] }, { "cell_type": "code", "execution_count": null, "id": "1f644a40", "metadata": { "hide-output": false }, "outputs": [], "source": [ "fig, ax = plt.subplots(figsize=(7, 6))\n", "\n", "w_range = np.linspace(1e-5, 1-1e-5, 1000)\n", "f_values = [f(w) for w in w_range]\n", "g_values = [g(w) for w in 
w_range]\n", "ratio_values = [f(w)/g(w) for w in w_range]\n", "\n", "ax.plot(w_range, f_values, 'b-', \n", " label=r'$f(w) \\sim Beta(1,1)$', linewidth=2)\n", "ax.plot(w_range, g_values, 'r-', \n", " label=r'$g(w) \\sim Beta(3,1.2)$', linewidth=2)\n", "\n", "type1_prob = 1 - beta_dist.cdf(root, F_a, F_b)\n", "type2_prob = beta_dist.cdf(root, G_a, G_b)\n", "\n", "w_type1 = w_range[w_range >= root]\n", "f_type1 = [f(w) for w in w_type1]\n", "ax.fill_between(w_type1, 0, f_type1, alpha=0.3, color='blue', \n", " label=fr'$\\tilde \\alpha_t = {type1_prob:.2f}$')\n", "\n", "w_type2 = w_range[w_range <= root]\n", "g_type2 = [g(w) for w in w_type2]\n", "ax.fill_between(w_type2, 0, g_type2, alpha=0.3, color='red', \n", " label=fr'$\\tilde \\beta_t = {type2_prob:.2f}$')\n", "\n", "ax.axvline(root, color='green', linestyle='--', alpha=0.7, \n", " label=f'decision boundary: $w=${root:.3f}')\n", "\n", "ax.set_xlabel('w')\n", "ax.set_ylabel('probability density')\n", "ax.legend()\n", "\n", "plt.tight_layout()\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "2812767c", "metadata": {}, "source": [ "To the left of the green vertical line $ g < f $, so $ l_t > 1 $; therefore a $ w_t $ that falls to the left of the green line is classified as a type $ f $ individual.\n", "\n", "- The shaded red area equals $ \\beta $ – the probability of classifying someone as a type $ g $ individual when it is really a type $ f $ individual. \n", "\n", "\n", "To the right of the green vertical line $ g > f $, so $ l_t < 1 $; therefore a $ w_t $ that falls to the right of the green line is classified as a type $ g $ individual.\n", "\n", "- The shaded blue area equals $ \\alpha $ – the probability of classifying someone as a type $ f $ when it is really a type $ g $ individual. 
\n", "\n", "\n", "This gives us clues about how to compute the theoretical classification error probability" ] }, { "cell_type": "code", "execution_count": null, "id": "c04939c1", "metadata": { "hide-output": false }, "outputs": [], "source": [ "# Compute theoretical tilde α_t and tilde β_t\n", "def α_integrand(w):\n", " \"\"\"Integrand for tilde α_t = P(l_t < 1 | f)\"\"\"\n", " return f(w) if f(w) / g(w) < 1 else 0\n", "\n", "def β_integrand(w):\n", " \"\"\"Integrand for tilde β_t = P(l_t >= 1 | g)\"\"\"\n", " return g(w) if f(w) / g(w) >= 1 else 0\n", "\n", "# Compute the integrals\n", "α_theory, _ = quad(α_integrand, 0, 1, limit=100)\n", "β_theory, _ = quad(β_integrand, 0, 1, limit=100)\n", "\n", "theory_error = 0.5 * (α_theory + β_theory)\n", "\n", "print(f\"theoretical tilde α_t = {α_theory:.4f}\")\n", "print(f\"theoretical tilde β_t = {β_theory:.4f}\")\n", "print(f\"theoretical classification error probability = {theory_error:.4f}\")" ] }, { "cell_type": "markdown", "id": "5403ce7d", "metadata": {}, "source": [ "Now we simulate timing protocol 2 and compute the classification error probability.\n", "\n", "In the next cell, we also compare the theoretical classification accuracy to the empirical classification accuracy" ] }, { "cell_type": "code", "execution_count": null, "id": "27e6b824", "metadata": { "hide-output": false }, "outputs": [], "source": [ "def analyze_protocol_2(π_minus_1, T_max, N_simulations, f_func, g_func, \n", " theory_error=None, F_params=(1, 1), G_params=(3, 1.2)):\n", " \"\"\"Analyze Protocol 2.\"\"\"\n", " result = compute_protocol_2_errors(π_minus_1, T_max, N_simulations, \n", " f_func, g_func, F_params, G_params)\n", " \n", " # Plot results\n", " plt.figure(figsize=(10, 6))\n", " plt.plot(result['T_range'], result['accuracy'], \n", " 'b-', linewidth=2, label='empirical accuracy')\n", " \n", " if theory_error is not None:\n", " plt.axhline(1 - theory_error, color='r', linestyle='--', \n", " label=f'theoretical accuracy = {1 - theory_error:.4f}')\n", " \n", " plt.xlabel('$t$')\n", " plt.ylabel('accuracy')\n", " plt.legend()\n", " plt.ylim(0.5, 1.0)\n", " plt.show()\n", " \n", " return result\n", "\n", "# Analyze Protocol 2\n", "result_p2 = analyze_protocol_2(π_minus_1, T_max, N_simulations, f, g, \n", " theory_error, (F_a, F_b), (G_a, G_b))" ] }, { "cell_type": "markdown", "id": "06c77b86", "metadata": {}, "source": [ "Let’s watch decisions made by the two timing protocols as more and more observations accrue." ] }, { "cell_type": "code", "execution_count": null, "id": "5329bb11", "metadata": { "hide-output": false }, "outputs": [], "source": [ "def compare_protocols(result1, result2):\n", " \"\"\"Compare results from both protocols.\"\"\"\n", " plt.figure(figsize=(10, 6))\n", " \n", " plt.plot(result1['T_range'], result1['error_prob'], linewidth=2, \n", " label='Protocol 1 (Model Selection)')\n", " plt.plot(result2['T_range'], 1 - result2['accuracy'], \n", " linestyle='--', linewidth=2, \n", " label='Protocol 2 (classification)')\n", " \n", " plt.xlabel('$T$')\n", " plt.ylabel('error probability')\n", " plt.legend()\n", " plt.show()\n", " \n", "compare_protocols(result_p1, result_p2)" ] }, { "cell_type": "markdown", "id": "aa14d28c", "metadata": {}, "source": [ "From the figure above, we can see:\n", "\n", "- For both timing protocols, the error probability starts at the same level, subject to a little randomness. 
\n", "- For timing protocol 1, the error probability decreases as the sample size increases because we are making just **one** decision – i.e., selecting whether $ f $ or $ g $ governs **all** individuals. More data provides better evidence. \n", "- For timing protocol 2, the error probability remains constant because we are making **many** decisions – one classification decision for each observation. \n", "\n", "\n", "**Remark:** Think about how laws of large numbers are applied to compute error probabilities for the model selection problem and the classification problem." ] }, { "cell_type": "markdown", "id": "43bc2a8e", "metadata": {}, "source": [ "### Error probability and divergence measures\n", "\n", "A plausible guess is that the ability of a likelihood ratio to distinguish distributions $ f $ and $ g $ depends on how “different” they are.\n", "\n", "We have learnt some measures of “difference” between distributions in [Statistical Divergence Measures](https://python.quantecon.org/divergence_measures.html).\n", "\n", "Let’s now study two more measures of “difference” between distributions that are useful in the context of model selection and classification.\n", "\n", "Recall that Chernoff entropy between probability densities $ f $ and $ g $ is defined as:\n", "\n", "$$\n", "C(f,g) = - \\log \\min_{\\phi \\in (0,1)} \\int f^\\phi(x) g^{1-\\phi}(x) dx\n", "$$\n", "\n", "An upper bound on model selection error probability is\n", "\n", "$$\n", "e^{-C(f,g)T} .\n", "$$\n", "\n", "Let’s compute Chernoff entropy numerically with some Python code" ] }, { "cell_type": "code", "execution_count": null, "id": "2fc1c6a3", "metadata": { "hide-output": false }, "outputs": [], "source": [ "def chernoff_integrand(ϕ, f, g):\n", " \"\"\"\n", " Compute the integrand for Chernoff entropy\n", " \"\"\"\n", " def integrand(w):\n", " return f(w)**ϕ * g(w)**(1-ϕ)\n", " \n", " result, _ = quad(integrand, 1e-5, 1-1e-5)\n", " return result\n", "\n", "def compute_chernoff_entropy(f, g):\n", " \"\"\"\n", " Compute Chernoff entropy C(f,g)\n", " \"\"\"\n", " def objective(ϕ):\n", " return chernoff_integrand(ϕ, f, g)\n", " \n", " # Find the minimum over ϕ in (0,1)\n", " result = minimize_scalar(objective, \n", " bounds=(1e-5, 1-1e-5), \n", " method='bounded')\n", " min_value = result.fun\n", " ϕ_optimal = result.x\n", " \n", " chernoff_entropy = -np.log(min_value)\n", " return chernoff_entropy, ϕ_optimal\n", "C_fg, ϕ_optimal = compute_chernoff_entropy(f, g)\n", "print(f\"Chernoff entropy C(f,g) = {C_fg:.4f}\")\n", "print(f\"Optimal ϕ = {ϕ_optimal:.4f}\")" ] }, { "cell_type": "markdown", "id": "ffe53543", "metadata": {}, "source": [ "Now let’s examine how $ e^{-C(f,g)T} $ behaves as a function of $ T $ and compare it to the model selection error probability" ] }, { "cell_type": "code", "execution_count": null, "id": "05513bb4", "metadata": { "hide-output": false }, "outputs": [], "source": [ "T_range = np.arange(1, T_max+1)\n", "chernoff_bound = np.exp(-C_fg * T_range)\n", "\n", "# Plot comparison\n", "fig, ax = plt.subplots(figsize=(10, 6))\n", "\n", "ax.semilogy(T_range, chernoff_bound, 'r-', linewidth=2, \n", " label=f'$e^{{-C(f,g)T}}$')\n", "ax.semilogy(T_range, result_p1['error_prob'], 'b-', linewidth=2, \n", " label='Model selection error probability')\n", "\n", "ax.set_xlabel('T')\n", "ax.set_ylabel('error probability (log scale)')\n", "ax.legend()\n", "plt.tight_layout()\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "f1cc7f42", "metadata": {}, "source": [ "Evidently, $ e^{-C(f,g)T} $ is an upper 
bound on the error rate.\n", "\n", "In `{doc}`divergence_measures`, we also studied **Jensen-Shannon divergence** as\n", "a symmetric measure of distance between distributions.\n", "\n", "We can use Jensen-Shannon divergence to measure the distance between distributions $ f $ and $ g $ and\n", "compute how it covaries with the model selection error probability.\n", "\n", "We also compute Jensen-Shannon divergence numerically with some Python code" ] }, { "cell_type": "code", "execution_count": null, "id": "49c13fa6", "metadata": { "hide-output": false }, "outputs": [], "source": [ "def compute_JS(f, g):\n", " \"\"\"\n", " Compute Jensen-Shannon divergence\n", " \"\"\"\n", " def m(w):\n", " return 0.5 * (f(w) + g(w))\n", " \n", " js_div = 0.5 * compute_KL(f, m) + 0.5 * compute_KL(g, m)\n", " return js_div" ] }, { "cell_type": "markdown", "id": "df4504fe", "metadata": {}, "source": [ "Now let’s return to our guess that the error probability at large sample sizes is related to the Chernoff entropy between two distributions.\n", "\n", "We verify this by computing the correlation between the log of the error probability at $ T=50 $ under Timing Protocol 1 and the divergence measures.\n", "\n", "In the simulation below, nature draws $ N / 2 $ sequences from $ g $ and $ N/2 $ sequences from $ f $.\n", "\n", ">**Note**\n", ">\n", ">Nature does this rather than flipping a fair coin to decide whether to draw from $ g $ or $ f $ once and for all before each simulation of length $ T $.\n", "\n", "We use the following pairs of Beta distributions for $ f $ and $ g $ as test cases" ] }, { "cell_type": "code", "execution_count": null, "id": "a300bedd", "metadata": { "hide-output": false }, "outputs": [], "source": [ "distribution_pairs = [\n", " # (f_params, g_params)\n", " ((1, 1), (0.1, 0.2)),\n", " ((1, 1), (0.3, 0.3)),\n", " ((1, 1), (0.3, 0.4)),\n", " ((1, 1), (0.5, 0.5)),\n", " ((1, 1), (0.7, 0.6)),\n", " ((1, 1), (0.9, 0.8)),\n", " ((1, 1), (1.1, 1.05)),\n", " ((1, 1), (1.2, 1.1)),\n", " ((1, 1), (1.5, 1.2)),\n", " ((1, 1), (2, 1.5)),\n", " ((1, 1), (2.5, 1.8)),\n", " ((1, 1), (3, 1.2)),\n", " ((1, 1), (4, 1)),\n", " ((1, 1), (5, 1))\n", "]" ] }, { "cell_type": "markdown", "id": "49800f36", "metadata": {}, "source": [ "Now let’s run the simmulation" ] }, { "cell_type": "code", "execution_count": null, "id": "77cf99fb", "metadata": { "hide-output": false }, "outputs": [], "source": [ "# Parameters for simulation\n", "T_large = 50\n", "N_sims = 5000\n", "N_half = N_sims // 2\n", "\n", "# Initialize arrays\n", "n_pairs = len(distribution_pairs)\n", "kl_fg_vals = np.zeros(n_pairs)\n", "kl_gf_vals = np.zeros(n_pairs) \n", "js_vals = np.zeros(n_pairs)\n", "chernoff_vals = np.zeros(n_pairs)\n", "error_probs = np.zeros(n_pairs)\n", "pair_names = []\n", "\n", "for i, ((f_a, f_b), (g_a, g_b)) in enumerate(distribution_pairs):\n", " # Create density functions\n", " f = jit(lambda x, a=f_a, b=f_b: p(x, a, b))\n", " g = jit(lambda x, a=g_a, b=g_b: p(x, a, b))\n", "\n", " # Compute divergence measures\n", " kl_fg_vals[i] = compute_KL(f, g)\n", " kl_gf_vals[i] = compute_KL(g, f)\n", " js_vals[i] = compute_JS(f, g)\n", " chernoff_vals[i], _ = compute_chernoff_entropy(f, g)\n", "\n", " # Generate samples\n", " sequences_f = np.random.beta(f_a, f_b, (N_half, T_large))\n", " sequences_g = np.random.beta(g_a, g_b, (N_half, T_large))\n", "\n", " # Compute likelihood ratios and cumulative products\n", " _, L_cumulative_f = compute_likelihood_ratios(sequences_f, f, g)\n", " _, L_cumulative_g = 
compute_likelihood_ratios(sequences_g, f, g)\n", " \n", " # Get final values\n", " L_cumulative_f = L_cumulative_f[:, -1]\n", " L_cumulative_g = L_cumulative_g[:, -1]\n", "\n", " # Calculate error probabilities\n", " error_probs[i] = 0.5 * (np.mean(L_cumulative_f < 1) + \n", " np.mean(L_cumulative_g >= 1))\n", " pair_names.append(f\"Beta({f_a},{f_b}) and Beta({g_a},{g_b})\")\n", "\n", "cor_data = {\n", " 'kl_fg': kl_fg_vals,\n", " 'kl_gf': kl_gf_vals,\n", " 'js': js_vals, \n", " 'chernoff': chernoff_vals,\n", " 'error_prob': error_probs,\n", " 'names': pair_names,\n", " 'T': T_large}" ] }, { "cell_type": "markdown", "id": "6f0984d2", "metadata": {}, "source": [ "Now let’s visualize the correlations" ] }, { "cell_type": "code", "execution_count": null, "id": "a8b1c9dc", "metadata": { "hide-output": false }, "outputs": [], "source": [ "def plot_error_divergence(data):\n", " \"\"\"\n", " Plot correlations between error probability and divergence measures.\n", " \"\"\"\n", " # Filter out near-zero error probabilities for log scale\n", " nonzero_mask = data['error_prob'] > 1e-6\n", " log_error = np.log(data['error_prob'][nonzero_mask])\n", " js_vals = data['js'][nonzero_mask]\n", " chernoff_vals = data['chernoff'][nonzero_mask]\n", "\n", " # Create figure and axes\n", " fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))\n", " \n", " # function for plotting correlation\n", " def plot_correlation(ax, x_vals, x_label, color):\n", " ax.scatter(x_vals, log_error, alpha=0.7, s=60, color=color)\n", " ax.set_xlabel(x_label)\n", " ax.set_ylabel(f'log(Error probability) at T={data[\"T\"]}')\n", " \n", " # Calculate correlation and trend line\n", " corr = np.corrcoef(x_vals, log_error)[0, 1]\n", " z = np.polyfit(x_vals, log_error, 2)\n", " x_trend = np.linspace(x_vals.min(), x_vals.max(), 100)\n", " ax.plot(x_trend, np.poly1d(z)(x_trend), \n", " \"r--\", alpha=0.8, linewidth=2)\n", " ax.set_title(f'Log error probability and {x_label}\\n'\n", " f'Correlation = {corr:.3f}')\n", " \n", " # Plot both correlations\n", " plot_correlation(ax1, js_vals, 'JS divergence', 'C0')\n", " plot_correlation(ax2, chernoff_vals, 'Chernoff entropy', 'C1')\n", "\n", " plt.tight_layout()\n", " plt.show()\n", "\n", "plot_error_divergence(cor_data)" ] }, { "cell_type": "markdown", "id": "53465633", "metadata": {}, "source": [ "Evidently, Chernoff entropy and Jensen-Shannon entropy each covary tightly with the model selection error probability.\n", "\n", "We’ll encounter related ideas in [A Problem that Stumped Milton Friedman](https://python.quantecon.org/wald_friedman.html) very soon.\n", "\n", "\n", "" ] }, { "cell_type": "markdown", "id": "4ca75f09", "metadata": {}, "source": [ "## Markov chains\n", "\n", "Let’s now look at a likelihood ratio process for a sequence of random variables that is not independently and identically distributed.\n", "\n", "Here we assume that the sequence is generated by a Markov chain on a finite state space.\n", "\n", "We consider two $ n $-state irreducible and aperiodic Markov chain models on the same state space $ \\{1, 2, \\ldots, n\\} $ with positive transition matrices $ P^{(f)} $, $ P^{(g)} $ and initial distributions $ \\pi_0^{(f)} $, $ \\pi_0^{(g)} $.\n", "\n", "We assume that nature samples from chain $ f $.\n", "\n", "For a sample path $ (x_0, x_1, \\ldots, x_T) $, let $ N_{ij} $ count transitions from state $ i $ to $ j $.\n", "\n", "The likelihood process under model $ m \\in \\{f, g\\} $ is\n", "\n", "$$\n", "L_T^{(m)} = \\pi_{0,x_0}^{(m)} \\prod_{i=1}^n \\prod_{j=1}^n 
\\left(P_{ij}^{(m)}\\right)^{N_{ij}}\n", "$$\n", "\n", "Hence,\n", "\n", "$$\n", "\\log L_T^{(m)} =\\log\\pi_{0,x_0}^{(m)} +\\sum_{i,j}N_{ij}\\log P_{ij}^{(m)}\n", "$$\n", "\n", "The log-likelihood ratio is\n", "\n", "\n", "\n", "$$\n", "\\log \\frac{L_T^{(f)}}{L_T^{(g)}} = \\log \\frac{\\pi_{0,x_0}^{(f)}}{\\pi_{0,x_0}^{(g)}} + \\sum_{i,j}N_{ij}\\log \\frac{P_{ij}^{(f)}}{P_{ij}^{(g)}} \\tag{22.4}\n", "$$" ] }, { "cell_type": "markdown", "id": "880c5c39", "metadata": {}, "source": [ "### KL divergence rate\n", "\n", "By the ergodic theorem for irreducible, aperiodic Markov chains, we have\n", "\n", "$$\n", "\\frac{N_{ij}}{T} \\xrightarrow{a.s.} \\pi_i^{(f)}P_{ij}^{(f)} \\quad \\text{as } T \\to \\infty\n", "$$\n", "\n", "where $ \\boldsymbol{\\pi}^{(f)} $ is the stationary distribution satisfying $ \\boldsymbol{\\pi}^{(f)} = \\boldsymbol{\\pi}^{(f)} P^{(f)} $.\n", "\n", "Therefore,\n", "\n", "$$\n", "\\frac{1}{T}\\log \\frac{L_T^{(f)}}{L_T^{(g)}} = \\frac{1}{T}\\log \\frac{\\pi_{0,x_0}^{(f)}}{\\pi_{0,x_0}^{(g)}} + \\frac{1}{T}\\sum_{i,j}N_{ij}\\log \\frac{P_{ij}^{(f)}}{P_{ij}^{(g)}}\n", "$$\n", "\n", "Taking the limit as $ T \\to \\infty $, we have:\n", "\n", "- The first term: $ \\frac{1}{T}\\log \\frac{\\pi_{0,x_0}^{(f)}}{\\pi_{0,x_0}^{(g)}} \\to 0 $ \n", "- The second term: $ \\frac{1}{T}\\sum_{i,j}N_{ij}\\log \\frac{P_{ij}^{(f)}}{P_{ij}^{(g)}} \\xrightarrow{a.s.} \\sum_{i,j}\\pi_i^{(f)}P_{ij}^{(f)}\\log \\frac{P_{ij}^{(f)}}{P_{ij}^{(g)}} $ \n", "\n", "\n", "Define the **KL divergence rate** as\n", "\n", "$$\n", "h_{KL}(f, g) = \\sum_{i=1}^n \\pi_i^{(f)} \\underbrace{\\sum_{j=1}^n P_{ij}^{(f)} \\log \\frac{P_{ij}^{(f)}}{P_{ij}^{(g)}}}_{=: KL(P_{i\\cdot}^{(f)}, P_{i\\cdot}^{(g)})}\n", "$$\n", "\n", "where $ KL(P_{i\\cdot}^{(f)}, P_{i\\cdot}^{(g)}) $ is the row-wise KL divergence.\n", "\n", "By the ergodic theorem, we have\n", "\n", "$$\n", "\\frac{1}{T}\\log \\frac{L_T^{(f)}}{L_T^{(g)}} \\xrightarrow{a.s.} h_{KL}(f, g) \\quad \\text{as } T \\to \\infty\n", "$$\n", "\n", "Taking expectations and using the dominated convergence theorem, we obtain\n", "\n", "$$\n", "\\frac{1}{T}E_f\\left[\\log \\frac{L_T^{(f)}}{L_T^{(g)}}\\right] \\to h_{KL}(f, g) \\quad \\text{as } T \\to \\infty\n", "$$\n", "\n", "Here we invite readers to pause and compare this result with [(22.1)](#equation-eq-kl-likelihood-link).\n", "\n", "Let’s confirm this in the simulation below." ] }, { "cell_type": "markdown", "id": "747517eb", "metadata": {}, "source": [ "### Simulations\n", "\n", "Let’s implement simulations to illustrate these concepts with a three-state Markov chain.\n", "\n", "We start by writing functions to compute the stationary distribution and the KL divergence rate for Markov chain models." 
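] }, { "cell_type": "markdown", "id": "2b7c8d45", "metadata": {}, "source": [ "But first, as a warm-up, here is a minimal sketch of the KL divergence rate formula applied to a pair of illustrative two-state chains.\n", "\n", "The matrices `P_f_2` and `P_g_2` below are hypothetical examples used only for this check, and the stationary distribution is found by solving $ \\pi P = \\pi $ together with $ \\sum_i \\pi_i = 1 $ directly with NumPy." ] },
{ "cell_type": "code", "execution_count": null, "id": "3f2a9b10", "metadata": { "hide-output": false }, "outputs": [], "source": [ "# Illustrative two-state transition matrices (hypothetical, not the example used below)\n", "P_f_2 = np.array([[0.9, 0.1],\n", "                  [0.2, 0.8]])\n", "P_g_2 = np.array([[0.6, 0.4],\n", "                  [0.3, 0.7]])\n", "\n", "# Stationary distribution of P_f_2: solve πP = π together with sum(π) = 1\n", "A = np.vstack([P_f_2.T - np.eye(2), np.ones(2)])\n", "b = np.array([0.0, 0.0, 1.0])\n", "π_f_2 = np.linalg.lstsq(A, b, rcond=None)[0]\n", "\n", "# KL divergence rate: h_KL(f, g) = Σ_i π_i Σ_j P_f[i, j] log(P_f[i, j] / P_g[i, j])\n", "h_fg = np.sum(π_f_2[:, None] * P_f_2 * np.log(P_f_2 / P_g_2))\n", "\n", "print(f\"stationary distribution under f: {π_f_2}\")\n", "print(f\"KL divergence rate h_KL(f, g): {h_fg:.4f}\")"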
] }, { "cell_type": "code", "execution_count": null, "id": "7814eac5", "metadata": { "hide-output": false }, "outputs": [], "source": [ "def compute_stationary_dist(P):\n", " \"\"\"\n", " Compute stationary distribution of transition matrix P\n", " \"\"\"\n", " eigenvalues, eigenvectors = np.linalg.eig(P.T)\n", " idx = np.argmax(np.abs(eigenvalues))\n", " stationary = np.real(eigenvectors[:, idx])\n", " return stationary / stationary.sum()\n", "\n", "def markov_kl_divergence(P_f, P_g, pi_f):\n", " \"\"\"\n", " Compute KL divergence rate between two Markov chains\n", " \"\"\"\n", " if np.any((P_f > 0) & (P_g == 0)):\n", " return np.inf\n", " \n", " valid_mask = (P_f > 0) & (P_g > 0)\n", " log_ratios = np.zeros_like(P_f)\n", " log_ratios[valid_mask] = np.log(P_f[valid_mask] / P_g[valid_mask])\n", " \n", " # Weight by stationary probabilities and sum\n", " kl_rate = np.sum(pi_f[:, np.newaxis] * P_f * log_ratios)\n", " return kl_rate\n", "\n", "def simulate_markov_chain(P, pi_0, T, N_paths=1000):\n", " \"\"\"\n", " Simulate N_paths sample paths from a Markov chain\n", " \"\"\"\n", " mc = qe.MarkovChain(P, state_values=None)\n", " initial_states = np.random.choice(len(P), size=N_paths, p=pi_0)\n", " paths = np.zeros((N_paths, T+1), dtype=int)\n", " \n", " for i in range(N_paths):\n", " path = mc.simulate(T+1, init=initial_states[i])\n", " paths[i, :] = path\n", " \n", " return paths\n", "\n", "def compute_likelihood_ratio_markov(paths, P_f, P_g, π_0_f, π_0_g):\n", " \"\"\"\n", " Compute likelihood ratio process for Markov chain paths\n", " \"\"\"\n", " N_paths, T_plus_1 = paths.shape\n", " T = T_plus_1 - 1\n", " L_ratios = np.ones((N_paths, T+1))\n", " \n", " # Initial likelihood ratio\n", " L_ratios[:, 0] = π_0_f[paths[:, 0]] / π_0_g[paths[:, 0]]\n", " \n", " # Compute sequential likelihood ratios\n", " for t in range(1, T+1):\n", " prev_states = paths[:, t-1]\n", " curr_states = paths[:, t]\n", " \n", " transition_ratios = (P_f[prev_states, curr_states] / \n", " P_g[prev_states, curr_states])\n", " L_ratios[:, t] = L_ratios[:, t-1] * transition_ratios\n", " \n", " return L_ratios\n", "\n", "def analyze_markov_chains(P_f, P_g, \n", " T=500, N_paths=1000, plot_paths=True, n_show=50):\n", " \"\"\"\n", " Complete analysis of two Markov chains\n", " \"\"\"\n", " # Compute stationary distributions\n", " π_f = compute_stationary_dist(P_f)\n", " π_g = compute_stationary_dist(P_g)\n", " \n", " print(f\"Stationary distribution (f): {π_f}\")\n", " print(f\"Stationary distribution (g): {π_g}\")\n", " \n", " # Compute KL divergence rates\n", " kl_rate_fg = markov_kl_divergence(P_f, P_g, π_f)\n", " kl_rate_gf = markov_kl_divergence(P_g, P_f, π_g)\n", " \n", " print(f\"\\nKL divergence rate h(f, g): {kl_rate_fg:.4f}\")\n", " print(f\"KL divergence rate h(g, f): {kl_rate_gf:.4f}\")\n", " \n", " if plot_paths:\n", " # Simulate and plot paths\n", " paths_from_f = simulate_markov_chain(P_f, π_f, T, N_paths)\n", " L_ratios_f = compute_likelihood_ratio_markov(\n", " paths_from_f, P_f, P_g, π_f, π_g)\n", " \n", " plt.figure(figsize=(10, 6))\n", " \n", " # Plot individual paths\n", " for i in range(min(n_show, N_paths)):\n", " plt.plot(np.log(L_ratios_f[i, :]), alpha=0.3, color='blue', lw=0.8)\n", " \n", " # Plot theoretical expectation\n", " theory_line = kl_rate_fg * np.arange(T+1)\n", " plt.plot(theory_line, 'k--', linewidth=2.5, \n", " label=r'$T \\times h_{KL}(f,g)$')\n", " \n", " # Plot empirical mean\n", " avg_log_L = np.mean(np.log(L_ratios_f), axis=0)\n", " plt.plot(avg_log_L, 'r-', linewidth=2.5, \n", " 
label='empirical average', alpha=0.7)\n", "\n", "        plt.axhline(y=0, color='gray', linestyle='--', alpha=0.5)\n", "        plt.xlabel(r'$T$')\n", "        plt.ylabel(r'$\\log L_T$')\n", "        plt.title('Markov chain likelihood ratios (nature = f)')\n", "        plt.legend()\n", "        plt.show()\n", "\n", "    return {\n", "        'stationary_f': π_f,\n", "        'stationary_g': π_g,\n", "        'kl_rate_fg': kl_rate_fg,\n", "        'kl_rate_gf': kl_rate_gf\n", "    }\n", "\n", "def compute_markov_selection_error(T_values, P_f, P_g, π_0_f, π_0_g, N_sim=1000):\n", "    \"\"\"\n", "    Compute model selection error probability for Markov chains\n", "    \"\"\"\n", "    errors = []\n", "\n", "    for T in T_values:\n", "        # Simulate from both models\n", "        paths_f = simulate_markov_chain(P_f, π_0_f, T, N_sim//2)\n", "        paths_g = simulate_markov_chain(P_g, π_0_g, T, N_sim//2)\n", "\n", "        # Compute likelihood ratios\n", "        L_f = compute_likelihood_ratio_markov(paths_f, P_f, P_g, π_0_f, π_0_g)\n", "        L_g = compute_likelihood_ratio_markov(paths_g, P_f, P_g, π_0_f, π_0_g)\n", "\n", "        # Decision rule: choose f if L_T >= 1\n", "        error_f = np.mean(L_f[:, -1] < 1)   # Type I error\n", "        error_g = np.mean(L_g[:, -1] >= 1)  # Type II error\n", "\n", "        total_error = 0.5 * (error_f + error_g)\n", "        errors.append(total_error)\n", "\n", "    return np.array(errors)" ] },
{ "cell_type": "markdown", "id": "ad9b7cfa", "metadata": {}, "source": [ "Now let’s create an example with two different 3-state Markov chains, simulate paths, and visualize how the likelihood ratios evolve.\n", "\n", "Starting the chain from its stationary distribution, we check that $ \\frac{1}{T}E_f\\left[\\log \\frac{L_T^{(f)}}{L_T^{(g)}}\\right] $ approaches $ h_{KL}(f, g) $ by plotting both the empirical average and the line predicted by the theory" ] },
{ "cell_type": "code", "execution_count": null, "id": "04531388", "metadata": { "hide-output": false }, "outputs": [], "source": [ "# Define example Markov chain transition matrices\n", "P_f = np.array([[0.7, 0.2, 0.1],\n", "                [0.3, 0.5, 0.2],\n", "                [0.1, 0.3, 0.6]])\n", "\n", "P_g = np.array([[0.5, 0.3, 0.2],\n", "                [0.2, 0.6, 0.2],\n", "                [0.2, 0.2, 0.6]])\n", "\n", "markov_results = analyze_markov_chains(P_f, P_g)" ] },
{ "cell_type": "markdown", "id": "78f0bf20", "metadata": {}, "source": [ "## Related lectures\n", "\n", "Likelihood processes play an important role in Bayesian learning, as described in [Likelihood Ratio Processes and Bayesian Learning](https://python.quantecon.org/likelihood_bayes.html)\n", "and as applied in [Job Search VIII: Search with Learning](https://python.quantecon.org/odu.html).\n", "\n", "Likelihood ratio processes are central to Lawrence Blume and David Easley’s answer to their question “If you’re so smart, why aren’t you rich?” [[Blume and Easley, 2006](https://python.quantecon.org/zreferences.html#id14)], the subject of the lecture [Heterogeneous Beliefs and Financial Markets](https://python.quantecon.org/likelihood_ratio_process_2.html).\n", "\n", "Likelihood ratio processes also appear in [Additive and Multiplicative Functionals](https://python-advanced.quantecon.org/additive_functionals.html), which contains another illustration of the **peculiar property** of likelihood ratio processes described above."
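] }, { "cell_type": "markdown", "id": "5c8d7e21", "metadata": {}, "source": [ "Before turning to the exercises, here is a rough sketch that uses the `compute_markov_selection_error` helper defined above, together with the example chains `P_f` and `P_g`, to trace out how the model selection error probability behaves as the sample size $ T $ grows.\n", "\n", "The grid of sample sizes and the number of simulated paths below are illustrative choices." ] },
{ "cell_type": "code", "execution_count": null, "id": "9e4f6a37", "metadata": { "hide-output": false }, "outputs": [], "source": [ "# Model selection error for the example Markov chains as the sample size grows\n", "# (the grid of sample sizes and N_sim are illustrative choices)\n", "T_grid = np.array([10, 25, 50, 100, 150])\n", "π_f_init = markov_results['stationary_f']\n", "π_g_init = markov_results['stationary_g']\n", "\n", "errors = compute_markov_selection_error(\n", "    T_grid, P_f, P_g, π_f_init, π_g_init, N_sim=1000)\n", "\n", "fig, ax = plt.subplots(figsize=(8, 5))\n", "ax.semilogy(T_grid, errors, 'o-')\n", "ax.set_xlabel(r'$T$')\n", "ax.set_ylabel('error probability (log scale)')\n", "ax.set_title('Model selection error for the Markov chain example')\n", "plt.show()"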
] }, { "cell_type": "markdown", "id": "5582979c", "metadata": {}, "source": [ "## Exercises" ] }, { "cell_type": "markdown", "id": "13d9d5b6", "metadata": {}, "source": [ "## Exercise 22.1\n", "\n", "Consider the setting where nature generates data from a third density $ h $.\n", "\n", "Let $ \\{w_t\\}_{t=1}^T $ be IID draws from $ h $, and let $ L_t = L(w^t) $ be the likelihood ratio process defined as in the lecture.\n", "\n", "Show that:\n", "\n", "$$\n", "\\frac{1}{t} E_h[\\log L_t] = K_g - K_f\n", "$$\n", "\n", "with finite $ K_g, K_f $, $ E_h |\\log f(W)| < \\infty $ and $ E_h |\\log g(W)| < \\infty $.\n", "\n", "*Hint:* Start by expressing $ \\log L_t $ as a sum of $ \\log \\ell(w_i) $ terms and compare with the definition of $ K_f $ and $ K_g $." ] }, { "cell_type": "markdown", "id": "0af769f1", "metadata": {}, "source": [ "## Solution\n", "\n", "Since $ w_1, \\ldots, w_t $ are IID draws from $ h $, we can write\n", "\n", "$$\n", "\\log L_t = \\log \\prod_{i=1}^t \\ell(w_i) = \\sum_{i=1}^t \\log \\ell(w_i) = \\sum_{i=1}^t \\log \\frac{f(w_i)}{g(w_i)}\n", "$$\n", "\n", "Taking the expectation under $ h $\n", "\n", "$$\n", "E_h[\\log L_t] \n", "= E_h\\left[\\sum_{i=1}^t \\log \\frac{f(w_i)}{g(w_i)}\\right] \n", "= \\sum_{i=1}^t E_h\\left[\\log \\frac{f(w_i)}{g(w_i)}\\right]\n", "$$\n", "\n", "Since the $ w_i $ are identically distributed\n", "\n", "$$\n", "E_h[\\log L_t] = t \\cdot E_h\\left[\\log \\frac{f(w)}{g(w)}\\right]\n", "$$\n", "\n", "where $ w \\sim h $.\n", "\n", "Therefore\n", "\n", "$$\n", "\\frac{1}{t} E_h[\\log L_t] = E_h\\left[\\log \\frac{f(w)}{g(w)}\\right] = E_h[\\log f(w)] - E_h[\\log g(w)]\n", "$$\n", "\n", "Now, from the definition of Kullback-Leibler divergence\n", "\n", "$$\n", "K_f = \\int h(w) \\log \\frac{h(w)}{f(w)} dw = E_h[\\log h(w)] - E_h[\\log f(w)]\n", "$$\n", "\n", "This gives us\n", "\n", "$$\n", "E_h[\\log f(w)] = E_h[\\log h(w)] - K_f\n", "$$\n", "\n", "Similarly\n", "\n", "$$\n", "E_h[\\log g(w)] = E_h[\\log h(w)] - K_g\n", "$$\n", "\n", "Substituting back\n", "\n", "$$\n", "\\begin{aligned}\n", "\\frac{1}{t} E_h[\\log L_t] &= E_h[\\log f(w)] - E_h[\\log g(w)] \\\\\n", "&= [E_h[\\log h(w)] - K_f] - [E_h[\\log h(w)] - K_g] \\\\\n", "&= K_g - K_f\n", "\\end{aligned}\n", "$$" ] }, { "cell_type": "markdown", "id": "e1281d59", "metadata": {}, "source": [ "## Exercise 22.2\n", "\n", "Building on Exercise 22.1, use the result to explain what happens to $ L_t $ as $ t \\to \\infty $ in the following cases:\n", "\n", "1. When $ K_g > K_f $ (i.e., $ f $ is “closer” to $ h $ than $ g $ is) \n", "1. When $ K_g < K_f $ (i.e., $ g $ is “closer” to $ h $ than $ f $ is) \n", "\n", "\n", "Relate your answer to the simulation results shown in [this section](#llr-h)." ] }, { "cell_type": "markdown", "id": "596f3cec", "metadata": {}, "source": [ "## Solution\n", "\n", "From Exercise 22.1, we know that:\n", "\n", "$$\n", "\\frac{1}{t} E_h[\\log L_t] = K_g - K_f\n", "$$\n", "\n", "**Case 1:** When $ K_g > K_f $\n", "\n", "Here, $ f $ is “closer” to $ h $ than $ g $ is. Since $ K_g - K_f > 0 $\n", "\n", "$$\n", "E_h[\\log L_t] = t \\cdot (K_g - K_f) \\to +\\infty \\text{ as } t \\to \\infty\n", "$$\n", "\n", "By the Law of Large Numbers, $ \\frac{1}{t} \\log L_t \\to K_g - K_f > 0 $ almost surely.\n", "\n", "Therefore $ L_t \\to +\\infty $ almost surely.\n", "\n", "**Case 2:** When $ K_g < K_f $\n", "\n", "Here, $ g $ is “closer” to $ h $ than $ f $ is. 
Since $ K_g - K_f < 0 $\n", "\n", "$$\n", "E_h[\\log L_t] = t \\cdot (K_g - K_f) \\to -\\infty \\text{ as } t \\to \\infty\n", "$$\n", "\n", "Therefore, by the same law of large numbers argument, $ L_t \\to 0 $ almost surely." ] } ], "metadata": { "date": 1770028419.331166, "filename": "likelihood_ratio_process.md", "kernelspec": { "display_name": "Python", "language": "python3", "name": "python3" }, "title": "Likelihood Ratio Processes" }, "nbformat": 4, "nbformat_minor": 5 }