{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from scipy import stats\n", "import numpy as np\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Calculate confidence intervals on prevalence that appear in Skowronski et al preprint (July 15, 2020)\n", "\n", "For each value of prev (=prevalence), calculate the distribution of possible outcomes (ie. number of observed cases). The 95% confidence belt contains the central 95% of possible outcomes. The 95% confidence interval, are those values of prev for which the actual observation is in the central 95% of possible outcomes.\n", "\n", "* n_trial: number of trials\n", "* obs: observed number of positives\n", "* n_control: number of negative samples used to means false positive rate (ie. specificity)\n", "* n_fp: number of false positives in the control\n", "* sensitiviy: fraction of positive samples that would be identifies as positive\n", "\n", "The sensitivity is assumed to be known well. The confidence intervals are not very sensitive to this uncertainty.\n", "\n", "Distribution of possible outcomes: use binomial for signal+background on \"n_trial\" trials\n", "\n", "The background is from false positives. We know the control: \"n_fp\" false positives in \"n_control\" negative sample size. \n", "The knowledge from the control test is encapsulated as the posterior probability distribution of fpr from that control study. Use a uniform prior for fpr. The posterior probability for fpr proportional to the likelihood.\n", "\n", "Make many repetitions of the experiment. For each repetition of the experiment draw fpr from its distributions, and then do a binomial throw given the expectation of signal+background." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "study = 7\n", "joint = False\n", "\n", "if study == 1:\n", " obs = [7]\n", " n_trial = [870]\n", " n_control = [470 + 390 + 99]\n", " n_fp = [3] \n", " sensitivity = [0.85]\n", "elif study == 2:\n", " obs = [4]\n", " n_trial = [869]\n", " n_control = [149 + 105]\n", " n_fp = [1]\n", " sensitivity = [0.93]\n", "elif study == 3:\n", " obs = [7,4]\n", " n_trial = [870,869]\n", " n_control = [470 + 390 + 99,149 + 105]\n", " n_fp = [3,1] \n", " sensitivity = [0.85,0.93]\n", "elif study == 4:\n", " obs = [2]\n", " n_trial = [869]\n", " n_control = [1091+399+100]\n", " n_fp = [2]\n", " sensitivity = [0.85*0.86]\n", "elif study == 5:\n", " obs = [6]\n", " n_trial = [889]\n", " n_control = [470 + 390 + 99]\n", " n_fp = [3] \n", " sensitivity = [0.85]\n", "elif study == 6:\n", " obs = [7]\n", " n_trial = [885]\n", " n_control = [149 + 105]\n", " n_fp = [1]\n", " sensitivity = [0.93]\n", "elif study == 7:\n", " joint = True\n", " obs = [4]\n", " n_trial = [885]\n", " n_control = [470 + 390 + 99,149 + 105]\n", " n_fp = [3,1] \n", " sensitivity = [0.93*0.85]\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# This is the un-normalized posterior for fpr after the control experiment is done (uniform prior)\n", "fpr_plot = np.arange(0.,0.1,0.001)\n", "for i in range(len(n_fp)):\n", " p_fpr_plot = stats.binom.pmf(n_fp[i], n_control[i], fpr_plot)\n", " plt.plot(fpr_plot,p_fpr_plot,label=str(n_fp[i])+'/'+str(n_control[i]))\n", "plt.legend()\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# draw a fpr from that distribution\n", "def get_fpr(i_fp, i_control):\n", " nu = 1.*i_fp/i_control\n", " big = stats.binom.pmf(i_fp, i_control, nu)\n", " max_fpr = 0.05 # choose this value using the plot above, fpr will be restricted to be below this value\n", " fpr_result = -1.\n", " while fpr_result < 0.:\n", " fpr_trial = stats.uniform.rvs(0.,max_fpr)\n", " pdf = stats.binom.pmf(i_fp, i_control, fpr_trial)\n", " if stats.uniform.rvs(0.,big) < pdf:\n", " fpr_result = fpr_trial\n", " return fpr_result" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Do repetions of experiment to find boundaries of the x% CL interval on prevalence\n", "# \n", "cl = 0.95\n", "upper = 1.-(1-cl)/2.\n", "lower = (1-cl)/2.\n", "\n", "# Do two studys, one coarse to find the approximate boundaries (3 digits) then a fine one to 4 digits\n", "\n", "# number of repititions to use:\n", "n_rep_coarse = 1000\n", "n_rep_fine = 10000\n", "n_reps = [n_rep_coarse, n_rep_fine]\n", "\n", "# range of prevalences to consider:\n", "prevs_course = np.arange(0., 0.025, 0.001)\n", "prevs_fine = np.arange(0.0142,0.0148,0.0001)\n", "prevs = [prevs_course,prevs_fine]\n", "\n", "sum_obs = 0\n", "for ob in obs:\n", " sum_obs += ob\n", "\n", "for i_study in range(1):\n", " first_in = -1\n", " last_in = -1\n", " prev_list = prevs[i_study]\n", " for prev in prev_list:\n", " results = []\n", " for i in range(n_reps[i_study]):\n", " total = 0\n", " if not joint:\n", " for j in range(len(n_fp)):\n", " fpr = get_fpr(n_fp[j], n_control[j])\n", " prob = prev*sensitivity[j] + fpr\n", " total += stats.binom.rvs(n_trial[j], prob)\n", " results.append(total)\n", " else:\n", " fpr = 1.\n", " for j in range(len(n_fp)):\n", " fpr *= get_fpr(n_fp[j], n_control[j])\n", " prob = prev*sensitivity[0] + fpr\n", " total += 
stats.binom.rvs(n_trial[0], prob)\n", " results.append(total)\n", "\n", " results.sort()\n", " i_upper = int(upper*n_reps[i_study])\n", " i_lower = int(lower*n_reps[i_study])\n", " \n", " print('prevalence = {:.4f}'.format(prev)+' '+str(cl*100)+'% range =['+\n", " str(results[i_lower])+','+str(results[i_upper])+']')\n", "\n", " if first_in<0:\n", " if sum_obs>=results[i_lower] and sum_obs<=results[i_upper]:\n", " first_in = prev\n", " if sum_obs>=results[i_lower] and sum_obs<=results[i_upper]:\n", " last_in = prev\n", " if last_in > 0 and sum_obs