{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# The Lincoln Index Problem" ] }, { "cell_type": "markdown", "metadata": { "tags": [ "remove-cell" ] }, "source": [ "Copyright 2020 Allen B. Downey\n", "\n", "License: [Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)](https://creativecommons.org/licenses/by-nc-sa/4.0/)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "tags": [ "remove-cell" ] }, "outputs": [], "source": [ "# If we're running on Colab, install libraries\n", "\n", "import sys\n", "IN_COLAB = 'google.colab' in sys.modules\n", "\n", "if IN_COLAB:\n", " !pip install arviz==0.6.1\n", " !pip install pymc3==3.8\n", " !pip install Theano==1.0.4" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In [an excellent blog post](http://www.johndcook.com/blog/2010/07/13/lincoln-index/), John D. Cook wrote about the Lincoln index, which is a way to estimate the\n", "number of errors in a document (or program) by comparing results from\n", "two independent testers. \n", "\n", "Here's his presentation of the problem:\n", "\n", ">\"Suppose you have a tester who finds 20 bugs in your program. You\n", "want to estimate how many bugs are really in the program. You know\n", "there are at least 20 bugs, and if you have supreme confidence in your\n", "tester, you may suppose there are around 20 bugs. But maybe your\n", "tester isn't very good. Maybe there are hundreds of bugs. How can you\n", "have any idea how many bugs there are? There's no way to know with one\n", "tester. 
But if you have two testers, you can get a good idea, even if\n", "you don't know how skilled the testers are.\"\n", "\n", "Suppose the first tester finds 20 bugs, the second finds 15, and they\n", "find 3 in common; how can we estimate the number of bugs?\n", "\n", "I'll use the following notation for the data:\n", "\n", "* `k11` is the number of bugs found by both testers,\n", "\n", "* `k10` is the number of bugs found by the first tester but not the second,\n", "\n", "* `k01` is the number of bugs found by the second tester but not the first, and\n", "\n", "* `k00` is the unknown number of undiscovered bugs.\n", "\n", "Here are the values for all but `k00`:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "k10 = 20 - 3\n", "k01 = 15 - 3\n", "k11 = 3\n", "\n", "num_seen = k01 + k10 + k11\n", "num_seen" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here's the model. `p0` and `p1` are the probabilities that the first and second testers find any given bug, each with a uniform prior, and `N` is the total number of bugs, with a uniform prior from `num_seen` to 350:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "import pymc3 as pm\n", "\n", "with pm.Model() as model5:\n", "    # Uniform priors on each tester's probability of finding a given bug\n", "    p0 = pm.Beta('p0', alpha=1, beta=1)\n", "    p1 = pm.Beta('p1', alpha=1, beta=1)\n", "    # Prior on the total number of bugs, bounded below by the number seen\n", "    N = pm.DiscreteUniform('N', num_seen, 350)\n", "\n", "    # Probabilities of the four outcomes, in the order (k00, k01, k10, k11)\n", "    q0 = 1-p0\n", "    q1 = 1-p1\n", "    ps = [q0*q1, q0*p1, p0*q1, p0*p1]\n", "\n", "    # The number of undiscovered bugs is determined by N\n", "    k00 = N - num_seen\n", "    data = pm.math.stack((k00, k01, k10, k11))\n", "    y = pm.Multinomial('y', n=N, p=ps, observed=data)\n", "\n", "# Draw samples from the posterior distribution\n", "with model5:\n", "    trace5 = pm.sample(1000)\n", "\n", "# Plot the posterior distributions of p0, p1, and N\n", "with model5:\n", "    pm.plot_posterior(trace5)\n", "\n", "# Plot the sampled traces to check convergence\n", "with model5:\n", "    pm.traceplot(trace5)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "celltoolbar": "Tags", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": 
"python", "pygments_lexer": "ipython3", "version": "3.8.6" } }, "nbformat": 4, "nbformat_minor": 4 }