{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Think Bayes\n", "\n", "This notebook presents example code and exercise solutions for Think Bayes.\n", "\n", "Copyright 2018 Allen B. Downey\n", "\n", "MIT License: https://opensource.org/licenses/MIT" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Configure Jupyter so figures appear in the notebook\n", "%matplotlib inline\n", "\n", "# Configure Jupyter to display the assigned value after an assignment\n", "%config InteractiveShell.ast_node_interactivity='last_expr_or_assign'\n", "\n", "# import classes from thinkbayes2\n", "from thinkbayes2 import Hist, Pmf, Suite\n", "import thinkplot" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The Grizzly Bear Problem\n", "\n", "In 1996 and 1997, Mowat and Strobeck deployed bear traps at locations in British Columbia and Alberta to estimate the population of grizzly bears. They describe the experiment in \"Estimating Population Size of Grizzly Bears Using Hair Capture, DNA Profiling, and Mark-Recapture Analysis\".\n", "\n", "Each \"trap\" consists of a lure and several strands of barbed wire intended to capture samples of hair from bears that visit the lure. From the hair samples, the researchers used DNA analysis to identify individual bears.\n", "\n", "During the first session, on June 29, 1996, the researchers deployed traps at 76 sites. Returning 10 days later, they obtained 1043 hair samples and identified 23 different bears. During a second 10-day session they obtained 1191 samples from 19 different bears, of which 4 had already been identified in the first session.\n", "\n", "To estimate the population of bears from this data, we need a model for the probability that each bear will be observed during each session. 
As a starting point, we'll make the simplest assumption: every bear in the population has the same (unknown) probability of being sampled during each round.\n", "\n", "We also need a prior distribution for the population. Let's suppose that, prior to this study, an expert in this domain would have estimated that the population is between 100 and 500, and equally likely to be any value in that range.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Solution:\n", "\n", "Define:\n", "\n", "* N: population size\n", "\n", "* K: number of bears identified in the first session\n", "\n", "* n: number of bears observed in the second session\n", "\n", "* k: number of bears in the second session that had previously been identified\n", "\n", "\n", "For given values of N, K, and n, the distribution of k is the hypergeometric distribution:\n", "\n", "$PMF(k) = {K \\choose k}{N-K \\choose n-k}/{N \\choose n}$" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Solution\n", "\n", "from scipy.special import binom\n", "\n", "class Grizzly(Suite):\n", "    \"\"\"Represents hypotheses about how many bears there are.\"\"\"\n", "\n", "    def Likelihood(self, data, hypo):\n", "        \"\"\"Computes the likelihood of the data under the hypothesis.\n", "\n", "        hypo: total population (N)\n", "        data: # tagged (K), # caught (n), # of caught who were tagged (k)\n", "        \"\"\"\n", "        N = hypo\n", "        K, n, k = data\n", "\n", "        if hypo < K + (n - k):\n", "            return 0\n", "\n", "        # binom(K, k) is omitted: it does not depend on N,\n", "        # so it cancels when the posterior is normalized\n", "        like = binom(N-K, n-k) / binom(N, n)\n", "        return like" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "8.05801258299152e-06" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Solution\n", "\n", "hypos = range(100, 501)\n", "suite = Grizzly(hypos)\n", "\n", "data = 23, 19, 4\n", "suite.Update(data)" ] }, { "cell_type": "code", "execution_count": 4, 
"metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Solution\n", "\n", "thinkplot.Pdf(suite)\n", "thinkplot.Config(xlabel='Number of bears', ylabel='PMF', legend=False)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Posterior mean 193.93845891363907\n", "Maximum a posteriori estimate 109\n", "90% credible interval (105, 379)\n" ] } ], "source": [ "# Solution\n", "\n", "print('Posterior mean', suite.Mean())\n", "print('Maximum a posteriori estimate', suite.MaximumLikelihood())\n", "print('90% credible interval', suite.CredibleInterval(90))" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# Solution\n", "\n", "# Alternatively, we can take advantage of the `hypergeom`\n", "# object in scipy.stats.\n", "\n", "from scipy import stats\n", "\n", "class Grizzly2(Suite):\n", " \"\"\"Represents hypotheses about how many bears there are.\"\"\"\n", "\n", " def Likelihood(self, data, hypo):\n", " \"\"\"Computes the likelihood of the data under the hypothesis.\n", "\n", " hypo: total population (N)\n", " data: # tagged (K), # caught (n), # of caught who were tagged (k)\n", " \"\"\"\n", " N = hypo\n", " K, n, k = data\n", "\n", " if hypo < K + (n - k):\n", " return 0\n", "\n", " like = stats.hypergeom.pmf(k, N, K, n)\n", " return like" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.07135370142238903" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Solution\n", "\n", "hypos = range(100, 501)\n", "suite = Grizzly2(hypos)\n", "\n", "data = 23, 19, 4\n", "suite.Update(data)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Posterior mean 193.9384589136376\n", "Maximum a posteriori estimate 109\n", "90% credible interval (105, 379)\n" ] } 
], "source": [ "# Solution\n", "\n", "print('Posterior mean', suite.Mean())\n", "print('Maximum a posteriori estimate', suite.MaximumLikelihood())\n", "print('90% credible interval', suite.CredibleInterval(90))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 2 }