{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Think Bayes\n", "\n", "Copyright 2018 Allen B. Downey\n", "\n", "MIT License: https://opensource.org/licenses/MIT" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Configure Jupyter so figures appear in the notebook\n", "%matplotlib inline\n", "\n", "# Configure Jupyter to display the assigned value after an assignment\n", "%config InteractiveShell.ast_node_interactivity='last_expr_or_assign'\n", "\n", "import numpy as np\n", "import pandas as pd\n", "\n", "# import classes from thinkbayes2\n", "from thinkbayes2 import Pmf, Cdf, Suite, Joint\n", "\n", "from thinkbayes2 import MakePoissonPmf, EvalBinomialPmf, MakeMixture\n", "\n", "import thinkplot" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Lions and Tigers and Bears\n", "\n", "Suppose we visit a wild animal preserve where we know that the only animals are lions and tigers and bears, but we don't know how many of each there are.\n", "\n", "During the tour, we see 3 lions, 2 tigers, and 1 bear. Assuming that every animal had an equal chance to appear in our sample, estimate the prevalence of each species.\n", "\n", "What is the probability that the next animal we see is a bear?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Grid algorithm\n", "\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "class LionsTigersBears(Suite, Joint):\n", "    \n", "    def Likelihood(self, data, hypo):\n", "        \"\"\"\n", "        \n", "        data: string 'L', 'T', or 'B'\n", "        hypo: tuple (p1, p2, p3), the prevalences of lions, tigers, and bears\n", "        \"\"\"\n", "        # Fill this in."
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "ps = np.linspace(0, 1, 101);" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "from itertools import product\n", "\n", "def enumerate_triples(ps):\n", "    for p1, p2, p3 in product(ps, ps, ps):\n", "        if p1+p2+p3 == 1:\n", "            yield p1, p2, p3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Write a better version of `enumerate_triples` that doesn't run into problems with floating-point." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "suite = LionsTigersBears(enumerate_triples(ps));" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "def plot_marginal_pmfs(joint):\n", "    pmf_lion = joint.Marginal(0)\n", "    pmf_tiger = joint.Marginal(1)\n", "    pmf_bear = joint.Marginal(2)\n", "\n", "    thinkplot.Pdf(pmf_lion, label='lions')\n", "    thinkplot.Pdf(pmf_tiger, label='tigers')\n", "    thinkplot.Pdf(pmf_bear, label='bears')\n", "    \n", "    thinkplot.decorate(xlabel='Prevalence',\n", "                       ylabel='PMF')" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "plot_marginal_pmfs(suite)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "for data in 'LLLTTB':\n", "    suite.Update(data)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "plot_marginal_pmfs(suite)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "def plot_marginal_cdfs(joint):\n", "    pmf_lion = joint.Marginal(0)\n", "    pmf_tiger = joint.Marginal(1)\n", "    pmf_bear = joint.Marginal(2)\n", "\n", "    thinkplot.Cdf(pmf_lion.MakeCdf(), label='lions')\n", "    thinkplot.Cdf(pmf_tiger.MakeCdf(), label='tigers')\n", "    thinkplot.Cdf(pmf_bear.MakeCdf(), label='bears')\n", "    \n", "    thinkplot.decorate(xlabel='Prevalence',\n", "                       ylabel='CDF')" ] },
{ "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "plot_marginal_cdfs(suite)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Using the Dirichlet object" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "from thinkbayes2 import Dirichlet\n", "\n", "def DirichletMarginal(dirichlet, i):\n", "    return dirichlet.MarginalBeta(i).MakePmf()\n", "\n", "Dirichlet.Marginal = DirichletMarginal" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "dirichlet = Dirichlet(3)\n", "plot_marginal_pmfs(dirichlet)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "dirichlet.Update((3, 2, 1))" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "plot_marginal_pmfs(dirichlet)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "plot_marginal_cdfs(dirichlet)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "thinkplot.PrePlot(6)\n", "plot_marginal_cdfs(dirichlet)\n", "plot_marginal_cdfs(suite)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### MCMC\n", "\n", "Implement this model using MCMC. You might want to start with [this example](http://christianherta.de/lehre/dataScience/bayesian/Multinomial-Dirichlet.slides.php)."
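, "\n", "\n", "As a point of reference, the model can be written down as follows. This is a sketch that assumes a uniform Dirichlet prior (the all-ones vector `a = np.ones(k)` defined in the cells after this one); the symbols $p$, $x_i$, and $n$ are notation introduced here, not names from the notebook.\n", "\n", "$$p = (p_L, p_T, p_B) \\sim \\mathrm{Dirichlet}(1, 1, 1), \\qquad x_i \\mid p \\sim \\mathrm{Categorical}(p)$$\n", "\n", "With observed counts $n = (3, 2, 1)$, conjugacy gives\n", "\n", "$$p \\mid x \\sim \\mathrm{Dirichlet}(1 + 3, 1 + 2, 1 + 1) = \\mathrm{Dirichlet}(4, 3, 2)$$\n", "\n", "Under that assumption the marginal for bears is $\\mathrm{Beta}(2, 7)$, whose mean $2/9 \\approx 0.22$ is the probability that the next animal we see is a bear. The marginals of a well-mixed MCMC trace should land close to these."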
] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "import pymc3 as pm" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "# Encode the sightings as category indices: 0 = lion, 1 = tiger, 2 = bear\n", "observed = [0,0,0,1,1,2]\n", "# Number of distinct categories observed\n", "k = len(Pmf(observed))\n", "# All-ones vector, e.g. the concentration parameters of a uniform Dirichlet prior\n", "a = np.ones(k)" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [], "source": [ "model = pm.Model()\n", "\n", "with model:\n", "    \"\"\"FILL THIS IN\"\"\"" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "def plot_trace_cdfs(trace):\n", "    rows = trace['ps'].transpose()\n", "\n", "    cdf_lion = Cdf(rows[0])\n", "    cdf_tiger = Cdf(rows[1])\n", "    cdf_bear = Cdf(rows[2])\n", "\n", "    thinkplot.Cdf(cdf_lion, label='lions')\n", "    thinkplot.Cdf(cdf_tiger, label='tigers')\n", "    thinkplot.Cdf(cdf_bear, label='bears')\n", "    \n", "    thinkplot.decorate(xlabel='Prevalence',\n", "                       ylabel='CDF')" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "#plot_trace_cdfs(trace)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "#pmf = Pmf(trace['xs'][0])\n", "#thinkplot.Hist(pmf)" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "with model:\n", "    start = pm.find_MAP()\n", "    step = pm.Metropolis()\n", "    trace = pm.sample(1000, start=start, step=step, tune=1000)" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "pm.traceplot(trace);" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "plot_trace_cdfs(trace)" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "thinkplot.PrePlot(6)\n", "plot_marginal_cdfs(dirichlet)\n", "plot_trace_cdfs(trace)" ] }, { "cell_type": "code", 
"execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 1 }