{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Think Bayes\n", "\n", "This notebook presents code and exercises from Think Bayes, second edition.\n", "\n", "Copyright 2018 Allen B. Downey\n", "\n", "MIT License: https://opensource.org/licenses/MIT" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Configure Jupyter so figures appear in the notebook\n", "%matplotlib inline\n", "\n", "# Configure Jupyter to display the assigned value after an assignment\n", "%config InteractiveShell.ast_node_interactivity='last_expr_or_assign'\n", "\n", "import numpy as np\n", "import pandas as pd\n", "\n", "from thinkbayes2 import Pmf, Cdf, Suite, Joint\n", "import thinkplot" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The height problem\n", "\n", "For adult male residents of the U.S., the mean and standard deviation of height are 178 cm and 7.7 cm. For adult female residents the corresponding stats are 163 cm and 7.3 cm. Suppose you learn that someone is 170 cm tall. What is the probability that they are male?\n", "\n", "Run this analysis again for a range of observed heights from 150 cm to 200 cm, and plot a curve that shows P(male) versus height. What is the mathematical form of this function?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To represent the likelihood functions, I'll use `norm` from `scipy.stats`, which returns a \"frozen\" random variable (RV) that represents a normal distribution with given parameters.\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "from scipy.stats import norm\n", "\n", "dist_height = dict(male=norm(178, 7.7),\n", "                   female=norm(163, 7.3))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Write a class that implements `Likelihood` using the frozen distributions. Here's starter code:" ] },
{ "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "class Height(Suite):\n", "\n", "    def Likelihood(self, data, hypo):\n", "        \"\"\"\n", "        data: height in cm\n", "        hypo: 'male' or 'female'\n", "        \"\"\"\n", "        return 1" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here's the prior." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "suite = Height(['male', 'female'])\n", "for hypo, prob in suite.Items():\n", "    print(hypo, prob)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And the update:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "suite.Update(170)\n", "for hypo, prob in suite.Items():\n", "    print(hypo, prob)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Compute the probability of being male as a function of height, for a range of heights between 150 cm and 200 cm." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you are curious, you can derive the mathematical form of this curve from the PDF of the normal distribution." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### How tall is A?\n", "\n", "Suppose I choose two residents of the U.S. at random. A is taller than B. How tall is A?\n", "\n", "What if I tell you that A is taller than B by more than 5 cm? How tall is A?\n", "\n", "For adult male residents of the U.S., the mean and standard deviation of height are 178 cm and 7.7 cm. For adult female residents the corresponding stats are 163 cm and 7.3 cm."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here are distributions that represent the heights of men and women in the U.S." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "dist_height = dict(male=norm(178, 7.7),\n", "                   female=norm(163, 7.3))" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "hs = np.linspace(130, 210)\n", "ps = dist_height['male'].pdf(hs)\n", "male_height_pmf = Pmf(dict(zip(hs, ps)));" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "ps = dist_height['female'].pdf(hs)\n", "female_height_pmf = Pmf(dict(zip(hs, ps)));" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "thinkplot.Pdf(male_height_pmf, label='Male')\n", "thinkplot.Pdf(female_height_pmf, label='Female')\n", "\n", "thinkplot.decorate(xlabel='Height (cm)',\n", "                   ylabel='PMF',\n", "                   title='Adult residents of the U.S.')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use `thinkbayes2.MakeMixture` to make a `Pmf` named `mix` that represents the height of all residents of the U.S." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Write a class that inherits from `Suite` and `Joint` and provides a `Likelihood` method that computes the probability of the data under a given hypothesis." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Write a function named `make_prior` that takes the mixture and returns your `Suite` initialized with an appropriate prior."
] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "suite = make_prior(mix)\n", "suite.Total()" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "thinkplot.Contour(suite)\n", "thinkplot.decorate(xlabel='B Height (cm)',\n", "                   ylabel='A Height (cm)',\n", "                   title='Prior joint distribution')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Update your `Suite`, then plot the joint distribution and the marginal distributions, and compute the posterior means for `A` and `B`." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 2 }