{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Think Bayes\n", "\n", "Copyright 2018 Allen B. Downey\n", "\n", "MIT License: https://opensource.org/licenses/MIT" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Configure Jupyter so figures appear in the notebook\n", "%matplotlib inline\n", "\n", "# Configure Jupyter to display the assigned value after an assignment\n", "%config InteractiveShell.ast_node_interactivity='last_expr_or_assign'\n", "\n", "import numpy as np\n", "import pandas as pd\n", "\n", "# import classes from thinkbayes2\n", "from thinkbayes2 import Pmf, Cdf, Suite, Joint\n", "\n", "import thinkplot" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The Space Shuttle problem\n", "\n", "Here's a problem from [Bayesian Methods for Hackers](http://nbviewer.jupyter.org/github/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Chapter2_MorePyMC/Ch2_MorePyMC_PyMC2.ipynb)\n", "\n", ">On January 28, 1986, the twenty-fifth flight of the U.S. space shuttle program ended in disaster when one of the rocket boosters of the Shuttle Challenger exploded shortly after lift-off, killing all seven crew members. The presidential commission on the accident concluded that it was caused by the failure of an O-ring in a field joint on the rocket booster, and that this failure was due to a faulty design that made the O-ring unacceptably sensitive to a number of factors including outside temperature. Of the previous 24 flights, data were available on failures of O-rings on 23, (one was lost at sea), and these data were discussed on the evening preceding the Challenger launch, but unfortunately only the data corresponding to the 7 flights on which there was a damage incident were considered important and these were thought to show no obvious trend. The data are shown below (see [1](https://amstat.tandfonline.com/doi/abs/10.1080/01621459.1989.10478858)):\n", "\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# !wget https://raw.githubusercontent.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/master/Chapter2_MorePyMC/data/challenger_data.csv" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "columns = ['Date', 'Temperature', 'Incident']\n", "df = pd.read_csv('challenger_data.csv', parse_dates=[0])\n", "df.drop(labels=[3, 24], inplace=True)\n", "df" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "df['Incident'] = df['Damage Incident'].astype(float)\n", "df" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "\n", "plt.scatter(df.Temperature, df.Incident, s=75, color=\"k\",\n", " alpha=0.5)\n", "plt.yticks([0, 1])\n", "plt.ylabel(\"Damage Incident?\")\n", "plt.xlabel(\"Outside temperature (Fahrenheit)\")\n", "plt.title(\"Defects of the Space Shuttle O-Rings vs temperature\");" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Grid algorithm\n", "\n", "We can solve the problem first using a grid algorithm, with parameters `b0` and `b1`, and\n", "\n", "$\\mathrm{logit}(p) = b0 + b1 * T$\n", "\n", "and each datum being a temperature `T` and a boolean outcome `fail`, which is true is there was damage and false otherwise.\n", "\n", "Hint: the `expit` function from `scipy.special` computes the inverse of the `logit` function." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "from scipy.special import expit\n", "\n", "class Logistic(Suite, Joint):\n", " \n", " def Likelihood(self, data, hypo):\n", " \"\"\"\n", " \n", " data: T, fail\n", " hypo: b0, b1\n", " \"\"\"\n", " return 1" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "b0 = np.linspace(0, 50, 101);" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "b1 = np.linspace(-1, 1, 101);" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "from itertools import product\n", "hypos = product(b0, b1)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "suite = Logistic(hypos);" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "for data in zip(df.Temperature, df.Incident):\n", " print(data)\n", " suite.Update(data)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "thinkplot.Pdf(suite.Marginal(0))\n", "thinkplot.decorate(xlabel='Intercept',\n", " ylabel='PMF',\n", " title='Posterior marginal distribution')" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "thinkplot.Pdf(suite.Marginal(1))\n", "thinkplot.decorate(xlabel='Log odds ratio',\n", " ylabel='PMF',\n", " title='Posterior marginal distribution')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "According to the posterior distribution, what was the probability of damage when the shuttle launched at 31 degF?" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### MCMC\n", "\n", "Implement this model using MCMC. As a starting place, you can use this example from [the PyMC3 docs](https://docs.pymc.io/notebooks/GLM-logistic.html#The-model).\n", "\n", "As a challege, try writing the model more explicitly, rather than using the GLM module." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "import pymc3 as pm" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "pm.traceplot(trace);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The posterior distributions for these parameters should be similar to what we got with the grid algorithm." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 1 }