{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Think Bayes\n",
    "\n",
    "Copyright 2018 Allen B. Downey\n",
    "\n",
    "MIT License: https://opensource.org/licenses/MIT"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Configure Jupyter so figures appear in the notebook\n",
    "%matplotlib inline\n",
    "\n",
    "# Configure Jupyter to display the assigned value after an assignment\n",
    "%config InteractiveShell.ast_node_interactivity='last_expr_or_assign'\n",
    "\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# import classes from thinkbayes2\n",
    "from thinkbayes2 import Pmf, Cdf, Suite, Joint\n",
    "\n",
    "import thinkplot"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## The Space Shuttle problem\n",
    "\n",
    "Here's a problem from [Bayesian Methods for Hackers](http://nbviewer.jupyter.org/github/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Chapter2_MorePyMC/Ch2_MorePyMC_PyMC2.ipynb)\n",
    "\n",
    ">On January 28, 1986, the twenty-fifth flight of the U.S. space shuttle program ended in disaster when one of the rocket boosters of the Shuttle Challenger exploded shortly after lift-off, killing all seven crew members. The presidential commission on the accident concluded that it was caused by the failure of an O-ring in a field joint on the rocket booster, and that this failure was due to a faulty design that made the O-ring unacceptably sensitive to a number of factors including outside temperature. Of the previous 24 flights, data were available on failures of O-rings on 23, (one was lost at sea), and these data were discussed on the evening preceding the Challenger launch, but unfortunately only the data corresponding to the 7 flights on which there was a damage incident were considered important and these were thought to show no obvious trend. The data are shown below (see [1](https://amstat.tandfonline.com/doi/abs/10.1080/01621459.1989.10478858)):\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# !wget https://raw.githubusercontent.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/master/Chapter2_MorePyMC/data/challenger_data.csv"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "columns = ['Date', 'Temperature', 'Incident']\n",
    "df = pd.read_csv('challenger_data.csv', parse_dates=[0])\n",
    "df.drop(labels=[3, 24], inplace=True)\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "df['Incident'] = df['Damage Incident'].astype(float)\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "\n",
    "plt.scatter(df.Temperature, df.Incident, s=75, color=\"k\",\n",
    "            alpha=0.5)\n",
    "plt.yticks([0, 1])\n",
    "plt.ylabel(\"Damage Incident?\")\n",
    "plt.xlabel(\"Outside temperature (Fahrenheit)\")\n",
    "plt.title(\"Defects of the Space Shuttle O-Rings vs temperature\");"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Grid algorithm\n",
    "\n",
    "We can solve the problem first using a grid algorithm, with parameters `b0` and `b1`, and\n",
    "\n",
    "$\\mathrm{logit}(p) = b0 + b1 * T$\n",
    "\n",
    "and each datum being a temperature `T` and a boolean outcome `fail`, which is true is there was damage and false otherwise.\n",
    "\n",
    "Hint: the `expit` function from `scipy.special` computes the inverse of the `logit` function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "from scipy.special import expit\n",
    "\n",
    "class Logistic(Suite, Joint):\n",
    "    \n",
    "    def Likelihood(self, data, hypo):\n",
    "        \"\"\"\n",
    "        \n",
    "        data: T, fail\n",
    "        hypo: b0, b1\n",
    "        \"\"\"\n",
    "        return 1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Solution goes here"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "b0 = np.linspace(0, 50, 101);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "b1 = np.linspace(-1, 1, 101);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "from itertools import product\n",
    "hypos = product(b0, b1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "suite = Logistic(hypos);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "for data in zip(df.Temperature, df.Incident):\n",
    "    print(data)\n",
    "    suite.Update(data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "thinkplot.Pdf(suite.Marginal(0))\n",
    "thinkplot.decorate(xlabel='Intercept',\n",
    "                   ylabel='PMF',\n",
    "                   title='Posterior marginal distribution')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "thinkplot.Pdf(suite.Marginal(1))\n",
    "thinkplot.decorate(xlabel='Log odds ratio',\n",
    "                   ylabel='PMF',\n",
    "                   title='Posterior marginal distribution')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "According to the posterior distribution, what was the probability of damage when the shuttle launched at 31 degF?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Solution goes here"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Solution goes here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### MCMC\n",
    "\n",
    "Implement this model using MCMC.  As a starting place, you can use this example from [the PyMC3 docs](https://docs.pymc.io/notebooks/GLM-logistic.html#The-model).\n",
    "\n",
    "As a challege, try writing the model more explicitly, rather than using the GLM module."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pymc3 as pm"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Solution goes here"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [],
   "source": [
    "pm.traceplot(trace);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The posterior distributions for these parameters should be similar to what we got with the grid algorithm."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}