{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Bayesian Zig Zag\n", "\n", "Developing probabilistic models using grid methods and MCMC.\n", "\n", "Thanks to Chris Fonnesbeck for his help with this notebook, and to Colin Carroll, who added features to pymc3 to support some of these examples.\n", "\n", "Copyright 2018 Allen Downey\n", "\n", "MIT License: https://opensource.org/licenses/MIT" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "%config InteractiveShell.ast_node_interactivity='last_expr_or_assign'\n", "\n", "import numpy as np\n", "import pymc3 as pm\n", "\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Simulating hockey\n", "\n", "I'll model hockey as a Poisson process, where each team has some long-term average scoring rate, lambda, in goals per game.\n", "\n", "For the first example, I'll assume that lambda is somehow known to be 2.7. Since regulation play (as opposed to overtime) is 60 minutes, we can compute the goal scoring rate per minute." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(0.045000000000000005, 0.0020250000000000003)" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "lam_per_game = 2.7\n", "min_per_game = 60\n", "lam_per_min = lam_per_game / min_per_game\n", "lam_per_min, lam_per_min**2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we assume that a goal is equally likely during any minute of the game, and we ignore the possibility of scoring more than one goal in the same minute, we can simulate a game by generating one random value each minute." 
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "def simulate_game(p, n=60):\n", "    goals = np.random.choice([0, 1], n, p=[1-p, p])\n", "    return np.sum(goals)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And simulate 10 games." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1, 6, 2, 3, 0, 1, 3, 2, 2, 1]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "size = 10\n", "sample = [simulate_game(lam_per_min) for i in range(size)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we simulate 1000 games, we can see what the distribution looks like. The average of this sample should be close to lam_per_game." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(2.706, 2.7)" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "size = 1000\n", "sample_sim = [simulate_game(lam_per_min) for i in range(size)]\n", "np.mean(sample_sim), lam_per_game" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## PMFs\n", "\n", "To visualize distributions, I'll start with a probability mass function (PMF), which I'll implement using a Counter.\n", "\n" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "from collections import Counter\n", "\n", "class Pmf(Counter):\n", "    \n", "    def normalize(self):\n", "        \"\"\"Normalizes the PMF so the probabilities add to 1.\"\"\"\n", "        total = sum(self.values())\n", "        for key in self:\n", "            self[key] /= total\n", "    \n", "    def sorted_items(self):\n", "        \"\"\"Returns the outcomes and their probabilities.\"\"\"\n", "        return zip(*sorted(self.items()))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here are some functions for plotting PMFs." 
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "plot_options = dict(linewidth=3, alpha=0.6)\n", "\n", "def underride(options):\n", "    \"\"\"Add key-value pairs from plot_options to options only if the key is not already in options.\n", "\n", "    options: dictionary\n", "    \"\"\"\n", "\n", "    for key, val in plot_options.items():\n", "        options.setdefault(key, val)\n", "    return options\n", "\n", "def plot(xs, ys, **options):\n", "    \"\"\"Line plot with plot_options.\"\"\"\n", "    plt.plot(xs, ys, **underride(options))\n", "\n", "def bar(xs, ys, **options):\n", "    \"\"\"Bar plot with plot_options.\"\"\"\n", "    plt.bar(xs, ys, **underride(options))\n", "\n", "def plot_pmf(sample, **options):\n", "    \"\"\"Compute and plot a PMF.\"\"\"\n", "    pmf = Pmf(sample)\n", "    pmf.normalize()\n", "    xs, ps = pmf.sorted_items()\n", "    bar(xs, ps, **options)\n", "    \n", "def decorate_pmf_goals():\n", "    \"\"\"Decorate the axes.\"\"\"\n", "    plt.xlabel('Number of goals')\n", "    plt.ylabel('PMF')\n", "    plt.title('Distribution of goals scored')\n", "    legend()\n", "    \n", "def legend(**options):\n", "    \"\"\"Draw a legend only if there are labeled items.\n", "    \"\"\"\n", "    ax = plt.gca()\n", "    handles, labels = ax.get_legend_handles_labels()\n", "    if len(labels):\n", "        plt.legend(**options)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here's what the results from the simulation look like." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "