{ "metadata": { "name": "", "signature": "sha256:3345dba809db891401ac28f22492df296588c1af9241164c9c30420717772350" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Understanding Waiting Times Between Events with the Poisson and Exponential Distributions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A webhook POSTs to our database each time a particular event occurs on our website. We receive about two of these requests per minute. I was mindlessly monitoring the log files one day and noticed it had been roughly 90 seconds since our database had been hit by this request. Before worrying, though, I wondered how rare that observation is. What is the likelihood of waiting longer than 1.5 minutes for the next request?\n", "\n", "This is a probability problem that can be solved with an understanding of Poisson processes and the exponential distribution. A Poisson process is any process where independent events occur at constant known rate, e.g. babies are born at a hospital at a rate of three per hour, or calls come into a call center at a rate of 10 per minute. The exponential distribution is the probability distribution that models the waiting times between these events, e.g. the times between calls at the call center are exponentially distributed. To model Poisson processes and exponental distributions, we need to know two things: a time-unit $t$ and a rate $\\lambda$. " ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Poisson Distribution" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's start with the Poisson distribution: If we let $N(t)$ denote the number of events that occur between now and time $t$, then the probability that $n$ events occur within the next $t$ time-units, or $P(N(t) = n)$, is\n", "\n", "$$\n", " P(N(t) = n) = \\frac{(\\lambda t)^n e^{-\\lambda t}}{n!}\n", "$$\n", "\n", "As mentioned earlier, we receive an average of two requests from this webhook per minute. Thus, the time-unit $t$ is one minute and the rate $\\lambda$ is two. Knowing these, we can answer questions such as:\n", "\n", "* What is the probability that we receive no requests in the next two minutes?\n", "\n", "$$\n", " P(N(2) = 0) = \\frac{(2 \\cdot 2)^0 e^{-2 \\cdot 2}}{0!} = e^{-4} \\approx 0.0183\n", "$$\n", "\n", "* What is the probability that we receive at least two requests in the next three minutes?\n", "\n", "$$\n", "\\begin{aligned}\n", "P(N(3) \\geq 2) & = 1 - P(N(3) = 1) - P(N(3) = 0) \\\\\\\\\n", " & = 1 - \\frac{(2 \\cdot 3)^1 e^{-2 \\cdot 3}}{1!} - \\frac{(2 \\cdot 3)^0 e^{-2 \\cdot 3}}{0!} \\\\\\\\\n", " & = 1 - 6e^{-6} - e^{-6} \\\\\\\\\n", " & = 1 - 7e^{-6} \\\\\\\\\n", " & \\approx 0.9826\n", "\\end{aligned}\n", "$$\n", "\n", "For those who prefer reading code, we can write a class `Poisson` that's initialized with its rate $\\lambda$:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from math import pow, exp, factorial\n", "\n", "class Poisson:\n", "\n", " def __init__(self, rate):\n", " self.rate = rate\n", "\n", " def prob_exactly(self, n, t):\n", " rate = self.rate * t\n", " return pow(rate, n) * exp(-rate) / factorial(n)\n", "\n", " def prob_at_least(self, n, t):\n", " complements = range(n)\n", " total = 0.0\n", "\n", " for c in complements:\n", " p = self.prob_exactly(c, t)\n", " total += p\n", "\n", " return 1 - total\n", "\n", " def prob_at_most(self, n, t):\n", " return 1 - self.prob_at_least(n + 1, t)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "markdown", "metadata": {}, "source": [ "To answer the same questions, we can create an instance of `Poisson` initialized with rate $\\lambda = 2$." ] }, { "cell_type": "code", "collapsed": false, "input": [ "pois = Poisson(2)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 2 }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we want the probability that exactly `n = 0` events occur within `t = 2` minutes:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "pois.prob_exactly(0, 2)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 3, "text": [ "0.01831563888873418" ] } ], "prompt_number": 3 }, { "cell_type": "markdown", "metadata": {}, "source": [ "And if we want the probability that at least `n = 2` events occur within `t = 3` minutes:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "pois.prob_at_least(2, 3)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 4, "text": [ "0.9826487347633355" ] } ], "prompt_number": 4 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Exponential Distribution" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's move onto the exponential distribution. As mentioned earlier, the waiting times between events in a Poisson process are exponentially distributed. The exponential distribution can be derived from the Poisson distribution: Let $X$ be the waiting time between now and the next event. The probability that $X$ is greater than $t$ is identical to the probability that 0 events occur between now and time $t$, which we already know:\n", "\n", "$$\n", "P(X > t) = P(N(t) = 0) = \\frac{(\\lambda t)^0 e^{-\\lambda t}}{0!} = e^{-\\lambda t}\n", "$$\n", "\n", "We also know that the probability of $X$ being less than or equal to $t$ is the complement of $X$ being greater than $t$:\n", "\n", "$$\n", "P(X \\leq t) = 1 - P(X > t) = 1 - e^{-\\lambda t}\n", "$$\n", "\n", "Thus, the distribution function of the waiting times between events in a Poisson process is $1 - e^{-\\lambda t}$. With this, and recalling that our time-unit $t$ is one minute and our rate $\\lambda$ is two requests per minute, we can answer questions such as:\n", "\n", "* What is the probability that the next request occurs within 15 seconds?\n", "\n", "$$\n", "P(X \\leq 0.25) = 1 - e^{-0.25 \\cdot 2} = 1 - e^{-0.5} \\approx 0.3935\n", "$$\n", "\n", "* What is the probability that the next request is between 15 and 30 seconds from now?\n", "\n", "$$\n", "\\begin{aligned}\n", "P(0.25 \\leq X \\leq 0.5) & = P(X \\leq 0.5) - P(X \\leq 0.25) \\\\\\\\\n", " & = (1 - e^{-2 \\cdot 0.5}) - (1 - e^{-2 \\cdot 0.25}) \\\\\\\\\n", " & = e^{-0.5} - e^{-1} \\\\\\\\\n", " & \\approx 0.2387\n", "\\end{aligned}\n", "$$\n", "\n", "Again, for those who prefer reading code, let's write a class `Exponential` that's initialized with its rate $\\lambda$." ] }, { "cell_type": "code", "collapsed": false, "input": [ "class Exponential:\n", "\n", " def __init__(self, rate):\n", " self.rate = rate\n", "\n", " def prob_less_than_or_equal(self, t):\n", " rate = self.rate * t\n", " return 1 - exp(-rate)\n", "\n", " def prob_greater_than(self, t):\n", " return 1 - self.prob_less_than_or_equal(t)\n", "\n", " def prob_between(self, t1, t2):\n", " p1 = self.prob_less_than_or_equal(t1)\n", " p2 = self.prob_less_than_or_equal(t2)\n", "\n", " return p2 - p1" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 5 }, { "cell_type": "markdown", "metadata": {}, "source": [ "To answer the same questions, we can create an instance of `Exponential` initialized with the rate $\\lambda = 2$." ] }, { "cell_type": "code", "collapsed": false, "input": [ "expo = Exponential(2)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 6 }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we want the probability that the next request occurs within `t = 0.25` minutes:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "expo.prob_less_than_or_equal(0.25)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 7, "text": [ "0.3934693402873666" ] } ], "prompt_number": 7 }, { "cell_type": "markdown", "metadata": {}, "source": [ "And if we want the probability that the next request occurs between `t1 = 0.25` and `t2 = 0.5` minutes:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "expo.prob_between(0.25, 0.5)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 8, "text": [ "0.2386512185411911" ] } ], "prompt_number": 8 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Conclusion" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, referring back to the original question: What is the probability of waiting longer than 1.5 minutes for the next request?\n", "\n", "$$\n", "P(X > 1.5) = e^{-2 \\cdot 1.5} = e^{-3} \\approx 0.0498\n", "$$\n", "\n", "The probability of waiting longer than 1.5 minutes for the next request is 4.98%." ] }, { "cell_type": "code", "collapsed": false, "input": [ "expo.prob_greater_than(1.5)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 9, "text": [ "0.04978706836786395" ] } ], "prompt_number": 9 }, { "cell_type": "markdown", "metadata": {}, "source": [ "For this particular example, we could have answered the question with the Poisson distribution by finding $P(N(1.5) = 0))$, or the probability that `n = 0` events occur within `t = 1.5` minutes." ] }, { "cell_type": "code", "collapsed": false, "input": [ "pois.prob_exactly(0, 1.5)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 10, "text": [ "0.049787068367863944" ] } ], "prompt_number": 10 } ], "metadata": {} } ] }