{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32m\u001b[1m Updating\u001b[22m\u001b[39m registry at `~/.julia/registries/General.toml`\n", "\u001b[32m\u001b[1m Resolving\u001b[22m\u001b[39m package versions...\n", "\u001b[32m\u001b[1m No Changes\u001b[22m\u001b[39m to `~/.julia/environments/v1.7/Project.toml`\n", "\u001b[32m\u001b[1m No Changes\u001b[22m\u001b[39m to `~/.julia/environments/v1.7/Manifest.toml`\n" ] } ], "source": [ "] add Distributions Plots" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "using Distributions, Plots\n", "default(leg=:none)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Generate some Poisson distributed sample data" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "n=60 # the number of samples\n", "λ=6.0\n", "\n", "# Generate a data set for testing\n", "dist = Poisson(λ)\n", "x = rand(dist, n)\n", "\n", "time = plot(x, xlabel=\"t\", ylabel=\"Spend per unit time\")\n", "hist = histogram(x, xlabel=\"Spend per unit time\", normed=true)\n", "plot(time, hist)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Estimating the parameter from the sample data\n", "\n", "We can calculate the posterior distribution of $\\lambda$ and the predictive posterior distribution of the next sample based on the data seen up to time 't'." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X = [sum(x[1:i]) for i in 1:n]\n", "\n", "alpha(t) = X[Int(t)]\n", "theta(t) = 1/(t+0)\n", "\n", "# Posterior distribution of λ\n", "posterior(t) = Gamma(alpha(t), theta(t))\n", "\n", "# Predictive posterior of next sample\n", "predictive(t) = NegativeBinomial(alpha(t), 1-theta(t))\n", "\n", "T=Int(round(n*0.1))+1 # observed % of test data\n", "plot(0:1:maximum(x), [x -> Distributions.pdf(posterior(T), x), x -> Distributions.pdf(predictive(T), x)], title=[\"Posterior distribution of λ\" \"Predictive distribution\"], xlabel=[\"λ\" \"predicted events per unit time\"], seriestype=:bar, layout=2, leg=:none)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The posterior distribution is the current estimate of the mean. \n", "The predictive distribution is a Negative Binomial and incorporates the uncertainty in the value of λ into the Poisson distribution, essentially stretching it horizontally.\n", "\n", "See how the posterior distribution behaves over time and settles close to the known value. " ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "λes = [mean(posterior(i)) for i in 1:n]\n", "λlower = [quantile(posterior(i), 0.01)-λes[i] for i in 1:n]\n", "λupper = [λes[i]-quantile(posterior(i), 0.99) for i in 1:n]\n", "plot(λes, ribbon=(λlower, λupper), xlabel=\"t\", ylabel=\"λ\", leg=:none)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Limit spend based on current estimate of spend rate\n", "\n", "First, construct a whole ensemble of trajectories to use as a background guide." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "ensemble = []\n", "for j in 1:1000\n", "samples = Array{Real}(undef, n)\n", "for i in 1:n\n", " samples[i] = 12*rand() + ((i>1) ? samples[i-1] : 0)\n", "end\n", "push!(ensemble, samples)\n", "end" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We want to throttle the total spend to a specific value by only allowing a fraction of the samples through and we want to do it with 90% certainty. \n", "\n", "To do that we can calculate the overshoot factor with the data up to the current time, t, and apply it as a correcting factor to the samples." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "T = 300 # Desired total samples allowed through\n", "y = []\n", "for t in 1:n\n", " λest = quantile(posterior(t), 0.9)\n", " S = sum(x[1:t]) + (n-t)*λest # Estimated total\n", " factor = minimum([1.0, T/S])\n", " push!(y, x[t]*factor)\n", "end" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Evolve the posterior distribution over time and see how the it narrows to the actual values. Display with the ensemble and the throttled samples." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "┌ Info: Saved animation to \n", "│ fn = /Users/john/Projects/notebooks/tmp.gif\n", "└ @ Plots /Users/john/.julia/packages/Plots/1KWPG/src/animation.jl:114\n" ] }, { "data": { "text/html": [ "" ], "text/plain": [ "Plots.AnimatedGif(\"/Users/john/Projects/notebooks/tmp.gif\")" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Xpred = Array{Real}(undef, n)\n", "lower = Array{Real}(undef, n)\n", "upper = Array{Real}(undef, n)\n", "@gif for t in 5:n \n", " for i in 1:n\n", " Xpred[i] = i