{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"\u001b[32m\u001b[1m Updating\u001b[22m\u001b[39m registry at `~/.julia/registries/General.toml`\n",
"\u001b[32m\u001b[1m Resolving\u001b[22m\u001b[39m package versions...\n",
"\u001b[32m\u001b[1m No Changes\u001b[22m\u001b[39m to `~/.julia/environments/v1.7/Project.toml`\n",
"\u001b[32m\u001b[1m No Changes\u001b[22m\u001b[39m to `~/.julia/environments/v1.7/Manifest.toml`\n"
]
}
],
"source": [
"] add Distributions Plots"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"using Distributions, Plots\n",
"default(leg=:none)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Generate some Poisson distributed sample data"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"\n",
"\n"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"n=60 # the number of samples\n",
"λ=6.0\n",
"\n",
"# Generate a data set for testing\n",
"dist = Poisson(λ)\n",
"x = rand(dist, n)\n",
"\n",
"time = plot(x, xlabel=\"t\", ylabel=\"Spend per unit time\")\n",
"hist = histogram(x, xlabel=\"Spend per unit time\", normed=true)\n",
"plot(time, hist)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Estimating the parameter from the sample data\n",
"\n",
"We can calculate the posterior distribution of $\\lambda$ and the predictive posterior distribution of the next sample based on the data seen up to time 't'."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"\n",
"\n"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X = [sum(x[1:i]) for i in 1:n]\n",
"\n",
"alpha(t) = X[Int(t)]\n",
"theta(t) = 1/(t+0)\n",
"\n",
"# Posterior distribution of λ\n",
"posterior(t) = Gamma(alpha(t), theta(t))\n",
"\n",
"# Predictive posterior of next sample\n",
"predictive(t) = NegativeBinomial(alpha(t), 1-theta(t))\n",
"\n",
"T=Int(round(n*0.1))+1 # observed % of test data\n",
"plot(0:1:maximum(x), [x -> Distributions.pdf(posterior(T), x), x -> Distributions.pdf(predictive(T), x)], title=[\"Posterior distribution of λ\" \"Predictive distribution\"], xlabel=[\"λ\" \"predicted events per unit time\"], seriestype=:bar, layout=2, leg=:none)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The posterior distribution is the current estimate of the mean. \n",
"The predictive distribution is a Negative Binomial and incorporates the uncertainty in the value of λ into the Poisson distribution, essentially stretching it horizontally.\n",
"\n",
"See how the posterior distribution behaves over time and settles close to the known value. "
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"\n",
"\n"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"λes = [mean(posterior(i)) for i in 1:n]\n",
"λlower = [quantile(posterior(i), 0.01)-λes[i] for i in 1:n]\n",
"λupper = [λes[i]-quantile(posterior(i), 0.99) for i in 1:n]\n",
"plot(λes, ribbon=(λlower, λupper), xlabel=\"t\", ylabel=\"λ\", leg=:none)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Limit spend based on current estimate of spend rate\n",
"\n",
"First, construct a whole ensemble of trajectories to use as a background guide."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"ensemble = []\n",
"for j in 1:1000\n",
"samples = Array{Real}(undef, n)\n",
"for i in 1:n\n",
" samples[i] = 12*rand() + ((i>1) ? samples[i-1] : 0)\n",
"end\n",
"push!(ensemble, samples)\n",
"end"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We want to throttle the total spend to a specific value by only allowing a fraction of the samples through and we want to do it with 90% certainty. \n",
"\n",
"To do that we can calculate the overshoot factor with the data up to the current time, t, and apply it as a correcting factor to the samples."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"T = 300 # Desired total samples allowed through\n",
"y = []\n",
"for t in 1:n\n",
" λest = quantile(posterior(t), 0.9)\n",
" S = sum(x[1:t]) + (n-t)*λest # Estimated total\n",
" factor = minimum([1.0, T/S])\n",
" push!(y, x[t]*factor)\n",
"end"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Evolve the posterior distribution over time and see how the it narrows to the actual values. Display with the ensemble and the throttled samples."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"┌ Info: Saved animation to \n",
"│ fn = /Users/john/Projects/notebooks/tmp.gif\n",
"└ @ Plots /Users/john/.julia/packages/Plots/1KWPG/src/animation.jl:114\n"
]
},
{
"data": {
"text/html": [
""
],
"text/plain": [
"Plots.AnimatedGif(\"/Users/john/Projects/notebooks/tmp.gif\")"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"Xpred = Array{Real}(undef, n)\n",
"lower = Array{Real}(undef, n)\n",
"upper = Array{Real}(undef, n)\n",
"@gif for t in 5:n \n",
" for i in 1:n\n",
" Xpred[i] = i