{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Bayesian Inference in FSharp\n", "\n", "\n", "## Introduction\n", "\n", "For my F# Advent submission this year, I decided to implement an extremely simple Bayesian Statistical Framework to be used for inference. The purpose of statistical inference is to infer characteristics of a population via samples. The \"Bayesian\" prefix refers to making use of the Bayes Theorem to conduct inferential statistics.\n", "\n", "![image.png](./Images/BayesTheoremNeon.jpeg)\n", "\n", "I like to think of Statistical Inference via a pasta cooking analogy: hypothetically, assume you lose the cooking instructions of dried pasta you are trying to cook. You start boiling the water and don't know when the pasta is al-dente. One way to check is to carefully remove a piece or two and test if it is cooked to your liking. Through this process, you'll know right away if the pasta needs to cook more or is ready to be sauce'd up. Similarly, inference deals with trying to figure out characteristics (__is the pasta ready?__) of a population (__all the pasta__) via samples (__a single or few pieces of the pasta of your choice__).\n", "\n", "![image.png](./Images/pasta.jpg)\n", "\n", "F#, as a language, didn't fail to deliver an awesome development experience! Admittedly, I hadn't used F# much in the past couple of months because of my deep dives into R and Python. However, within little to no time, a combination of muscle memory and ease of use bumped up my productivity. 
The particular aspects of the language that made it easy to develop a Bayesian Statistical Framework are [Pattern Matching](https://docs.microsoft.com/en-us/dotnet/fsharp/language-reference/pattern-matching) and Immutable Functional Data Structures such as [Records](https://fsharpforfunandprofit.com/posts/records/) and [Discriminated Unions](https://fsharpforfunandprofit.com/posts/discriminated-unions/), which make expressing the domain succinct and lucid not only for the developer but also for the reader." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Motivation\n", "\n", "I recently received a Masters in Data Science but discovered that none of my courses dove deep into Bayesian Statistics. Motivated by a desire to fully understand the differences between frequentist approaches and Bayesian ones, and to eventually gain an understanding of Bayesian Neural Networks, I set out to dive deep into the world of Bayesian Statistics.\n", "\n", "While reading [\"A First Course in Bayesian Statistical Methods\"](https://www.amazon.com/Bayesian-Statistical-Methods-Springer-Statistics/dp/1441928286/ref=sr_1_2?crid=3D7COZ06U9AU1&dchild=1&keywords=a+first+course+in+bayesian+statistical+methods&qid=1608705310&sprefix=A+first+course+in+baye%2Caps%2C239&sr=8-2) by Peter Hoff (highly recommend this book to those with a strong mathematical background) and [this](https://www.youtube.com/playlist?list=PLFDbGp5YzjqXQ4oE4w9GVWdiokWB9gEpm) playlist by [Ben Lambert](https://ben-lambert.com/bayesian/), I discovered [pymc3](https://docs.pymc.io/), a Probabilistic Programming library in Python. Pymc3, in my opinion, has some great abstractions that I wanted to implement myself in a functional-first setting. \n", "\n", "The best way to learn a new statistical algorithm is by implementing it and the best way to learn something well is to teach it to others; this submission is a result of that ideology." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Bayes Theorem\n", "\n", "The crux of Bayesian Inference lies in making use of the Bayes Theorem to get the __Posterior__:\n", "\n", "\\begin{align}\n", "p(\\theta \\mid y) = \\frac{p(y \\mid \\theta) p(\\theta)}{p(y)}\n", "\\end{align}\n", "\n", "From an inferential perspective, here are the interpretations of the terms of the theorem:\n", "\n", "| Term | Mathematical Representation | Definition |\n", "| :-----------------|-----------------------------|:------------|\n", "| __Variable to Infer__ | $\\theta$ | The variable we are trying to infer details about from a population i.e. trying to figure out if that variable is characteristic of the population. \n", "| __Observations__ | $y$ | Observations obtained to validate our hypothesis that $\\theta$ is characteristic of the true population. \n", "| __Prior__ | $p(\\theta)$ | The probability of the hypothesis that $\\theta$ represents the true population __before__ obtaining any observations.\n", "| __Likelihood__ | $p(y \\mid \\theta)$ | Also known as the __Sampling Model__, is the likelihood that $y$ would be the outcome of our experiment if we knew $\\theta$ was representative of the true population.\n", "| __Evidence__ | $p(y)$ | The probability of observing the data we have collected in all circumstances, regardless of our hypothesis. This quantity can also be interpreted as the normalizing constant that causes our posterior probability to be a real number between 0.0 and 1.0. It's important to note that this quantity, out of the box, doesn't depend on the variable we are trying to infer. |\n", "| __Posterior__ | $p(\\theta \\mid y)$ | The probability of our hypothesis that $\\theta$ is characteristic of the true population after observing the data or in light of new information. 
|\n", "\n", "In my opinion, Bayes Theorem resonates with the [Scientific Method](https://en.wikipedia.org/wiki/Scientific_method) as we are effectively trying to quantify the validity of a hypothesis in light of new information i.e. compute the posterior distribution. As an important aside, the output of Bayes Theorem is a __probability distribution__ rather than a __point-estimate__, which has some consequences in practice." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Bayesian Inference in Practice\n", "\n", "In practice, the denominator or the __Evidence__ i.e. the normalization constant of the Bayes theorem is a pain to calculate. For example, for continuous variables, Bayes theorem has the following form via an application of the [Law of Total Probability](https://corporatefinanceinstitute.com/resources/knowledge/other/total-probability-rule/):\n", "\n", "\\begin{align}\n", "p(\\theta \\mid y) = \\frac{p(y \\mid \\theta) p(\\theta)}{\\int_{\\Theta}p(y \\mid \\tilde{\\theta}) p(\\tilde{\\theta}) \\; d\\tilde{\\theta}}\n", "\\end{align}\n", "\n", "That integral in the denominator is UGLY and sometimes computationally infeasible, so approximation techniques must be used to generate the posterior distribution: enter __Markov-Chain Monte Carlo (MCMC)__ methods, which take advantage of the fact that the denominator is a normalization __constant__, i.e. the posterior is proportional to the product of the prior and the likelihood:\n", "\n", "\\begin{align}\n", "p(\\theta \\mid y) \\propto p(y \\mid \\theta) p(\\theta)\n", "\\end{align}\n", "\n", "### Markov-Chain Monte-Carlo (MCMC) Algorithms\n", "\n", "MCMC algorithms help avoid the issue of the ugly, sometimes infeasible denominator of Bayes Theorem by giving us approximations of the posterior distribution. 
For the sake of clarity in giving a definition for these algorithms, let's break up the conjugated terms:\n", "\n", "#### Monte-Carlo\n", "\n", "This term refers to the fact that we make use of random numbers via a random number generator to approximate the posterior distribution to some reasonable degree of precision. The premise for the use of the Monte-Carlo method is the [Law of Large Numbers](https://en.wikipedia.org/wiki/Law_of_large_numbers), which states that the average of the results of a large number of iterations or trials of an experiment tends to get closer to the expected value. \n", "\n", "The use case here is that a large number of iterations, cleverly guided by random number generators, will eventually produce an approximation that resembles the posterior distribution.\n", "\n", "#### Markov-Chain\n", "\n", "_\"The future only depends on the present, not the past\"_ - Peter Hoff (nicked from Peter Hoff's AWESOME book mentioned before) encapsulates the idea behind Markov-Chains. Markov-Chains are a sequence of events whose probability is only a function of the previous event. To a computer scientist, a Markov-Chain can be thought of as a linked list (Thanks Steve!) of approximations whose values, via Monte-Carlo methods, typify the posterior.\n", "\n", "#### Metropolis Hastings Algorithm\n", "\n", "The Metropolis Hastings Algorithm is one choice of an MCMC Algorithm that can be used to approximate the posterior distribution. This algorithm, for each chain, decides at each step or iteration if a random move in parameter space to approximate the posterior is worth keeping or rejecting, based on a criterion that I'll describe later. The specific flavor of Metropolis Hastings I make use of is the Symmetric Metropolis Hastings algorithm and I'll elaborate on its implementation in a section below." 
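, "\n", "To make the Monte-Carlo idea concrete, here is a minimal F# sketch of the Law of Large Numbers at work (the function name is my own, purely illustrative, and not part of the framework): averaging a growing number of uniform random draws converges to the expected value of 0.5.\n", "\n", "```fsharp\n", "// Minimal Monte-Carlo sketch (illustrative; the name is hypothetical):\n", "// approximate E[X] for X ~ Uniform(0, 1), whose true value is 0.5.\n", "let approximateUniformMean (iterations : int) : float =\n", "    let rng = System.Random(42)\n", "    List.init iterations (fun _ -> rng.NextDouble())\n", "    |> List.average\n", "\n", "// More iterations pull the estimate closer to the expected value.\n", "[ 100; 10_000; 1_000_000 ]\n", "|> List.iter (fun n -> printfn \"n = %7d -> estimate = %f\" n (approximateUniformMean n))\n", "```"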
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Goals\n", "\n", "The goals of this submission are to develop the following:\n", "\n", "1. __A Bayesian Domain Specific Language (DSL) and Its Parser__ that a statistician with no programming experience can understand and use to specify a model and its parameters. Once our user specifies the model and the parameters, the parser associated with the Bayesian DSL converts this model into something digestible by the framework.\n", "2. __A Bayesian Network__ from the Parsed Bayesian DSL addressing the simplest case i.e. one random variable representing the prior and one for the likelihood in the form of a Directed Acyclic Graph (DAG).\n", "3. __Symmetric Metropolis-Hastings__, an MCMC algorithm that makes use of the Bayesian DSL representation to approximate the posterior distribution.\n", "\n", "Here is the proposed workflow I have developed in this submission; I will continue to add to it as time goes on but it is a start:\n", "![image](./Images/Plan.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Getting Set Up\n", "\n", "The inferential logic will require minimal dependencies:\n", "\n", "| Dependency | Use |\n", "| :-----------------| :----------: | \n", "| __XPlot.Plotly__ | Used for charting the histogram of the distributions. |\n", "| __MathNet.Numerics__ | Used for the out-of-the-box statistical distributions. |\n", "| __Newtonsoft.Json__ | Parsing the parameters of the Bayesian DSL. |\n", "\n", "I wrote this notebook via the .net-fsharp kernel via .NET Interactive and found it great to use! This [blogpost](https://devblogs.microsoft.com/dotnet/net-core-with-juypter-notebooks-is-here-preview-1/) was pretty helpful to get me started." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "Installing package XPlot.Plotly.done!" 
] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Installing package MathNet.Numerics.done!" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Installing package Newtonsoft.Json.done!" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "// Dependencies\n", "#r \"nuget: XPlot.Plotly\"\n", "#r \"nuget: MathNet.Numerics\"\n", "#r \"nuget: Newtonsoft.Json\"\n", "\n", "// Imports \n", "open XPlot.Plotly\n", "open MathNet.Numerics\n", "open Newtonsoft.Json" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## A Bayesian Domain Specific Language (DSL) and Its Parser\n", "\n", "![plan1](./Images/Plan_Section1.png)\n", "\n", "The goal of this section is to develop a Bayesian Domain Specific Language (DSL) that a statistician with no programming experience can understand and use to specify a model and its parameters. \n", "\n", "Drawing inspiration from [Stan](https://mc-stan.org) and pymc3 tutorials such as this [one](https://docs.pymc.io/notebooks/stochastic_volatility.html#Stochastic-Volatility-model), I wanted this DSL to be extremely simple, albeit complete, and therefore, the components I found that describe each random variable of the Bayesian Model are:\n", "\n", "1. The Name of the Random Variable.\n", "2. The Conditionals of the Random Variable i.e. what other variables are given to be true to completely specify the random variable.\n", "3. The distribution associated with the random variable.\n", "4. The parameters as variables of the distribution e.g. for a Normal Distribution, the parameters will be $\\mu$ and $\\sigma$, the mean and the standard deviation respectively. \n", "5. The observed data for the Random Variable, if the random variable has any.\n", "6. 
A map of the parameters to constants to decouple the model from the values associated with the model.\n", "\n", "I also wanted this DSL to be a first class citizen in the eyes of a Bayesian statistician that specifies models using probability distributions.\n", "\n", "### Format\n", "\n", "```\n", "Random Variable Name [|Comma-separated Conditionals] ~ Distribution(Comma-separated Parameters without spaces) [: observed] \n", "```\n", "\n", "The details enclosed in ``[]`` imply optionality.\n", "\n", "### Example\n", "\n", "```\n", "θ ~ Gamma(a,b)\n", "Y|θ ~ Poisson(θ) : observed1\n", "Z|Y,θ ~ Beta(θ, Y) : observed2\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The Domain\n", "\n", "The domain consists of two types: ``ParsedRandomVariable``, which represents a single line in the specified model, and ``ParsedBayesianModel``, which is a list of these." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "type ParsedRandomVariable = \n", " { Name : string\n", " Conditionals : string list\n", " Distribution : string\n", " Parameters : string list\n", " Observed : string option }\n", "\n", "type ParsedBayesianModel = ParsedRandomVariable list" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Parsing Logic\n", "\n", "The parsing logic for each line in the user specified model is as follows:\n", "1. Split the line by spaces.\n", "2. Extract the Name and the Conditionals from the first part of the model split on the tilde (~). ``Random Variable Name | [Conditionals]``.\n", "3. From the second part of the model split on the tilde, get the name of the distribution and its associated parameters along with the optionally available observed variable.\n", "\n", "F#'s awesome pattern matching was a lifesaver here, not only for its ease of use but also for its ability to make the failure cases explicit. 
" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "open System\n", "\n", "// Format: RVName [|Conditionals] ~ Distribution(Parameters) [: observed] \n", "// [] -> optional\n", "// NOTE: There can't be any spaces in the distribution parameters.\n", "let parseLineOfModel (lineInModel : string) : ParsedRandomVariable = \n", " \n", " // Helper fn to split the string based on a variety of type of delimiters.\n", " // Resultant type is a list of strings to feed in for the pattern matching.\n", " let splitToList (toSplit : string) (delimiters : obj) : string list = \n", " let split = \n", " match delimiters with\n", " | :? string as s -> toSplit.Split(s, StringSplitOptions.RemoveEmptyEntries) \n", " | :? (string array) as arr -> toSplit.Split(arr, StringSplitOptions.RemoveEmptyEntries) \n", " | :? (char array) as arr -> toSplit.Split(arr, StringSplitOptions.RemoveEmptyEntries) \n", " | _ -> failwithf \"Splitting based on delimiters failed as it is neither a string nor an array of strings: Of Type: %A - %A\" (delimiters.GetType()) toSplit\n", " \n", " Array.toList split\n", "\n", " match splitToList lineInModel \" \" with\n", " | nameAndConditionals :: \"~\" :: distributionParametersObserved ->\n", " // Get the name and conditionals.\n", " let splitNameAndConditionals = splitToList nameAndConditionals \"|\"\n", " let name = splitNameAndConditionals.[0]\n", " let conditionals = \n", " match splitNameAndConditionals with \n", " | name :: conditionals -> \n", " if conditionals.Length > 0 then splitToList conditionals.[0] \",\"\n", " else []\n", " | _ -> failwithf \"Pattern not found for RV Name and Conditionals - the format is: RVName|Conditionals: %A\" splitNameAndConditionals\n", "\n", " let extractAndGetParameters (distributionNameAndParameters : string) : string * string list = \n", " let splitDistributionAndParameters = splitToList distributionNameAndParameters [| \"(\"; \")\" |]\n", " (splitDistributionAndParameters.[0], splitToList 
splitDistributionAndParameters.[1] \",\")\n", " \n", " match distributionParametersObserved with \n", "\n", " // Case: Without Observations. Example: θ ~ Gamma(a,b)\n", " | distributionNameAndParameters when distributionNameAndParameters.Length = 1 ->\n", " let extractedDistributionAndParameters = extractAndGetParameters distributionNameAndParameters.[0]\n", " { Name = name; \n", " Conditionals = conditionals; \n", " Distribution = (fst extractedDistributionAndParameters).ToLower();\n", " Observed = None; \n", " Parameters = snd extractedDistributionAndParameters; }\n", "\n", " // Case: With Observations. Example: Y|θ ~ Poisson(θ) : observed\n", " | distributionNameAndParameters :: \":\" :: observed ->\n", " let extractedDistributionAndParameters = extractAndGetParameters distributionNameAndParameters\n", " { Name = name;\n", " Conditionals = conditionals; \n", " Distribution = (fst extractedDistributionAndParameters).ToLower();\n", " Observed = Some observed.Head; // Only 1 observed list permitted.\n", " Parameters = snd extractedDistributionAndParameters; } \n", "\n", " // Case: Error.\n", " | _ -> failwithf \"Pattern not found for the model while parsing the distribution, parameters and optionally, the observed variables: %A\" distributionParametersObserved \n", "\n", " | _ -> failwithf \"Pattern not found for the following line in the model - please check the syntax: %A\" lineInModel\n", "\n", "let parseModel (model : string) : ParsedBayesianModel = \n", " model.Split('\\n') \n", " |> Array.map(parseLineOfModel)\n", " |> Array.toList\n", "\n", "// Helper to print out the Parsed Bayesian Model\n", "let printParsedModel (model : string) : unit = \n", " let parsedModel = parseModel model\n", " printfn \"Model: %A is represented as %A\" model parsedModel" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Examples of Parsing a User-Specified DSL" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", 
"output_type": "stream", "text": [ "Model: \"θ ~ Gamma(a,b)\n", " Y|θ ~ Poisson(θ) : observed\" is represented as [{ Name = \"θ\"\n", " Conditionals = []\n", " Distribution = \"gamma\"\n", " Parameters = [\"a\"; \"b\"]\n", " Observed = None }; { Name = \"Y\"\n", " Conditionals = [\"θ\"]\n", " Distribution = \"poisson\"\n", " Parameters = [\"θ\"]\n", " Observed = Some \"observed\" }]\n", "Model: \"θ ~ Beta(unit,unit)\n", " γ ~ Gamma(a,b)\n", " Y|θ,γ ~ Binomial(n,θ) : observed\" is represented as [{ Name = \"θ\"\n", " Conditionals = []\n", " Distribution = \"beta\"\n", " Parameters = [\"unit\"; \"unit\"]\n", " Observed = None }; { Name = \"γ\"\n", " Conditionals = []\n", " Distribution = \"gamma\"\n", " Parameters = [\"a\"; \"b\"]\n", " Observed = None }; { Name = \"Y\"\n", " Conditionals = [\"θ\"; \"γ\"]\n", " Distribution = \"binomial\"\n", " Parameters = [\"n\"; \"θ\"]\n", " Observed = Some \"observed\" }]\n" ] }, { "data": { "text/html": [ "" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "// Print out our simple 1-Parameter Model.\n", "let model1 = @\"θ ~ Gamma(a,b)\n", " Y|θ ~ Poisson(θ) : observed\"\n", "printParsedModel(model1)\n", "\n", "// This model doesn't make sense but adding to test multiple conditionals.\n", "let model2 = @\"θ ~ Beta(unit,unit)\n", " γ ~ Gamma(a,b)\n", " Y|θ,γ ~ Binomial(n,θ) : observed\"\n", "printParsedModel(model2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Specifying the Parameters\n", "\n", "The idea is to decouple the parameters from the model, and this is done by saving them as a JSON string. This task is made simple using the ``Newtonsoft.Json`` library." 
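, "\n", "As a small sketch of that decoupling (hypothetical usage, not part of the framework itself), the same parameter payload can be produced programmatically with ``JsonConvert.SerializeObject`` rather than hand-written:\n", "\n", "```fsharp\n", "open System.Collections.Generic\n", "open Newtonsoft.Json\n", "\n", "// Build the parameter payload in code and serialize it to the JSON string\n", "// that the deserializer expects (illustrative sketch; keys are examples).\n", "let parameters = Dictionary<string, float>(dict [ \"a\", 2.0; \"b\", 2.3 ])\n", "let payload = dict [ \"Parameters\", box parameters; \"Observed\", box [ 4.2; 0.235 ] ]\n", "let json = JsonConvert.SerializeObject(payload)\n", "printfn \"%s\" json\n", "```"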
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "open System\n", "open System.Collections.Generic\n", "\n", "open MathNet.Numerics.Distributions\n", "\n", "open Newtonsoft.Json\n", "\n", "type Observed = float list\n", "\n", "type ParameterList = \n", " { Observed : float list; Parameters : Dictionary<string, float> } \n", "\n", "let deserializeParameters (paramsAsString : string) : ParameterList = \n", " JsonConvert.DeserializeObject<ParameterList>(paramsAsString)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Examples of the Deserialization of the Parameters" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Deserialized Parameters 1: { Observed = [4.2; 0.235; 2.11]\n", " Parameters = seq [[μ0, 0]; [σ0, 1]; [μ, 5]; [σ, 2]; ...] }\n", "Sampling from the Exponential Distribution with the λ = Exponential(λ = 2) parameter: 1.673949327\n" ] }, { "data": { "text/html": [ "" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "// Parameter List 1\n", "let parameters1 = \"{Parameters : {μ0 : 0, σ0 : 1, μ : 5, σ : 2, λ : 4}, observed : [4.2,0.235,2.11]}\"\n", "let deserializedParameters1 = deserializeParameters parameters1\n", "printfn \"Deserialized Parameters 1: %A\" (deserializedParameters1)\n", "\n", "// Parameter List 2\n", "let parameters2 = \"{Parameters: {λ : 2}}\"\n", "let deserializedParameters2 = deserializeParameters parameters2\n", "// Applying the Deserialized Parameters to Sample from a Distribution\n", "let exp = Exponential deserializedParameters2.Parameters.[\"λ\"] \n", "printfn \"Sampling from the Exponential Distribution with the λ = %A parameter: %A\" exp (exp.Sample())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Converting the Parsed Model into a Bayesian Network\n", "\n", "![plan_section2](./Images/Plan_Section2.png)\n", "\n", "The goal is to take the parsed 
representations of the user-inputted model and convert them into a directed acyclic graph (DAG) that represents a Bayesian Network, in an effort to easily and generically compute the posterior distribution. \n", "\n", "The first step here is to map the distribution that's specified by the user as a string to an actual function that can compute the density or mass functions used in the prior and likelihood calculations. Subsequent to the distribution mapping, creating the notion of a Bayesian Node and the associated Bayesian Network will give us a concrete form of the specified model. In this flavor of implementation, I will be simplifying the logic to only consider a two node network with one node representing the prior and the other, the likelihood. In a future post, I'll be generalizing this algorithm further." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Distribution Mapping\n", "\n", "The distribution mapping step involves developing:\n", "\n", "1. A type-specified domain for the distribution.\n", "2. Parsing logic that takes the name of the distribution and applies the ParsedRandomVariable based parameters to it.\n", "3. 
A helper function that, based on the type of distribution, evaluates the probability density (or mass) function at an input.\n", "\n", "__NOTE: The list of distributions is incomplete; however, adding new ones should follow the same pattern.__" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "open MathNet.Numerics.Distributions\n", "\n", "type DistributionType = \n", " | Continuous\n", " | Discrete\n", "\n", "type Parameter = float\n", "type DiscreteInput = int\n", "type Input = float\n", "\n", "type DensityType = \n", " | OneParameter of (Parameter * Input -> float) \n", " | OneParameterDiscrete of (Parameter * DiscreteInput -> float)\n", " | TwoParameter of (Parameter * Parameter * Input -> float)\n", "\n", "type DistributionInfo = { RVName : string\n", " DistributionType : DistributionType \n", " DistributionName : string \n", " Parameters : float list\n", " Density : DensityType } with\n", " \n", " static member Create (item : ParsedRandomVariable) \n", " (parameterList : ParameterList) : DistributionInfo = \n", " // I know this is ugly but this functionality assumes the user enters the \n", " // parameters in the order that's expected by the MathNet Numerics Library. 
\n", " // Grab the parameters associated with this Random Variable.\n", " let rvParameters = \n", " item.Parameters\n", " |> List.filter(parameterList.Parameters.ContainsKey) \n", " |> List.map(fun item -> parameterList.Parameters.[item])\n", "\n", " // Extract Distribution Associated with the Parsed Random Variable.\n", " match item.Distribution with\n", "\n", " // 1 Parameter Distributions \n", " | \"exponential\" -> \n", " { RVName = item.Name\n", " DistributionName = item.Distribution\n", " Parameters = rvParameters\n", " DistributionType = DistributionType.Continuous\n", " Density = OneParameter Exponential.PDF }\n", " | \"poisson\" ->\n", " { RVName = item.Name\n", " DistributionName = item.Distribution\n", " Parameters = rvParameters\n", " DistributionType = DistributionType.Discrete\n", " Density = OneParameterDiscrete Poisson.PMF } \n", " // 2 Parameter Distributions \n", " | \"normal\" ->\n", " { RVName = item.Name\n", " DistributionName = item.Distribution\n", " Parameters = rvParameters\n", " DistributionType = DistributionType.Continuous\n", " Density = TwoParameter Normal.PDF }\n", " | \"gamma\" ->\n", " { RVName = item.Name\n", " DistributionName = item.Distribution\n", " Parameters = rvParameters\n", " DistributionType = DistributionType.Continuous\n", " Density = TwoParameter Gamma.PDF }\n", " | \"beta\" ->\n", " { RVName = item.Name\n", " DistributionName = item.Distribution\n", " Parameters = rvParameters\n", " DistributionType = DistributionType.Continuous\n", " Density = TwoParameter Beta.PDF }\n", " | \"continuousuniform\" ->\n", " { RVName = item.Name\n", " DistributionName = item.Distribution\n", " Parameters = rvParameters\n", " DistributionType = DistributionType.Continuous\n", " Density = TwoParameter ContinuousUniform.PDF }\n", " // Failure Case\n", " | _ -> failwithf \"Distribution not registered: %A\" item.Distribution \n", " \n", " member this.ComputeOneParameterPDF (parameter : float) (input : float) : float =\n", " match 
this.Density with\n", " | OneParameter pdf -> pdf(parameter,input)\n", " | _ -> failwithf \"Incorrect usage of function with a non One Parameter Density Type. Density given: %A\" this.Density \n", " member this.ComputeOneParameterDiscretePMF (parameter : float) (input : int) : float =\n", " match this.Density with\n", " | OneParameterDiscrete pmf -> pmf(parameter,input)\n", " | _ -> failwithf \"Incorrect usage of function with a non One Parameter Discrete Density Type. Density given: %A\" this.Density \n", " member this.ComputeTwoParameterPDF (parameter1 : float) (parameter2 : float) (input : float) : float =\n", " match this.Density with\n", " | TwoParameter pdf -> pdf(parameter1, parameter2, input)\n", " | _ -> failwithf \"Incorrect usage of function with a non Two Parameter Density Type. Density given: %A\" this.Density" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Helper Methods to get the Distributions from the User Specified DSL" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "let getDistributionInfoForModel(model : string) (parameterList : string) : DistributionInfo list = \n", " let parsedModel = parseModel model\n", " let parameterList = deserializeParameters parameterList \n", " parsedModel\n", " |> List.map(fun x -> DistributionInfo.Create x parameterList)\n", "\n", "let getDensityOrProbabilityForModel (model : string) \n", " (parameterList : string) \n", " (data : float seq) : IDictionary<string, float seq> = \n", " getDistributionInfoForModel model parameterList\n", " |> List.map(fun (e : DistributionInfo) -> \n", " match e.Density with\n", " | OneParameter p ->\n", " let param = List.exactlyOne e.Parameters \n", " let results = data |> Seq.map(fun d -> e.ComputeOneParameterPDF param d)\n", " e.RVName, results\n", " | OneParameterDiscrete p ->\n", " let param = List.exactlyOne e.Parameters \n", " let results = data |> Seq.map(fun d -> e.ComputeOneParameterDiscretePMF param (int d))\n", " e.RVName, 
results\n", " | TwoParameter p ->\n", " let p2 : float list = e.Parameters |> List.take 2\n", " let results = data |> Seq.map(fun d -> e.ComputeTwoParameterPDF p2.[0] p2.[1] d)\n", " e.RVName, results)\n", " |> dict" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Testing the Distribution Mapping Logic on Random Samples via a Simple Model" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Exponential: seq\n", " [seq [5.021224175e-54; 6.790246353e-50; 2.044679134e-68; 2.02863983e-172; ...]]\n", "Normal: seq [seq [0.3813916696; 0.1382035578; 0.1331370548; 0.2884060033; ...]]\n", "Poisson: seq\n", " [seq [1.104713281e-15; 1.21518461e-14; 7.532136009e-17; 1.21518461e-14; ...]]\n" ] }, { "data": { "text/html": [ "" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "// Exponential. \n", "let exponentialModel = \"x ~ Exponential(lambda)\"\n", "let exponentialParamList = \"{Parameters: {lambda : 2, a : 2., b : 2.3}, Observed : []}\"\n", "let exponentialDummyData = ContinuousUniform.Samples(0., 200.) |> Seq.take 2000\n", "let exponentialPdfs = getDensityOrProbabilityForModel exponentialModel exponentialParamList exponentialDummyData\n", "printfn \"Exponential: %A\" (exponentialPdfs.Values)\n", "\n", "// Normal.\n", "let normalModel = \"x ~ Normal(mu,sigma)\"\n", "let normalParamList = \"{Parameters: {mu: 0., sigma : 1.}, Observed : []}\"\n", "let normalDummyData = Normal.Samples(0.0, 1.0) |> Seq.take 2000\n", "let normalPdfs = getDensityOrProbabilityForModel normalModel normalParamList normalDummyData \n", "printfn \"Normal: %A\" (normalPdfs.Values)\n", "\n", "// Poisson.\n", "let poissonModel = \"x ~ Poisson(theta)\"\n", "let poissonParamList = \"{Parameters: {theta: 44}, Observed : []}\"\n", "let poissonDummyData = ContinuousUniform.Samples(0., 5.) 
|> Seq.take 2000 \n", "let poissonPdfs = getDensityOrProbabilityForModel poissonModel poissonParamList poissonDummyData \n", "printfn \"Poisson: %A\" (poissonPdfs.Values)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Bayesian Node\n", "\n", "The next step is to specify the Bayesian Node, the abstraction that should encapsulate all the information for successfully computing probabilities. There can be two types of Nodes:\n", "\n", "1. With __Observed__ Variables for cases such as the Likelihood where we have data from a sampling model.\n", "2. __Non-Observed__ Variables for cases such as the Prior where all we have is a guess as to what the distribution of the parameter we want to infer is." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "type BayesianNodeTypeInfo =\n", " | Observed of float list \n", " | NonObserved\n", "\n", "type BayesianNode = \n", " { Name : string\n", " NodeType : BayesianNodeTypeInfo \n", " DistributionInfo : DistributionInfo\n", " ParsedRandomVariable : ParsedRandomVariable } with\n", "\n", " static member Create(parsedRandomVariable : ParsedRandomVariable)\n", " (parameterList : ParameterList) =\n", "\n", " let nodeType : BayesianNodeTypeInfo =\n", " match parsedRandomVariable.Observed with\n", " | Some _ -> BayesianNodeTypeInfo.Observed parameterList.Observed\n", " | None -> BayesianNodeTypeInfo.NonObserved\n", " { Name = parsedRandomVariable.Name;\n", " NodeType = nodeType;\n", " DistributionInfo = DistributionInfo.Create parsedRandomVariable parameterList;\n", " ParsedRandomVariable = parsedRandomVariable; }\n", "\n", " member this.GetDependents (parsedBayesianModel : ParsedBayesianModel) : ParsedRandomVariable list =\n", " parsedBayesianModel \n", " |> List.filter(fun x -> x.Conditionals |> List.contains(this.Name))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### An Example of Creating a Node From a Line in the Model" ] }, { "cell_type": "code", 
"execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NameNodeTypeDistributionInfoParsedRandomVariable
x{ FSI_0024+BayesianNodeTypeInfo+Observed: Item: [ 1, 2, 3, 55 ], Tag: 0, IsObserved: True, IsNonObserved: False }{ FSI_0011+DistributionInfo: RVName: x, DistributionType: { FSI_0011+DistributionType: Tag: 0, IsContinuous: True, IsDiscrete: False }, DistributionName: exponential, Parameters: [ 2 ], Density: { FSI_0011+DensityType+OneParameter: Item: { FSI_0011+Create@41: }, Tag: 0, IsOneParameter: True, IsOneParameterDiscrete: False, IsTwoParameter: False } }{ FSI_0006+ParsedRandomVariable: Name: x, Conditionals: [ ], Distribution: exponential, Parameters: [ lambda ], Observed: { Microsoft.FSharp.Core.FSharpOption<System.String>: Value: observed } }
" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "let lineOfModel = @\"x ~ Exponential(lambda) : observed\"\n", "let paramList = \"{Parameters: {lambda : 2, a : 2., b : 2.3 }, observed : [1,2,3,55]}\"\n", "BayesianNode.Create (parseLineOfModel lineOfModel) (deserializeParameters paramList)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Simple Bayesian Network\n", "\n", "A General Bayesian Network is a Directed Acyclic Graph consisting of Bayesian Nodes and the relationship amongst them. A Simple Bayesian Network is a type of Bayesian Network with two Bayesian Nodes namely, the Prior and Likelihood nodes or in other words, the model that the user can specify is constrained to two random variables; this is done for the sake of simplicity.\n", "\n", "The Simple Bayesian Network also consists of functions that can be used to compute the numerator of Bayes Theorem i.e. $Posterior = Prior \\times Likelihood$\n", "\n", "The __Prior__ is computed using the PDF or PMF based on the parameters associated with the Prior Bayesian Node. For example, a Normal Prior Node will have the formula of ``Normal.PDF(mean, stddev, input)``. \n", "\n", "The __Likelihood__, on the other hand, involves the result from the Prior and the observed list. The assumptions in the computation of the Likelihood is that the observations are __conditionally independent__ given our prior and the __exchangability__ of the observations where the order of the observations don't matter i.e. the order or labels of the observations don't change the outcome.\n", "\n", "\\begin{align}\n", "p(y_1, \\dots, y_n \\mid \\theta) = p(y_1 \\mid \\theta) \\times \\dots \\times p(y_n \\mid \\theta) = \\prod_i p(y_i \\mid \\theta).\n", "\\end{align}\n", "\n", "In other words, we will be computing the PDF based on the prior, parameters of the likelihood node and the observed variables. 
For example, if we were to be inferring the mean of the Normal distribution, our likelihood will be computed by the following pseudo-code: ``product(Normal.PDF(prior, stddevOfLikelihood, observation))``\n" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "// Only to be used for a model with 2 nodes \n", "// i.e. one for the prior and one for the likelihood.\n", "type SimpleBayesianNetworkModel = \n", "    { Name : string\n", "      Nodes : IDictionary<string, BayesianNode>\n", "      Prior : BayesianNode\n", "      Likelihood : BayesianNode } with\n", "\n", "    member this.GetPriorProbability (input : float) : float = \n", "        let distributionInfo = this.Prior.DistributionInfo \n", "        match distributionInfo.Density with\n", "        | OneParameter p ->\n", "            let param = List.exactlyOne distributionInfo.Parameters \n", "            distributionInfo.ComputeOneParameterPDF param input \n", "        | OneParameterDiscrete p ->\n", "            let param = List.exactlyOne distributionInfo.Parameters \n", "            distributionInfo.ComputeOneParameterDiscretePMF param (int input)\n", "        | TwoParameter p ->\n", "            let p2 : float list = distributionInfo.Parameters |> List.take 2\n", "            distributionInfo.ComputeTwoParameterPDF p2.[0] p2.[1] input\n", "\n", "    member this.GetLikelihoodProbability (prior : float) : float = \n", "        let distributionInfo = this.Likelihood.DistributionInfo\n", "        let observed = \n", "            match this.Likelihood.NodeType with\n", "            | Observed l -> l\n", "            | _ -> failwithf \"Incorrectly constructed Simple Network Model. 
%A\" this \n", " let density = \n", " match distributionInfo.Density with\n", " | OneParameter p -> \n", " observed |> List.map(fun d -> distributionInfo.ComputeOneParameterPDF prior d)\n", " | OneParameterDiscrete p ->\n", " observed |> List.map(fun d -> distributionInfo.ComputeOneParameterDiscretePMF prior (int (Math.Ceiling d)))\n", " | TwoParameter p ->\n", " let p : float = distributionInfo.Parameters |> List.exactlyOne \n", " observed |> List.map(fun d -> distributionInfo.ComputeTwoParameterPDF prior p d)\n", " density\n", " |> List.fold (*) 1.0\n", "\n", " member this.GetPosteriorWithoutScalingFactor (input: float) : float = \n", " let priorPdf = this.GetPriorProbability input \n", " let likelihoodPdf = this.GetLikelihoodProbability priorPdf\n", " priorPdf * likelihoodPdf\n", "\n", " static member Construct (name : string)\n", " (model : ParsedBayesianModel)\n", " (parameterList : ParameterList) : SimpleBayesianNetworkModel = \n", "\n", " // Construct all the modes of the model.\n", " let allNodes : (string * BayesianNode) list =\n", " model\n", " |> List.map(fun m -> m.Name, BayesianNode.Create m parameterList)\n", "\n", " let prior : BayesianNode = \n", " allNodes\n", " |> List.filter(fun (_,m) -> m.NodeType = BayesianNodeTypeInfo.NonObserved) \n", " |> List.map(fun (_,m) -> m)\n", " |> List.exactlyOne\n", "\n", " let likelihood : BayesianNode = \n", " allNodes\n", " |> List.filter(fun (_,m) -> m.NodeType <> BayesianNodeTypeInfo.NonObserved)\n", " |> List.map(fun (_,m) -> m)\n", " |> List.exactlyOne\n", "\n", " { Name = name; \n", " Nodes = dict allNodes; \n", " Prior = prior;\n", " Likelihood = likelihood; } " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Example of Creating a Simple Bayesian Network" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NameNodesPriorLikelihood
Normal-Normal[ { System.Collections.Generic.KeyValuePair<System.String,FSI_0014+BayesianNode>: Key: x, Value: { FSI_0014+BayesianNode: Name: x, NodeType: { FSI_0014+BayesianNodeTypeInfo+_NonObserved: Tag: 1, IsObserved: False, IsNonObserved: True }, DistributionInfo: { FSI_0011+DistributionInfo: RVName: x, DistributionType: { FSI_0011+DistributionType: Tag: 0, IsContinuous: True, IsDiscrete: False }, DistributionName: normal, Parameters: [ 5, 3.1622 ], Density: { FSI_0011+DensityType+TwoParameter: Item: FSI_0011+Create@54-2, Tag: 2, IsOneParameter: False, IsOneParameterDiscrete: False, IsTwoParameter: True } }, ParsedRandomVariable: { FSI_0006+ParsedRandomVariable: Name: x, Conditionals: [ ], Distribution: normal, Parameters: [ μ, τ ], Observed: <null> } } }, { System.Collections.Generic.KeyValuePair<System.String,FSI_0014+BayesianNode>: Key: y, Value: { FSI_0014+BayesianNode: Name: y, NodeType: { FSI_0014+BayesianNodeTypeInfo+Observed: Item: [ 9.37, 10.18, 9.16, 11.6, 10.33 ], Tag: 0, IsObserved: True, IsNonObserved: False }, DistributionInfo: { FSI_0011+DistributionInfo: RVName: y, DistributionType: { FSI_0011+DistributionType: Tag: 0, IsContinuous: True, IsDiscrete: False }, DistributionName: normal, Parameters: [ 1 ], Density: { FSI_0011+DensityType+TwoParameter: Item: FSI_0011+Create@54-2, Tag: 2, IsOneParameter: False, IsOneParameterDiscrete: False, IsTwoParameter: True } }, ParsedRandomVariable: { FSI_0006+ParsedRandomVariable: Name: y, Conditionals: [ x ], Distribution: normal, Parameters: [ x, σ ], Observed: { Microsoft.FSharp.Core.FSharpOption`1[System.String]: Value: observed } } } } ]{ FSI_0014+BayesianNode: Name: x, NodeType: { FSI_0014+BayesianNodeTypeInfo+_NonObserved: Tag: 1, IsObserved: False, IsNonObserved: True }, DistributionInfo: { FSI_0011+DistributionInfo: RVName: x, DistributionType: { FSI_0011+DistributionType: Tag: 0, IsContinuous: True, IsDiscrete: False }, DistributionName: normal, Parameters: [ 5, 3.1622 ], Density: { 
FSI_0011+DensityType+TwoParameter: Item: { FSI_0011+Create@54-2: }, Tag: 2, IsOneParameter: False, IsOneParameterDiscrete: False, IsTwoParameter: True } }, ParsedRandomVariable: { FSI_0006+ParsedRandomVariable: Name: x, Conditionals: [ ], Distribution: normal, Parameters: [ μ, τ ], Observed: <null> } }{ FSI_0014+BayesianNode: Name: y, NodeType: { FSI_0014+BayesianNodeTypeInfo+Observed: Item: [ 9.37, 10.18, 9.16, 11.6, 10.33 ], Tag: 0, IsObserved: True, IsNonObserved: False }, DistributionInfo: { FSI_0011+DistributionInfo: RVName: y, DistributionType: { FSI_0011+DistributionType: Tag: 0, IsContinuous: True, IsDiscrete: False }, DistributionName: normal, Parameters: [ 1 ], Density: { FSI_0011+DensityType+TwoParameter: Item: { FSI_0011+Create@54-2: }, Tag: 2, IsOneParameter: False, IsOneParameterDiscrete: False, IsTwoParameter: True } }, ParsedRandomVariable: { FSI_0006+ParsedRandomVariable: Name: y, Conditionals: [ x ], Distribution: normal, Parameters: [ x, σ ], Observed: { Microsoft.FSharp.Core.FSharpOption<System.String>: Value: observed } } }
" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "let model = @\"x ~ Normal(μ,τ) \n", " y|x ~ Normal(x,σ) : observed\"\n", "let parsedModel = parseModel model\n", "let paramList = \"{Parameters: {μ : 5, τ : 3.1622, σ : 1}, observed : [9.37,10.18,9.16,11.60,10.33]}\"\n", "let normalModel = \n", " SimpleBayesianNetworkModel.Construct \"Normal-Normal\" parsedModel (deserializeParameters paramList)\n", "normalModel" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Debugging the Prior and Likelihood\n", "\n", "A quick sanity check to see if the numbers match up if we were to compute the Prior and Likelihood \"by hand\"." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Normal Model Prior: 0.03614314702 | Prior from formula: 0.03614314702\n", "Normal Model Likelihood: 4.158402902e-114 | Likelihood from formula: 4.158402901e-114\n" ] }, { "data": { "text/html": [ "" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "let model = @\"x ~ Normal(μ,τ) \n", " y|x ~ Normal(x,σ) : observed\"\n", "let parsedModel = parseModel model\n", "let paramList = \"{Parameters: {μ : 5, τ : 3.1622, σ : 1}, observed : [9.37,10.18,9.16,11.60,10.33]}\"\n", "let normalModel = \n", " SimpleBayesianNetworkModel.Construct \"Normal-Normal\" parsedModel (deserializeParameters paramList)\n", " \n", "// Prior Check\n", "let priorFromModel = normalModel.GetPriorProbability 0.0\n", "let computedPrior = Normal.PDF(5., 3.1622, 0.0)\n", "printfn \"Normal Model Prior: %A | Prior from formula: %A\" priorFromModel computedPrior\n", "\n", "// Likelihood Check\n", "let likelihoodFromModel = normalModel.GetLikelihoodProbability 0.0\n", "let observed : float list = \n", " match normalModel.Likelihood.NodeType with\n", " | Observed x -> x\n", " | _ -> failwithf \"Incorrectly constructed Likelihood\"\n", " \n", "let 
likelihoodListFromFormula : float list = \n", "    observed |> List.map(fun (o:float) -> Normal.PDF(computedPrior, 1., o))\n", "let likelihoodFromFormula = likelihoodListFromFormula |> List.fold (*) 1.\n", "\n", "printfn \"Normal Model Likelihood: %A | Likelihood from formula: %A\" \n", "    (normalModel.GetLikelihoodProbability 0.03614314702) \n", "    (likelihoodFromFormula)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Applying the Symmetric Metropolis Hastings Algorithm to Approximate the Posterior Distribution\n", "\n", "![plan_section3](./Images/Plan_Section3.png)\n", "\n", "Now that we have a digestible representation of the Simple Bayesian Network, our next step is to make use of an MCMC algorithm to approximate the posterior. The steps here are as follows:\n", "\n", "1. Define a well-specified MCMC domain to run the algorithms.\n", "2. Implement a type of MCMC algorithm called the Symmetric Metropolis Hastings Algorithm.\n", "3. Apply the algorithm to a Simple Bayesian Network." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The Symmetric Metropolis Hastings Algorithm\n", "\n", "I would consider the Symmetric Metropolis Hastings Algorithm to be a simple MCMC algorithm that's based on some smart applications of the Bayes Theorem; therefore, I chose it as the \"toy\" approximation algorithm used here. There are better algorithms out there, such as [Hamiltonian Monte-Carlo](http://diffsharp.github.io/DiffSharp/examples-hamiltonianmontecarlo.html) or the [No U-Turn Sampler](http://www.stat.columbia.edu/~gelman/research/published/nuts.pdf), that I plan to prospectively explore.\n", "\n", "#### The Algorithm\n", "1. Produce a proposal based on the proposal distribution and the current value.\n", "2. Compute the Posterior with the proposal and the current value.\n", "3. Compute the Acceptance Ratio.\n", "4. Accept or Reject the proposal. 
If accepted, the proposed value becomes the current value; otherwise, the current value is retained and another proposal is drawn from the proposal distribution.\n", "5. Repeat until iterative convergence for all chains requested.\n", "\n", "The main ideas associated with the steps or single iterations of this algorithm are:\n", "\n", "#### 1. Proposing New Values To Approximate the Parameter Space\n", "\n", "Using a proposal distribution, we come up with new values of the parameter in order to explore the regions that are more likely under the posterior distribution. Examples of proposal distributions are:\n", "\n", "1. ``Uniform(current - δ, current + δ)``\n", "2. ``Normal(current, δ^2)``\n", "\n", "where δ is a hyperparameter that controls the size of the jumps.\n", "\n", "#### 2. Computing the Acceptance Ratio\n", "\n", "Taking advantage of the relation below and considering two parameter values, say the proposed $\theta$ and the current $\theta^{*}$, we come up with a new quantity called the __Acceptance Ratio__, $r$.\n", "\n", "$p(\theta \mid y) \propto p(y \mid \theta) p(\theta)$\n", "\n", "Where $r = \frac{p(\theta \mid y)}{p(\theta^{*} \mid y)} = \frac{p(y \mid \theta) p(\theta)}{p(y \mid \theta^{*}) p(\theta^{*})}$\n", "\n", "By examining this ratio, it is evident that if the numerator is greater than the denominator, the acceptance ratio is > 1; this implies that the unnormalized posterior at $\theta$ is higher than at $\theta^{*}$, i.e. $\theta$ is more likely to be drawn from the posterior.\n", "\n", "#### 3. Accepting or Rejecting The Proposal Based on the Acceptance Ratio\n", "\n", "The Symmetric Metropolis Hastings algorithm is a type of Metropolis Hastings algorithm where the proposal distribution between the current value and the proposed, new value is symmetric i.e. 
the probability of obtaining the proposal given the current value is the same as the probability of obtaining the current value given the proposal.\n", "\n", "With this idea in mind, once we have computed the acceptance ratio, we decide whether the proposal is accepted into the chain. This is done by comparing the ratio against a random draw from a uniform distribution with lower and upper bounds of 0 and 1 respectively: the proposal is accepted if the draw is below the acceptance ratio. Proposals that land in higher-probability regions of the posterior are thus always or frequently accepted, so in the long run the chain settles into the regions that typify the real posterior.\n", "\n", "As mentioned before, if the proposal was accepted, we use it as the next current value to generate another proposal; otherwise, we try another proposal in the subsequent iteration.\n", "\n", "#### Weaknesses of this Algorithm\n", "\n", "The simplicity of this algorithm comes with costs, specifically a dependence on hyperparameters that have to be tuned and autocorrelation of values: if a proposal is extremely close to the current value, it'll take longer to explore the entire parameter space as the jumps will be smaller. As an example of these weaknesses, I needed to experiment with a bunch of hyperparameters, in addition to increasing the number of iterations, before I got to some semblance of the posterior distribution I was expecting." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Defining the MCMC Domain\n", "\n", "I liked the idea of using a simple ``Request -> Result`` model from the Client-Server world to encapsulate the approximation. Additionally, I drew inspiration from reading pymc3's code to come up with the following domain.\n", "\n", "In general, I want any of the prospective MCMC algorithms to have the ability to:\n", "\n", "1. Output multiple chains that approximate the posterior.\n", "2. 
Specify multiple types of \"steps\" or mechanisms to induce jumps in parameter space for appropriately approximating the posterior distribution.\n", "3. Output statistics on how the algorithm did.\n", "4. Specify a convergence criterion after which the MCMC algorithm stops the approximation.\n", "5. Accept a proposal distribution on the basis of which jumps in the parameter space are induced.\n", "\n", "#### Terminology \n", "\n", "The __Acceptance Rate__ is a per-chain result of an MCMC run: the proportion of accepted proposals to the total number of non-burn-in proposals. This metric is a proxy of how well the steps are approximating the posterior.\n", "\n", "The __BurnInIterationsPct__ is the percentage of posterior approximations to discard; these are taken from the beginning of the chains and are representative of the first few jumps, before the algorithm reaches a point where it's better at approximating the posterior." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "type ConvergenceCriteria = \n", "    | IterativeConvergence of int // Number of Iterations\n", "\n", "type ProposalDistribution =\n", "    | Normal of float // Normal( current, delta )\n", "    | ContinuousUniform of float // ContinuousUniform( current - delta, current + delta )\n", "    | PositiveContinuousUniform of float // ( x ~ Uniform( current - delta, current + delta ); if x <= 0 then current + 0.1 else x )\n", "\n", "type MCMCInferenceStep = \n", "    | SymmetricMetropolisHastings of ProposalDistribution * float\n", "\n", "type MCMCChain =\n", "    { Id : int\n", "      AcceptanceRate : float // # of Acceptances / Total # of Chain Items\n", "      StepValues : float seq }\n", "\n", "type MCMCRequest = \n", "    { StepFunction : MCMCInferenceStep\n", "      ConvergenceCriteria : ConvergenceCriteria\n", "      BurnInIterationsPct : float \n", "      Chains : int \n", "      PrintDebug : bool }\n", "\n", "type MCMCResult =\n", "    { Chains : MCMCChain seq \n", "      MCMCRequest : 
MCMCRequest }" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "open System\n", "open MathNet.Numerics.Distributions\n", "open XPlot.Plotly\n", "\n", "type AcceptanceRejection =\n", " | Acceptance of float // Acceptance of the Proposal\n", " | Rejection of float // Rejection of the Proposal\n", "\n", "let doSymmetricMetropolisHastings (request : MCMCRequest) \n", " (iterations : int)\n", " (proposalDistribution : ProposalDistribution)\n", " (initialValue : float)\n", " (simpleBayesianModel : SimpleBayesianNetworkModel) : MCMCResult =\n", "\n", " let getChain (id : int) (request : MCMCRequest) \n", " (iterations : int) (simpleBayesianModel : SimpleBayesianNetworkModel) : MCMCChain =\n", "\n", " let burnin = int (Math.Ceiling(request.BurnInIterationsPct / 100. * float iterations))\n", " let zeroOneUniform = ContinuousUniform()\n", "\n", " let mutable current = simpleBayesianModel.GetPosteriorWithoutScalingFactor initialValue\n", "\n", " let matchedProposalDistribution (input : float) : float =\n", " match proposalDistribution with \n", " | Normal delta -> Normal(input, delta).Sample()\n", " | ContinuousUniform delta -> ContinuousUniform(input - delta, input + delta).Sample()\n", " | PositiveContinuousUniform delta -> \n", " let u = ContinuousUniform(input - delta, input + delta).Sample()\n", " if u <= 0. 
then input + 0.1 else u\n", "\n", "        let step (iteration : int) : AcceptanceRejection =\n", "            let proposed = matchedProposalDistribution current\n", "            let currentProbability = simpleBayesianModel.GetPosteriorWithoutScalingFactor current\n", "            let proposedProbability = simpleBayesianModel.GetPosteriorWithoutScalingFactor proposed\n", "            // Accept with probability min(posterior(proposed) / posterior(current), 1).\n", "            let acceptanceRatio = Math.Min(proposedProbability / currentProbability, 1.)\n", "            let uniformDraw = zeroOneUniform.Sample()\n", "\n", "            if request.PrintDebug then \n", "                printfn \"Chain: %A Iteration: %A - Current Probability: %A | Proposed Probability: %A | AcceptanceRatio: %A | Uniform Draw: %A | Current: %A | Proposed: %A\" \n", "                    id iteration currentProbability proposedProbability acceptanceRatio uniformDraw current proposed\n", "\n", "            if uniformDraw < acceptanceRatio then (current <- proposed; Acceptance proposed) \n", "            else Rejection current\n", "\n", "        // Cache the sequence so the chain is only run once, even though\n", "        // it is enumerated multiple times below.\n", "        let stepResults : AcceptanceRejection seq =\n", "            seq {1..iterations}\n", "            |> Seq.map step\n", "            |> Seq.skip burnin\n", "            |> Seq.cache\n", "\n", "        let getAcceptanceRateAndStepValues : float * float seq =\n", "\n", "            // Compute the Acceptance Rate.\n", "            let acceptanceRate : float = \n", "                let totalWithoutBurnin : float = float (Seq.length stepResults)\n", "                let totalNumberOfAcceptances : float = \n", "                    stepResults \n", "                    |> Seq.filter(fun x -> \n", "                        match x with\n", "                        | Acceptance _ -> true\n", "                        | _ -> false)\n", "                    |> Seq.length\n", "                    |> float\n", "                totalNumberOfAcceptances / totalWithoutBurnin\n", "\n", "            // Grab the Step Values that'll approximate the posterior.\n", "            let stepValues =\n", "                stepResults |> Seq.map(fun s ->\n", "                    match s with\n", "                    | Acceptance v -> v\n", "                    | Rejection v -> v)\n", "\n", "            acceptanceRate, stepValues\n", "\n", "        let acceptanceRate, stepValues = getAcceptanceRateAndStepValues \n", "\n", "        { Id = id\n", "          AcceptanceRate = acceptanceRate\n", "          StepValues = stepValues } \n", "\n", "    let chains : MCMCChain seq =\n", "        seq {1..request.Chains}\n", "        |> 
Seq.map(fun id -> getChain id request iterations simpleBayesianModel)\n", "\n", "    { Chains = chains\n", "      MCMCRequest = request }" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Running the Algorithm\n", "\n", "This method takes in a generic MCMC Request and a model and returns an MCMC Result consisting of the chains of results that hold the posterior sequence as well as the acceptance rate." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "let runMCMC (request : MCMCRequest) \n", "            (model : SimpleBayesianNetworkModel) : MCMCResult =\n", "    match request.StepFunction with\n", "    | SymmetricMetropolisHastings (proposalDistribution,initialValue) ->\n", "        match request.ConvergenceCriteria with \n", "        | IterativeConvergence iterations -> \n", "            doSymmetricMetropolisHastings request iterations proposalDistribution initialValue model\n", "        | _ -> failwith \"You need to pass in the number of iterations for the Metropolis-Hastings algorithm\"\n", "    | _ -> failwithf \"Step Function Not Registered: %A\" request.StepFunction" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Tying it All Together\n", "\n", "Let's now go from a user-inputted model to the results of the Symmetric Metropolis Hastings algorithm approximating the posterior for two cases. We'll plot the results of the first chain to check whether the algorithm did the right thing." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Exponential Prior and Exponential Likelihood" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "let model = @\"x ~ Exponential(a) \n", " y|x ~ Exponential(x) : observed\"\n", "let parsedModel = parseModel model\n", "let paramList = \"{Parameters: {a : 2., b : 2.}, observed : [1,2,3,4,4,2,5,6,7,3,2,3,4,5,6,1,2,3,4,4,4,4]}\"\n", "let simpleModel = SimpleBayesianNetworkModel.Construct \"Exponential Model\" parsedModel (deserializeParameters paramList)\n", "\n", "let request : MCMCRequest = \n", " { StepFunction = SymmetricMetropolisHastings (ProposalDistribution.PositiveContinuousUniform 0.002, 0.001)\n", " ConvergenceCriteria = IterativeConvergence 10000\n", " BurnInIterationsPct = 0.2\n", " PrintDebug = false \n", " Chains = 4 }\n", "\n", "let mcmc = runMCMC request simpleModel\n", "let firstChain = Seq.head mcmc.Chains\n", "\n", "Histogram(x = firstChain.StepValues)\n", "|> Chart.Plot\n", "|> Chart.WithTitle \"Exponential Prior and Likelihood\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As the online notebook renderer isn't displaying the chart, I took a screenshot of it.\n", "\n", "![Exponential Prior and Likelihood](./Images/ExponentialPriorLikelihood.png)\n", "\n", "This histogram of the Step Values indicates that the Posterior has the shape of an Exponential distribution; which is expected. As an example, here is the PDF of a typical exponential distribution:\n", "\n", "![ExponentialDistribution](./Images/ExponentialDistribution.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Normal Prior and Exponential Likelihood" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#!fsharp\n", "let model = @\"x ~ Normal(μ,τ) \n", " y|x ~ Exponential(x) : observed\"\n", "let parsedModel = parseModel model\n", "let paramList = \"{Parameters: {μ : 5, τ : 3.1622, σ : 1}, observed : [9.37,10.18,9.16,11.60,10.33]}\"\n", "let simpleModel = \n", " SimpleBayesianNetworkModel.Construct \"Normal-Exponential\" parsedModel (deserializeParameters paramList)\n", "\n", "\n", "let request : MCMCRequest = \n", " { StepFunction = SymmetricMetropolisHastings (ProposalDistribution.PositiveContinuousUniform 0.003, 0.0001)\n", " ConvergenceCriteria = IterativeConvergence 50000\n", " BurnInIterationsPct = 0.2\n", " PrintDebug = false \n", " Chains = 4 }\n", "\n", "let mcmc = runMCMC request simpleModel\n", "let firstChain = Seq.head mcmc.Chains\n", "\n", "Histogram(x = firstChain.StepValues)\n", "|> Chart.Plot\n", "|> Chart.WithTitle \"Normal Prior with Exponential Likelihood\"\n", "|> Chart.WithWidth 700\n", "|> Chart.WithHeight 500" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This posterior distribution was a lot more variable in terms of the shape of the distribution; it was also extremely dependent on the hyperparameters passed in highlighting the weaknesses of this algorithm. I am less confident about these results than those from the previous example.\n", "\n", "![Normal-Exponential](./Images/NormalExponential.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### How to Use the Posterior\n", "\n", "Now that the posterior is obtained, the statistician could use it by sampling from it to plug in to other models used for prediction such as use it for the [Posterior Predictive Distribution](https://en.wikipedia.org/wiki/Posterior_predictive_distribution).\n", "\n", "My plan is to write a separate blogpost to go over computing this and making use of the posterior to make predictions. This submission is already bloated enough." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Conclusion\n", "\n", "This submission took a lot of work and is a product of two months of going down a Bayesian rabbit hole. That said, I think I have built a reasonable base to continue building on. As an obvious disclaimer, none of this code should be used in production - some of it is untested and might have bugs that weren't apparent to me while writing this. \n", "\n", "Also, please let me know if you find any mistakes or can suggest any improvements! I'll be happy to learn from them and fix up this notebook.\n", "\n", "Thanks to the organizers of FSharp-Advent and Happy Holidays to all!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Lessons and Prospective Improvements" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Lessons\n", "\n", "### Debugging an Iterative Algorithm Can Be Challenging\n", "\n", "Despite being a simple algorithm, the Symmetric Metropolis Hastings still has a lot of moving parts. Debugging through this was a big challenge! As much of a cliché as it is, starting off small and passing in extremely simple parameters to complex algorithms was the way I debugged it. Also, adding optional print statements was helpful.\n", "\n", "### Tooling And Workflow Are of Paramount Importance\n", "\n", "Having the right workflow is extremely important for your productivity. I landed on making use of Jupyter notebooks with the .NET Interactive F# kernel. I briefly tried out the VSCode notebooks but ran into issues such as the inability to stop the kernel and cells running indefinitely. I will gladly give feedback here as I think that feature has a lot of potential.\n", "\n", "### Prototyping Helps With Creating Modular Functions\n", "\n", "I took a different approach with this submission than my previous one by first working on prototypical functions. 
I created a separate notebook where I wrote up all the functions, defined my domain and essentially prototyped the design. This was helpful as it gave me an opportunity to write and debug the algorithms productively." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Improvements\n", "\n", "This Bayesian rabbit hole I have got myself into is endless but unbelievably fun to explore. Here is how I envision improving the implementation and furthering my understanding of Bayesian Inference:\n", "\n", "### Better Parameter Parsing and Application\n", "\n", "I dislike the current parameter parsing logic when mapping to distributions. I have currently hardcoded the parameter sequence i.e. the order in which parameters are passed in. In practice, a good Bayesian Inference library would introduce a more flexible way to specify a model.\n", "\n", "### More Generalized Bayesian Network\n", "\n", "Right now, I have implemented a Simple Bayesian Network but want to generalize the implementation to a Generic Bayesian Network that'll work for all cases. Making use of a Directed Acyclic Graph (DAG) would be the way to represent this network, and the computation of the joint distribution could be given by the recursive factorization:\n", "\n", "\\begin{align}\n", "p(x_1, x_2, \dots, x_n) = \prod_i p(x_i \mid \mathrm{parents}(x_i))\n", "\\end{align}\n", "\n", "### More Efficient MCMC Methods\n", "\n", "As noted before, the Symmetric Metropolis Hastings algorithm has its flaws and there are more complex algorithms out there that give better results. I want to implement these from scratch. These include:\n", "\n", "1. Hamiltonian Monte-Carlo\n", "2. The No U-Turn Sampler\n", "3. 
Gibbs Sampler with Metropolis Hastings for the Multivariate Case.\n", "\n", "### Branch out into Bayesian Regression\n", "\n", "As a departure from the traditional Ordinary Least Squares Linear Regression algorithm, I want to venture into the space of Bayesian Regression that, in a nutshell, has regularization built into the algorithm." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Further Resources" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Added to the Readme associated with this repository or [here](https://github.com/MokoSan/FSharpAdvent_2020/blob/main/README.md)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Music I Listened to While Working On This Submission\n", "\n", "You have made it this far! Thank you!! Here is the music I had on repeat while working on this submission.\n", "\n", "1. [John Zorn and Friends](https://www.youtube.com/watch?v=c4eO2o9u1j0&ab_channel=podgoryt)\n", "2. [Mike Patton and Mondo Cane](https://www.youtube.com/watch?v=iDOl5q7UXfg&ab_channel=FNM4EVER2)\n", "3. [Burt Bacharach - This Guy's In Love with you](https://www.youtube.com/watch?v=2dDGnl8_Dzg&ab_channel=MusicWonders )\n", "4. [Melvins - Lysol](https://www.youtube.com/watch?v=JtO_Awk4pqU&ab_channel=AboveDeath)\n", "5. [The Jesus Lizard - Puss](https://www.youtube.com/watch?v=PaBJQG6A9SQ&ab_channel=spycory1)\n", "6. [Mr. Bungle - Sudden Death](https://www.youtube.com/watch?v=-QWFV_057KM&ab_channel=IpecacRecordings)\n", "7. [Madreblu - Certamente](https://www.youtube.com/watch?v=Ivh0FWTSJ78&ab_channel=Milano2000Records)\n", "8. [Pato Banton - Spirits in the Material World](https://www.youtube.com/watch?v=S--kTyvm_fM&ab_channel=PatoBanton%26TheReggaeRevolution-Topic)\n", "9. [Max Cooper - Ripple](https://www.youtube.com/watch?v=P_X1KGCgWlE&ab_channel=MaxCooper-Topic)\n", "10. [The Sword - Cheap Sunglasses](https://www.youtube.com/watch?v=RYnTLIZkjlg&ab_channel=SwordofDoomMusic)\n", "11. 
[Brother Dege - Too Old To Die Young](https://www.youtube.com/watch?v=FQNFcYvILJE&ab_channel=UltimatePowa)\n", "12. [Mr. Bungle - Retrovertigo](https://www.youtube.com/watch?v=DRyh2cxJCp0&ab_channel=tkan)" ] } ], "metadata": { "kernelspec": { "display_name": ".NET (F#)", "language": "F#", "name": ".net-fsharp" }, "language_info": { "file_extension": ".fs", "mimetype": "text/x-fsharp", "name": "C#", "pygments_lexer": "fsharp", "version": "4.5" } }, "nbformat": 4, "nbformat_minor": 4 }