{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Probabilistic Programming 2: Message Passing & Analytical Solutions\n", "\n", "#### Goal \n", " - Understand when and how analytical solutions to Bayesian inference can be obtained.\n", " - Understand how to perform message passing in a Forney-style factor graph.\n", "\n", "#### Materials \n", " - Mandatory\n", " - This notebook\n", " - Lecture notes on factor graphs\n", " - Lecture notes on continuous data\n", " - Lecture notes on discrete data\n", " - Optional\n", " - Chapters 2 and 3 of [Model-Based Machine Learning](http://www.mbmlbook.com/LearningSkills.html).\n", " - [Differences between Julia and Matlab / Python](https://docs.julialang.org/en/v1/manual/noteworthy-differences/index.html)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that none of the material below is new. The point of the Probabilistic Programming sessions is to solve practical problems so that the concepts from Bert's lectures become less abstract." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "using Pkg\n", "Pkg.activate(\"./workspace/\")\n", "Pkg.instantiate();" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "using LinearAlgebra\n", "using SpecialFunctions\n", "using ForneyLab\n", "using PyCall\n", "using Plots\n", "pyplot();" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll be using the toolbox [ForneyLab.jl](https://github.com/biaslab/ForneyLab.jl) to visualize factor graphs and compute messages passed within the graph." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Problem: A Job Interview\n", "\n", "After you finish your master's degree, you will need to start looking for jobs. You will get one or more job interviews and some will be fun while others will be frustrating. The company you applied at wants a talented and skilled employee, but measuring a person's skill is tricky. Even a highly-skilled person makes mistakes and people with few skills can get lucky. In this session, we will look at various ways to assess skills using questions and test assignments. Along the way, you will gain experience with message passing, factor graphs and working with discrete vs continuous data." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1: Right or wrong\n", "\n", "Suppose you head to a job interview for a machine learning engineer position. The company is interested in someone who knows Julia and has set up a test with syntax questions. We will first look at a single question, which we treat as an outcome variable $X_1$. You can either get this question right or wrong, which means we're dealing with a Bernoulli likelihood. The company assumes you have a skill level, denoted $\\theta$, and the higher the skill, the more likely you are to get the question right. Since the company doesn't know anything about you, they chose an uninformative prior distribution: the Beta(1,1). 
We can write the generative model for answering this question as follows:\n", "\n", "$$\\begin{align*}\n", "p(X_1, \\theta) =&\\ p(X_1 \\mid \\theta) \\cdot p(\\theta) \\\\\n", "=&\\ \\text{Bernoulli}(X_1 \\mid \\theta) \\cdot \\text{Beta}(\\theta \\mid \\alpha = 1, \\beta=1) \\, .\n", "\\end{align*}$$\n", "\n", "The factor graph for this model is:\n", "\n", "*(Figure: a Forney-style factor graph in which the prior node $f_a$ is connected by the edge $\\theta$ to the likelihood node $f_b$, from which the observed edge $X_1$ emerges.)*\n", "\n", "where $f_b(X_1, \\theta) \\triangleq \\text{Bernoulli}(X_1 \\mid \\theta)$ and $f_a(\\theta) \\triangleq \\text{Beta}(\\theta \\mid 1,1)$. We are now going to construct this factor graph using the toolbox ForneyLab." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Start building a model by setting up a FactorGraph structure\n", "factor_graph1 = FactorGraph()\n", "\n", "# Add the prior over the skill level θ\n", "@RV θ ~ Beta(1.0, 1.0, id=:f_a)\n", "\n", "# Add the question correctness likelihood\n", "@RV X1 ~ Bernoulli(θ, id=:f_b)\n", "\n", "# The outcome X1 is going to be observed, so we set up a placeholder for the data entry\n", "placeholder(X1, :X1)\n", "\n", "# Visualize the graph\n", "ForneyLab.draw(factor_graph1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Code notes:\n", "- `@RV` is a macro that lets you add random variables to your factor graph. Note that in a Forney-style factor graph, a variable corresponds to an edge, and its distribution becomes a factor node.\n", "- The symbol ~ means \"is distributed as\". For example, $\\theta \\sim \\text{Beta}(1,1)$ should be read as \"$\\theta$ is distributed according to a Beta($\\theta$ | $\\alpha$=1, $\\beta$=1) probability distribution\".\n", "\n", "----" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Above you can see the factor graph that ForneyLab has generated. It is not as clean as the ones in the theory lectures. For example, ForneyLab generates nodes for the clamped parameters of the Beta prior ($\\alpha = 1$ and $\\beta = 1$), while we ignore these in the manually constructed graphs. Nonetheless, ForneyLab's version is very useful for debugging later on. \n", "\n", "We are now going to tell ForneyLab to generate a message passing procedure for us." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# Indicate which variables you want posteriors for\n", "q = PosteriorFactorization(θ, ids=[:θ])\n", "\n", "# Generate a message passing inference algorithm\n", "algorithm = messagePassingAlgorithm(θ, q)\n", "\n", "# Compile algorithm code\n", "source_code = algorithmSourceCode(algorithm)\n", "\n", "# Bring compiled code into current scope\n", "eval(Meta.parse(source_code))\n", "\n", "# Visualize message passing schedule\n", "pfθ = q.posterior_factors[:θ]\n", "ForneyLab.draw(pfθ, schedule=pfθ.schedule);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Code notes:\n", "- ForneyLab.jl compiles the specified model and inference procedure into a string of Julia code. This string is human-readable and portable across devices. `Meta.parse()` parses that string into a Julia expression, and `eval()` evaluates it, bringing the generated functions into the current scope so they can be used.\n", "- In `ForneyLab.draw()`, only the edge of interest is shown with the two connecting nodes and their inputs. 
All other parts of the graph are ignored.\n", "\n", "----" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "ForneyLab's visualization of the message passing procedure for a specific variable isolates that variable in the graph and shows where the incoming messages come from. In this case, we are interested in $\\theta$ (your skill level), which receives message ((2)) from the likelihood node (the \"Ber\" node above $\\theta$) and message ((1)) from the prior node (the \"Beta\" node below $\\theta$). \n", "\n", "In the message passing framework, the combination of these two messages produces the \"marginal\" distribution for $\\theta$. We are using message passing to do Bayesian inference, so note that the \"marginal\" for $\\theta$ corresponds to the posterior distribution $p(\\theta \\mid X_1)$. \n", "\n", "Let's inspect these messages." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Message ((1)) = Beta(a=1.00, b=1.00)\n", "Message ((2)) = Beta(a=2.00, b=1.00)\n", "\n" ] } ], "source": [ "# Initialize data structure for messages\n", "messages = Array{Message}(undef, 2)\n", "\n", "# Initialize data structure for marginal distributions\n", "marginals = Dict()\n", "\n", "# Suppose you got question 1 correct\n", "data = Dict(:X1 => 1)\n", "\n", "# Execute the message passing schedule and compute the marginal for θ\n", "stepθ!(data, marginals, messages);\n", "\n", "# Print messages\n", "print(\"\\nMessage ((1)) = \"*string(messages[1].dist))\n", "println(\"Message ((2)) = \"*string(messages[2].dist))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Code notes:\n", "- A `Dict` is a [dictionary data structure](https://docs.julialang.org/en/v1/base/collections/#Base.Dict). The `marginals` dictionary starts out empty; after `stepθ!` runs, it contains a single entry whose key is the variable `θ` (as a Symbol, i.e. as `:θ`) and whose value is a `ProbabilityDistribution` object: the computed marginal for that variable. In the `data` dictionary, we also have only one entry: the key is the variable `X1` and the value is its observed outcome (here `1`, for a correct answer). This is because `X1` is observed; we know its value without uncertainty.\n", "- The `stepθ!` function comes from the algorithm compilation.\n", "\n", "----" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Alright, so both messages are Beta distributions. Do they actually make sense? Where do these parameters come from?\n", "\n", "Recall from the lecture notes that the formula for messages sent by factor nodes is:\n", "\n", "$$ \\boxed{\n", "\\underbrace{\\overrightarrow{\\mu}_{Y}(y)}_{\\substack{ \\text{outgoing}\\\\ \\text{message}}} = \\sum_{x_1,\\ldots,x_n} \\underbrace{\\overrightarrow{\\mu}_{X_1}(x_1)\\cdots \\overrightarrow{\\mu}_{X_n}(x_n)}_{\\substack{\\text{incoming} \\\\ \\text{messages}}} \\cdot \\underbrace{f(y,x_1,\\ldots,x_n)}_{\\substack{\\text{node}\\\\ \\text{function}}} }\n", "$$\n", "\n", "
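For a continuous variable such as $\\theta$, the sum in this formula is replaced by an integral. As a quick sanity check of message ((2)), we can apply this rule by hand to the Bernoulli node $f_b(X_1, \\theta) = \\theta^{X_1} (1-\\theta)^{1-X_1}$. Since $X_1$ is observed, its incoming message is a point mass on the observed value $X_1 = 1$, so the sum collapses:\n", "\n", "$$\\begin{align*}\n", "\\overrightarrow{\\mu}_{\\theta}(\\theta) = \\sum_{x_1 \\in \\{0,1\\}} \\delta(x_1 - 1) \\cdot \\theta^{x_1} (1-\\theta)^{1-x_1} = \\theta \\propto \\text{Beta}(\\theta \\mid 2, 1) \\, .\n", "\\end{align*}$$\n", "\n", "This is exactly message ((2)) printed above. Message ((1)) is the prior $\\text{Beta}(\\theta \\mid 1,1) \\propto 1$, and multiplying the two messages yields the marginal $\\text{Beta}(\\theta \\mid 2,1)$, i.e. the posterior after one correct answer.\n", "\n", "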