{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Factor Graphs" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "\n", "### Preliminaries\n", "\n", "- Goal \n", " - Introduction to Forney-style factor graphs and message passing-based inference \n", "- Materials \n", " - Mandatory\n", " - These lecture notes \n", " - Loeliger (2007), [The factor graph approach to model based signal processing](https://github.com/bertdv/BMLIP/blob/master/lessons/notebooks/files/Loeliger-2007-The-factor-graph-approach-to-model-based-signal-processing.pdf), pp. 1295-1302 (until section V)\n", " - Optional\n", " - Frederico Wadehn (2015), [Probabilistic graphical models: Factor graphs and more](https://www.youtube.com/watch?v=Fv2YbVg9Frc&t=31) video lecture (**highly recommended**)\n", " - References\n", " - Forney (2001), [Codes on graphs: normal realizations](https://github.com/bertdv/BMLIP/blob/master/lessons/notebooks/files/Forney-2001-Codes-on-graphs-normal-realizations.pdf)\n", " " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Why Factor Graphs?\n", "\n", "- A probabilistic inference task gets its computational load mainly through the need for marginalization (i.e., computing integrals). E.g., for a model $p(x_1,x_2,x_3,x_4,x_5)$, the inference task $p(x_2|x_3)$ is given by \n", "$$\\begin{align*}\n", "p(x_2|x_3) = \\frac{p(x_2,x_3)}{p(x_3)} = \\frac{\\idotsint p(x_1,x_2,x_3,x_4,x_5) \\, \\mathrm{d}x_1 \\mathrm{d}x_4 \\mathrm{d}x_5}{\\idotsint p(x_1,x_2,x_3,x_4,x_5) \\, \\mathrm{d}x_1 \\mathrm{d}x_2 \\mathrm{d}x_4 \\mathrm{d}x_5}\n", "\\end{align*}$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- Since these computations (integrals or sums) suffer from the \"curse of dimensionality\", we often need to solve a simpler problem in order to get an answer. " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- Factor graphs provide a computationally efficient approach to solving inference problems **if the probabilistic model can be factorized**. " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- Factorization helps. For instance, if $p(x_1,x_2,x_3,x_4,x_5) = p(x_1)p(x_2,x_3)p(x_4)p(x_5|x_4)$, then\n", "\n", "$$\\begin{align*}\n", "p(x_2|x_3) &= \\frac{\\idotsint p(x_1)p(x_2,x_3)p(x_4)p(x_5|x_4) \\, \\mathrm{d}x_1 \\mathrm{d}x_4 \\mathrm{d}x_5}{\\idotsint p(x_1)p(x_2,x_3)p(x_4)p(x_5|x_4) \\, \\mathrm{d}x_1 \\mathrm{d}x_2 \\mathrm{d}x_4 \\mathrm{d}x_5} \n", " = \\frac{p(x_2,x_3)}{\\int p(x_2,x_3) \\mathrm{d}x_2}\n", "\\end{align*}$$\n", "\n", "which is computationally much cheaper than the general case above." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- In this lesson, we discuss how computationally efficient inference in *factorized* probability distributions can be automated by message passing-based inference in factor graphs." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Factor Graph Construction Rules\n", "\n", "- Consider a function \n", "$$\n", "f(x_1,x_2,x_3,x_4,x_5) = f_a(x_1,x_2,x_3) \\cdot f_b(x_3,x_4,x_5) \\cdot f_c(x_4)\n", "$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- The factorization of this function can be graphically represented by a **Forney-style Factor Graph** (FFG):\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "- Due to the factorization and the [Generalized Distributive Law](https://en.wikipedia.org/wiki/Generalized_distributive_law), we can decompose this sum-of-products to the following product-of-sums:\n", "$$\\begin{align*}\\bar{f}&(x_3) = \\\\\n", " &\\underbrace{ \\Bigg( \\sum_{x_1,x_2} \\underbrace{f_a(x_1)}_{\\overrightarrow{\\mu}_{X_1}(x_1)}\\, \\underbrace{f_b(x_2)}_{\\overrightarrow{\\mu}_{X_2}(x_2)}\\,f_c(x_1,x_2,x_3)\\Bigg) }_{\\overrightarrow{\\mu}_{X_3}(x_3)} \n", " \\underbrace{ \\cdot\\Bigg( \\sum_{x_4,x_5} \\underbrace{f_d(x_4)}_{\\overrightarrow{\\mu}_{X_4}(x_4)}\\,f_e(x_3,x_4,x_5) \\cdot \\underbrace{ \\big( \\sum_{x_6,x_7} f_f(x_5,x_6,x_7)\\,\\underbrace{f_g(x_7)}_{\\overleftarrow{\\mu}_{X_7}(x_7)}\\big) }_{\\overleftarrow{\\mu}_{X_5}(x_5)} \\Bigg) }_{\\overleftarrow{\\mu}_{X_3}(x_3)}\n", "\\end{align*}$$\n", "which is computationally (much) lighter than executing the full sum $\\sum_{x_1,\\ldots,x_7}f(x_1,x_2,\\ldots,x_7)$\n", "\n", "