{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# EM as a Message Passing Algorithm\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Preliminaries\n", "\n", "- Goals\n", " - Describe Expectation-Maximization (EM) as a message passing algorithm on a Forney-style factor graph\n", "- Materials\n", " - Madatory\n", " - These lecture notes\n", " - Optional\n", " - [Dauwels et al., 2009](./files/Dauwels-2009-Expectation-Maximization-as-Message-Passing.pdf), pp. 1-5 (sections I,II and III)\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### A Problem for the Multiplier Node\n", "\n", "- Consider the multiplier factor $f(x,y,\\theta) = \\delta(y-\\theta x)$ with incoming Gaussian messages $\\overrightarrow{\\mu}_X(x) = \\mathcal{N}(x|m_x,v_x)$ and $\\overleftarrow{\\mu}_Y(y) = \\mathcal{N}(y|m_y,v_y)$. For simplicity's sake, we assume all variables are scalar. \n", "\n", "\n", "\n", "- In a system identification setting, we are interested in computing the outgoing message $\\overleftarrow{\\mu}_\\Theta(\\theta)$. \n", "\n", "- Let's compute the sum-product message:\n", "\n", "$$\\begin{align*}\n", "\\overleftarrow{\\mu}_\\Theta(\\theta) &= \\int \\overrightarrow{\\mu}_X(x) \\, \\overleftarrow{\\mu}_Y(y) \\, f(x,y,\\theta) \\, \\mathrm{d}x \\mathrm{d}y \\\\\n", " &= \\int \\mathcal{N}(x\\,|\\,m_x,v_x) \\, \\mathcal{N}(y\\,|\\,m_y,v_y) \\, \\delta(y-\\theta x)\\, \\, \\mathrm{d}x \\mathrm{d}y \\\\\n", " &= \\int \\mathcal{N}(x\\,|\\,m_x,v_x) \\,\\mathcal{N}(\\theta x\\,|\\,m_y,v_y) \\, \\mathrm{d}x \\\\\n", " &= \\int \\mathcal{N}(x\\,|\\,m_x,v_x) \\,\\mathcal{N}\\left(x \\,\\bigg|\\, \\frac{m_y}{\\theta},\\frac{v_y}{\\theta^2}\\right) \\, \\mathrm{d}x \\\\\n", " &= \\mathcal{N}\\left(\\frac{m_y}{\\theta} \\,\\bigg|\\, m_x, v_x + \\frac{v_y}{\\theta^2}\\right) \\cdot \\int \\mathcal{N}(x\\,|\\,m_*,v_*)\\, \\mathrm{d}x \\tag{SRG-6} \\\\\n", " &= \\mathcal{N}\\left(\\frac{m_y}{\\theta} \\,\\bigg|\\, m_x, v_x + \\frac{v_y}{\\theta^2}\\right)\n", "\\end{align*}$$\n", "\n", "- This is **not** a Gaussian message for $\\Theta$! Passing this message into the graph leads to very serious problems when trying to compute sum-product messages for other factors in the graph.\n", " - (We have seen before in the lesson on [Working with Gaussians](http://nbviewer.ipython.org/github/bertdv/AIP-5SSB0/blob/master/lessons/04_working_with_Gaussians/Working-with-Gaussians.ipynb) that multiplication of two Gaussian-distributed variables does _not_ produce a Gaussian distributed variable.) \n", "\n", "- The same problem occurs in a forward message passing schedule when we try to compute a message for $Y$ from incoming Gaussian messages for both $X$ and $\\Theta$. \n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Limitations of Sum-Product Messages\n", "\n", "- The foregoing example shows that the sum-product (SP) message update rule will sometimes not do the job. For example:\n", " - On large-dimensional **discrete** domains, the SP update rule maybe computationally intractable.\n", " - On **continuous** domains, the SP update rule may not have a closed-form solution or the rule may lead to a function that is incompatible with Gaussian message passing. \n", "\n", "- There are various ways to cope with 'intractable' SP update rules. In this lesson, we discuss how the EM-algorithm can be written as a message passing algorithm on factor graphs. Then, we will solve the 'multiplier node problem' with EM messages (rather than with SP messages)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### EM as Message Passing\n", "\n", " \n", "- Consider first a general setting with likelihood function $f(x,\\theta)$, hidden variables $x$ and tuning parameters $\\theta$. Assume that we are interested in the maximum likelihood estimate \n", "\n", "$$\\begin{align*}\n", "\\hat{\\theta} &= \\arg\\max_\\theta \\int f(x,\\theta) \\mathrm{d}x\\,.\n", "\\end{align*}$$\n", "\n", "- If $\\int f(x,\\theta) \\mathrm{d}x$ is intractible, we can try to apply the EM-algorithm to estimate $\\hat{\\theta}$, which leads to the following iterations (_cf._ [lesson on the EM algorithm](http://nbviewer.ipython.org/github/bertdv/AIP-5SSB0/blob/master/lessons/10_the_EM_algorithm/The-General-EM-Algorithm.ipynb)):\n", "\n", "$$\n", "\\hat{\\theta}^{(k+1)} = \\underbrace{\\arg\\max_\\theta}_{\\text{M-step}} \\left( \\underbrace{\\int_x f(x,\\hat{\\theta}^{(k)})\\,\\log f(x,\\theta)\\,\\mathrm{d}x}_{\\text{E-step}} \\right)\n", "$$\n", "\n", "- It turns out that _for factorized functions_ $f(x,\\theta)$, the EM-algorihm can be executed as a message passing algorithm on the factor graph. \n", "\n", "- As an simple example, we consider the factorization \n", "\n", "$$\n", "f(x,\\theta) = f_a(\\theta)f_b(x,\\theta)\n", "$$\n", "\n", "\n", "\n", "\n", "- Applying the EM-algorithm to this graph leads to the following forward and backward messages over the $\\theta$ edge\n", "$$\\begin{align*}\n", "\\textbf{E-step}&: \\quad \\eta(\\theta) = \\int p_b(x|\\hat{\\theta}^{(k)}) \\log f_b(x,\\theta) \\,\\mathrm{d}x \\\\\n", "\\textbf{M-step}&: \\quad \\hat{\\theta}^{(k+1)} = \\arg\\max_\\theta \\left( f_a(\\theta)\\, e^{\\eta(\\theta)}\\right) \n", "\\end{align*}$$\n", "where $p_b(x|\\hat{\\theta}^{(k)}) \\triangleq \\frac{f_b(x,\\hat{\\theta}^{(k)})}{\\int f_b(x^\\prime,\\hat{\\theta}^{(k)}) \\,\\mathrm{d}x^\\prime}$. \n", "Proof:\n", "