{ "cells": [ { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "不完全数据:观测随机变量$Y$。 \n", "完全数据:观测随机变量$Y$和隐随机变量$Z$。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$Q$函数:完全数据的对数似然函数$\\log P \\left( Y , Z | \\theta \\right)$关于在给定观测数据$Y$和当前参数$\\theta_{\\left( i \\right)}$下对未观测数据$Z$的条件概率分布$P \\left( Z | Y, \\theta_{\\left( i \\right)} \\right)$的期望 \n", "\\begin{align*} & Q \\left( \\theta, \\theta_{\\left( i \\right)} \\right) = E_{Z} \\left[ \\log P \\left( Y, Z | \\theta \\right) | Y , \\theta_{\\left( i \\right)} \\right] \\end{align*} " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "含有隐变量$Z$的概率模型,目标是极大化观测变量$Y$关于参数$\\theta$的对数似然函数,即 \\begin{align*} & \\max L \\left( \\theta \\right) = \\log P \\left( Y | \\theta \\right) \\\\ & = \\log \\sum_{Z} P \\left( Y,Z | \\theta \\right) \\\\ & = \\log \\left( \\sum_{Z} P \\left( Y|Z,\\theta \\right) P \\left( Z| \\theta \\right) \\right)\\end{align*} " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "对数似然函数$L \\left( \\theta \\right)$与第$i$次迭代后的对数似然函数估计值$L \\left( \\theta^{\\left( i \\right)} \\right)$的差 \\begin{align*} & L \\left( \\theta \\right) - L \\left( \\theta^{\\left( i \\right)} \\right) = \\log \\left( \\sum_{Z} P \\left( Y|Z,\\theta \\right) P \\left( Z| \\theta \\right) \\right) - \\log P \\left( Y| \\theta^{ \\left( i \\right)} \\right) \\\\ & = \\log \\left( \\sum_{Z} P \\left( Z | Y , \\theta^{\\left( i \\right)} \\right) \\dfrac { P \\left( Y|Z,\\theta \\right) P \\left( Z| \\theta \\right)} {P \\left( Z | Y , \\theta^{\\left( i \\right)} \\right)} \\right) - \\log P \\left( Y| \\theta^{ \\left( i \\right)} \\right)\\\\ &\\geq \\sum_{Z} P \\left( Z | Y , \\theta^{\\left( i \\right)} \\right) \\log \\dfrac {P \\left( Y | Z, \\theta \\right) P \\left(Z|\\theta\\right)}{P \\left( Z | Y , \\theta^{\\left( i \\right)} \\right)} - \\log P \\left( Y| \\theta^{ \\left( i \\right)} \\right) \\\\ & = \\sum_{Z} P \\left( Z | Y , \\theta^{\\left( i \\right)} \\right) \\log \\dfrac {P \\left( Y | Z, \\theta \\right) P \\left(Z|\\theta\\right)} {P \\left( Z | Y , \\theta^{\\left( i \\right)} \\right) P \\left(Y|\\theta^{\\left( i \\right)} \\right)}\\end{align*} " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "令\\begin{align*} \\\\& B \\left( \\theta , \\theta^{\\left ( i \\right)} \\right) = L \\left( \\theta^{\\left ( i \\right)} \\right) + \\sum_{Z} P \\left( Z | Y , \\theta^{\\left( i \\right)} \\right) \\log \\dfrac {P \\left( Y | Z, \\theta \\right) P \\left(Z|\\theta\\right)} {P \\left( Z | Y , \\theta^{\\left( i \\right)} \\right) P \\left(Y|\\theta^{\\left( i \\right)} \\right)} \\end{align*} \n", "则 \\begin{align*} & L \\left( \\theta \\right) \\geq B \\left( \\theta, \\theta^{\\left( i \\right)} \\right) \\end{align*} \n", "即函$B \\left( \\theta, \\theta^{\\left( i \\right)} \\right)$ 是$L \\left( \\theta \\right)$ 的一个下界。 \n", "选择$\\theta^{\\left( i \\right)}$使$B \\left( \\theta, \\theta^{\\left( i \\right)} \\right) $达到极大,即 \\begin{align*} & \\theta^{\\left( i+1 \\right)}= \\arg \\max B \\left( \\theta, \\theta^{\\left( i \\right)} \\right) \\\\ & = \\arg \\max \\left( L \\left( \\theta^{\\left ( i \\right)} \\right) + \\sum_{Z} P \\left( Z | Y , \\theta^{\\left( i \\right)} \\right) \\log \\dfrac {P \\left( Y | Z, \\theta \\right) P \\left(Z|\\theta\\right)} {P \\left( Z | Y , \\theta^{\\left( i \\right)} \\right) P \\left(Y|\\theta^{\\left( i \\right)} \\right)} \\right) \\\\ & = \\arg \\max \\left( \\sum_{Z} P \\left( Z | Y, \\theta^{\\left( i \\right)} \\right) \\log \\left( P \\left( Y | Z, \\theta \\right) \\right) P \\left( Z | \\theta \\right) \\right) \\\\ & = \\arg \\max \\left( \\sum_{Z} P \\left( Z | Y, \\theta^{\\left( i \\right)} \\right) \\log P \\left( Y, Z | \\theta\\right) \\right) \\\\ & = \\arg \\max Q \\left( \\theta, \\theta^{\\left( i \\right)} \\right) \\end{align*}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "EM算法: \n", "输入:观测随机变量数据$Y$,隐随机变量数据$Z$,联合分布$P\\left(Y,Z|\\theta\\right) $,条件分布$P\\left(Y|Z,\\theta\\right) $; \n", "输出:模型参数$\\theta$ \n", "1. 初值$\\theta^{\\left(0\\right)}$ \n", "2. $E$步:\\begin{align*} & Q\\left(\\theta,\\theta^\\left(i\\right)\\right)=E_{Z}\\left[\\log P\\left(Y,Z|\\theta\\right)|Y,\\theta^{\\left(i\\right)}\\right] \\\\ & = \\sum_{Z} \\log P\\left(Y,Z|\\theta \\right) \\cdot P\\left(Z|Y, \\theta^\\left(i\\right)\\right)\\end{align*} \n", "3. $M$步:\\begin{align*} & \\theta^{\\left( i+1 \\right)} = \\arg \\max Q\\left(\\theta, \\theta^\\left( i \\right) \\right)\\end{align*} \n", "4. 重复2. 3.,直到收敛。" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "$F$函数:隐变量$Z$的概率分布为$\\tilde{P} \\left( Z \\right)$,关于分布$\\tilde{P}$与参数$\\theta$的函数\n", "\\begin{align*} \\\\ & F \\left( \\tilde{P}, \\theta \\right) = E_{\\tilde{P}} \\left[ \\log P \\left( Y, Z | \\theta \\right)\\right] + H \\left( \\tilde{P} \\right) \\end{align*} \n", "其中,$H \\left( \\tilde{P} \\right) = - E_{\\tilde{P}} \\left[ \\log \\tilde{P} \\left( Z\\right)\\right]$是分布$\\tilde{P} \\left( Z \\right)$的熵。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "对于固定的$\\theta$,极大化$F$函数 \n", "\\begin{align*} \\\\ & \\max_{\\tilde{P}} F \\left( \\tilde{P}, \\theta \\right) \n", "\\\\ & s.t. \\sum_{Z} \\tilde{P}_{\\theta} \\left( Z \\right) = 1 \\end{align*} " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "引入拉格朗日乘子$\\lambda$,构造拉格朗日函数\n", "\\begin{align*} \\\\ & L = E_{\\tilde{P}} \\left[ \\log P \\left( Y, Z | \\theta \\right)\\right] - E_{\\tilde{P}} \\left[ \\log \\tilde{P} \\left( Z\\right)\\right] + \\lambda \\left( 1 - \\sum_{Z} \\tilde{P} \\left( Z \\right) \\right) \n", "\\\\ & = \\sum_{Z} \\log P \\left( Y, Z | \\theta \\right) \\tilde{P} \\left( Z \\right) - \\sum_{Z} \\log P \\left( Z \\right) \\tilde{P} \\left( Z \\right) + \\lambda - \\lambda \\sum_{Z} \\tilde{P} \\left( Z \\right) \\end{align*} " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "将其对$\\tilde{P} \\left( Z \\right)$求偏导,得\n", "\\begin{align*} \\\\ & \\dfrac {\\partial L}{\\partial \\tilde{P} \\left( Z \\right) } = \\log P \\left( Y, Z | \\theta \\right) - 1 - \\log P \\left( Z \\right) - \\lambda \\end{align*} " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "令其等于0,得\n", "\\begin{align*} \\\\ & \\lambda = \\log P \\left( Y, Z | \\theta \\right) - 1 - \\log P \\left( Z \\right) \n", "\\\\ & \\dfrac{P \\left( Y, Z | \\theta \\right) }{\\tilde{P}_{\\theta} \\left( Z \\right) } = e^{1 + \\lambda } \n", "\\\\ & \\sum_{Z} P \\left( Y, Z | \\theta \\right) = e^{1 + \\lambda } \\sum_{Z} \\tilde{P}_{\\theta} \\left( Z \\right) \\end{align*} \n", "由于$\\sum_{Z} \\tilde{P}_{\\theta} \\left( Z \\right) = 1$,得 \n", "\\begin{align*} \\\\ & P \\left( Y \\right) = e^{1 + \\lambda } \\end{align*} \n", "代回,得 \n", "\\begin{align*} \\\\ & \\tilde{P}_{\\theta} \\left( Z \\right) = P \\left( Z | Y, \\theta \\right) \\end{align*} " ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "则\n", "\\begin{align*} \\\\ & F \\left( \\tilde{P}, \\theta \\right) = E_{\\tilde{P}} \\left[ \\log P \\left( Y, Z | \\theta \\right)\\right] + H \\left( \\tilde{P} \\right) \n", "\\\\ & = \\sum_{Z} \\log P \\left( Y, Z | \\theta \\right) \\tilde{P} \\left( Z \\right) - \\sum_{Z} \\log P \\left( Z \\right) \\tilde{P} \\left( Z \\right)\n", "\\\\ & = \\sum_{Z} \\tilde{P} \\left( Z \\right) \\log \\dfrac{P \\left( Y, Z | \\theta \\right) }{\\tilde{P} \\left( Z \\right) }\n", "\\\\ & = \\sum_{Z} \\tilde{P} \\left( Z \\right) \\log \\dfrac{P \\left( Z | Y, \\theta \\right) P \\left(Y | \\theta \\right) }{\\tilde{P} \\left( Z \\right) }\n", "\\\\ & = \\log P \\left(Y | \\theta \\right) \\sum_{Z} \\tilde{P} \\left( Z \\right) \n", "\\\\ & = \\log P \\left(Y | \\theta \\right) \n", "\\\\ & = L \\left( \\theta \\right) \\end{align*} " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "对于使$F \\left( \\tilde{P}, \\theta \\right)$达到最大值的参数$\\theta^{*}$,有\n", "\\begin{align*} L \\left( \\theta^{*} \\right) = F \\left( \\tilde{P}_{\\theta^{*}}, \\theta^{*} \\right) = F \\left( \\tilde{P}^{*}, \\theta^{*} \\right)\\end{align*} \n", "即,如果$F \\left( \\tilde{P}, \\theta \\right)$在$\\tilde{P}^{*}, \\theta^{*}$达到局部极大值(全局最大值),则$L \\left( \\theta^{*} \\right)$在$\\tilde{P}^{*}, \\theta^{*}$也达到局部极大值(全局最大值)。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "由$\\tilde{P}_{\\theta} \\left( Z \\right) = P \\left( Z | Y, \\theta \\right)$,对固定的$\\theta^{\\left( i \\right) }$,\n", "\\begin{align*} \\tilde{P}^{\\left( i + 1 \\right)} \\left( Z \\right) = \\tilde{P}_{\\theta^{\\left( i \\right)}} \\left( Z \\right) = P \\left( Z | Y, \\theta^{\\left( i \\right) } \\right)\\end{align*} \n", "使$F \\left( \\tilde{P}, \\theta^{\\left( i \\right)} \\right)$极大化, \n", "则\n", "\\begin{align*} \\\\ & F \\left( \\tilde{P}^{\\left( i + 1 \\right)}, \\theta \\right) = E_{\\tilde{P}^{\\left( i + 1 \\right)}} \\left[ \\log P \\left( Y, Z | \\theta \\right)\\right] + H \\left( \\tilde{P}^{\\left( i + 1 \\right)} \\right) \n", "\\\\ & = \\sum_{Z} log P \\left(Y , Z | \\theta \\right) P \\left( Z | Y, \\theta^{\\left( i \\right)} \\right) + H \\left( \\tilde{P}^{\\left( i + 1 \\right)} \\right) \n", "\\\\ & =Q \\left( \\theta, \\theta^{\\left( i \\right)} \\right) + H \\left( \\tilde{P}^{\\left( i + 1 \\right)} \\right)\\end{align*} " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "固定$\\tilde{P}^{\\left( i + 1 \\right)}$,求$\\theta^{\\left( i \\right)}$使$F \\left( \\tilde{P}^{\\left( i + 1 \\right)}, \\theta \\right)$极大化,得 \n", "\\begin{align*} \\theta^{\\left( i + 1 \\right)} = \\arg \\max_{\\theta} F \\left( \\tilde{P}^{\\left( i + 1 \\right)}, \\theta \\right) = \\arg \\max_{\\theta} Q \\left( \\theta, \\theta^{\\left( i \\right)} \\right) \\end{align*} \n", "即,由$EM$算法与$F$函数的极大-极大算法的到的参数估计序列$\\theta^{\\left( i \\right)},i = 1, 2, \\cdots,$是一致的。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$GEM$算法: \n", "输入:观测数据$Y$,$F$函数; \n", "输出:模型参数$\\theta$ \n", "1. 初值$\\theta^{\\left(0\\right)}$ \n", "2. 第$i+1$次迭代,第1步:记$\\theta^{\\left( i \\right)}$为参数$\\theta$的估计值,$\\tilde{P}^{\\left( i \\right)} $为函数$\\tilde{P}$的估计。求$\\tilde{P}^{\\left( i+1 \\right)} $使$\\tilde{P}$极大化$F \\left( \\tilde{P}^{\\left( i + 1 \\right)}, \\theta \\right)$\n", "3. 第2步:求$\\theta^{\\left( i \\right)}$使$F \\left( \\tilde{P}^{\\left( i + 1 \\right)}, \\theta \\right)$极大化\n", "4. 重复(2)和(3),直到收敛。" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.11" } }, "nbformat": 4, "nbformat_minor": 0 }