{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "图是由结点及连接结点的编组成的集合。 \n", "结点记作$v$,结点集合记作$V$;边记作$e$,边的集合记作$E$;图记作$G=\\left(V,E\\right)$。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "概率图模型是由图表示的概率分布。设由联合分布$P\\left(Y\\right)$,$Y \\in \\mathcal{Y}$是一组随机变量。 \n", "由无向图$G=\\left(V,E\\right)$表示概率分布$P\\left(Y\\right)$,即在图$G$中,结点$v \\in V$表示一个随机变量$Y_{v}$,$Y=\\left(Y_{v}\\right)_{v \\in V}$;边$e \\in E$表示随机变量之间的概率依赖关系。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "成对马尔可夫性\n", "\\begin{align*} \\\\& P\\left(Y_{u},Y_{v}|Y_{O}\\right)=P\\left(Y_{u}|Y_{O}\\right)P\\left(Y_{v}|Y_{O}\\right)\\end{align*} \n", "其中,设$u$和$v$是无向图$G$中任意两个没有边连接的结点,分别对应随机变量$Y_{u}$和$Y_{v}$;其它所有结点为$O$,对应的随机变量组是$Y_{O}$。 \n", "即,给定随机变量组$Y_{O}$的条件下随机变量$Y_{u}$和$Y_{v}$是条件独立的。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "局部马尔可夫性\n", "\\begin{align*} \\\\& P\\left(Y_{v},Y_{O}|Y_{W}\\right)=P\\left(Y_{v}|Y_{W}\\right)P\\left(Y_{O}|Y_{W}\\right)\\end{align*} \n", "其中,设$v \\in V$是无向图$G$中任意一个结点,对应的随机变量是$Y_{v}$。$W$是与$v$有边连接的所有的结点,对应的随机变量组是$Y_{W}$。$O$是$v,W$以外的其它所有结点,对应的随机变量组是$Y_{O}$。 \n", "即,给定随机变量组$Y_{W}$的条件下随机变量$Y_{v}$和$Y_{O}$是条件独立的。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "全局马尔可夫性\n", "\\begin{align*} \\\\& P\\left(Y_{A},Y_{B}|Y_{C}\\right)=P\\left(Y_{A}|Y_{C}\\right)P\\left(Y_{B}|Y_{C}\\right)\\end{align*} \n", "其中,设结点集合$A,B$是在无向图$G$中被结点集合$C$分开的任意结点集合,对应的随机变量组是$Y_{A},Y_{B},Y_{C}$。 \n", "即,给定随机变量组$Y_{C}$的条件下随机变量$Y_{A}$和$Y_{B}$是条件独立的。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "概率无向图模型(马尔可夫随机场):设有联合概率分布$P\\left(Y\\right)$,由无向图$G=\\left(V,E\\right)$表示,在图$G$中,结点表示随机变量,边表示随机变量之间的依赖关系。如果联合概率分布$P\\left(Y\\right)$满足成对、局部或全局马尔可夫性,就称次联合概率分布为概率无向图模型,或马尔可夫随机场。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "团:无向图$G$中任意两个结点均有边连接的结点子集。 \n", "最大团:无向图$G$中的一个团,并且不能再加进任何一个结点使其成为一个更大的团。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "概率无向图模型的联合概率分布\n", "\\begin{align*} \\\\& P\\left(Y\\right)=\\dfrac{1}{Z} \\prod_{C} \\varPsi_{C}\\left(Y_{C}\\right) \\end{align*}\n", "其中,$C$是无向图的最大团,$Y_{C}$是$C$的结点对应的随机变量;势函数$\\varPsi_{C}\\left(Y_{C}\\right)$是严格正的\n", "\\begin{align*} \\\\& \\varPsi_{C}\\left(Y_{C}\\right) = \\exp \\left\\{-E\\left(Y_{C}\\right)\\right\\} \\end{align*}\n", "$Z$是规范化因子,保证$P\\left(Y\\right)$构成一个概率分布\n", "\\begin{align*} \\\\& Z = \\sum_{Y} \\prod_{C} \\varPsi_{C}\\left(Y_{C} \\right) \\end{align*}\n", "乘积是在无向图所有的最大团上进行的。" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "设$X$与$Y$是随机变量,$P\\left(Y|X\\right)$是在给定$X$的条件下$Y$的条件概率分布,若随机变量$Y$构成一个由无向图$G=\\left(V,E\\right)$表示的马尔可夫随机场,即\n", "\\begin{align*} \\\\& P\\left(Y_{v}|X,Y_{w},w \\neq v \\right)= P\\left(X,Y_{w},w \\thicksim v \\right)\\end{align*}\n", "对任意结点$v$成立,则称条件概率分布$P\\left(Y|X\\right)$为条件随机场。其中,$w \\thicksim v$表示在图$G=\\left(V,E\\right)$中与结点$v$有边连接的所有结点$w$,$w \\neq v$表示结点$v$以外的所有结点,$Y_{v}$与$Y_{w}$为结点$v$与$w$对应的随机变量。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "设$X=\\left(X_{1},X_{2}, \\cdots, X_{n}\\right)$,$Y=\\left(Y_{1},Y_{2}, \\cdots, Y_{n}\\right)$均为线性链表示的随机变量序列,若在给定随机变量序列$X$的条件下,随机变量序列$Y$的条件概率分布$P\\left(Y|X\\right)$构成条件随机场,即满足马尔可夫性\n", "\\begin{align*} \\\\& P\\left(Y_{i}|X,Y_{1},\\cdots,Y_{i-1},Y_{i+1},\\cdots,Y_{n} \\right)= P\\left(Y_{i}|X,Y_{i-1},Y_{i+1}\\right)\\end{align*}\n", "称条件概率分布$P\\left(Y|X\\right)$为线性链条件随机场。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "线性链条件随机场的参数化形式 \n", "设$P\\left(Y|X\\right)$为线性链条件随机场,则在随机变量$X$取值为$x$的条件下,随机变量$Y$取值为$y$的条件概率\n", "\\begin{align*} \\\\& P\\left(y|x\\right)= \\dfrac{1}{Z\\left(x\\right)} \\exp \\left(\\sum_{i,k} \\lambda_{k} t_{k} \\left(y_{i-1},y_{i},x,i\\right)+\\sum_{i,l} \\mu_{l}s_{l} \\left(y_{i},x,i\\right) \\right)\\end{align*}\n", "其中,\n", "\\begin{align*} \\\\& Z\\left(x\\right) = \\sum_{y} \\exp \\left(\\sum_{i,k} \\lambda_{k} t_{k} \\left(y_{i-1},y_{i},x,i\\right)+\\sum_{i,l} \\mu_{l}s_{l} \\left(y_{i},x,i\\right) \\right)\\end{align*}\n", "$t_{k}$是定义在边上特征函数,称为转移特征,依赖于当前和前一个位置;$s_{l}$是定义在结点上的特征函数,称为状态特征,依赖于当前位置。特征函数$t_{k}$和$s_{l}$取值为1或0。$\\lambda_{k}$和$\\mu_{l}$是对应的权值,$Z\\left(x\\right)$是规范化因子,求和是在所有可能的输出序列上进行的。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "线性链条件随机场的简化化形式 \n", "设有$K_{1}$个转移特征,$K_{2}$个状态特征,$K=K_{1}+K_{2}$,记\n", "\\begin{align*} f_{k}\\left(y_{i-1},y_{i},x,i\\right) = \\left\\{\n", "\\begin{aligned} \n", "\\ & t_{k}\\left(y_{i-1},y_{i},x,i\\right), \\quad k=1,2,\\cdots,K_{1}\n", "\\\\ & s_{l}\\left(y_{i},x,i\\right), \\quad k=K_{1}+l; \\quad l=1,2,\\cdots,K_{2}\n", "\\end{aligned}\n", "\\right.\\end{align*} " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "转移与状态特征在各个位置$i$求和,记作\n", "\\begin{align*} \\\\& f_{k} \\left(y,x\\right) = \\sum_{i=1}^{n} f_{k}\\left(y_{i-1},y_{i},x,i\\right), \\quad k=1,2, \\cdots,K\\end{align*}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "用$w_{k}$表示特征$f_{k}\\left(y,x\\right)$的权值,即\n", "\\begin{align*} w_{k} = \\left\\{\n", "\\begin{aligned} \n", "\\ & \\lambda_{k}, \\quad k=1,2,\\cdots,K_{1}\n", "\\\\ & \\mu_{l}, \\quad k=K_{1}+l; \\quad l=1,2,\\cdots,K_{2}\n", "\\end{aligned}\n", "\\right.\\end{align*} " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "则条件随机场可表示为\n", "\\begin{align*} \\\\& P\\left(y|x\\right)= \\dfrac{1}{Z\\left(x\\right)} \\exp \\sum_{k=1}^{K} w_{k} f_{k}\\left(y,x\\right)\n", "\\\\ & Z\\left(x\\right) = \\sum_{y} \\exp \\sum_{k=1}^{K} w_{k} f_{k}\\left(y,x\\right) \\end{align*}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "以$w$表示权值向量,即\n", "\\begin{align*} \\\\& w=\\left(w_{1},w_{2}, \\cdots, w_{K}\\right)^{T}\\end{align*}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "以$F\\left(y,x\\right)$表示权局特征向量,即\n", "\\begin{align*} \\\\& F\\left(y,x\\right) = \\left(f_{1}\\left(y,x\\right),f_{2}\\left(y,x\\right), \\cdots, f_{K}\\left(y,x\\right)\\right)^{T} \\end{align*}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "则条件随机场可写成向量$w$与$F\\left(y,x\\right)$的内积的形式\n", "\\begin{align*} \\\\& P\\left(y|x\\right) = \\dfrac{\\exp \\left(w \\cdot F\\left(y,x\\right)\\right)}{Z_{w}\\left(x\\right)} \\end{align*}\n", "其中,\n", "\\begin{align*} \\\\& Z_{w}\\left(x\\right) = \\sum_{y} \\exp \\left(w \\cdot F\\left(y,x\\right)\\right) \\end{align*}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "假设$P_{w}\\left(y|x\\right)$是线性链条件随机场对给定观测序列$x$,相应的标记序列$y$的条件概率。引进特殊的起点标记$y_{0}=start$表示开始状态,特殊的终点标记$y_{n+1}=stop$表示终止状态。对观测序列$x$的每一个位置$i=1,2,\\cdots,n+1$,定义一个$m$阶矩阵($m$是标记$y_{i}$取值的个数)\n", "\\begin{align*} \\\\& M_{i}\\left(x\\right) = \\left[ M_{i} \\left(y_{i-1},y_{i}|x\\right) \\right]_{m \\times m} \n", "\\\\ & M_{i} \\left(y_{i-1},y_{i}|x\\right) = \\exp \\left(W_{i} \\left(y_{i-1},y_{i},|x\\right)\\right)\n", "\\\\ & W_{i} \\left(y_{i-1},y_{i},|x\\right) = \\sum_{i=1}^{K} w_{k} f_{k} \\left(y_{i-1},y_{i},x,i\\right)\\end{align*}\n", "则\n", "\\begin{align*} \\\\& P_{w}\\left(y|x\\right) = \\dfrac{1}{Z_{w}\\left(x\\right)} \\prod_{i=1}^{n+1} M_{i} \\left(y_{i-1},y_{i}|x\\right)\\end{align*}\n", "其中,$Z_{w}$为规范化因子,是$n+1$个矩阵的乘积的元素,是以$start$为起点$stop$为终点通过状态的所有路径$y_{1}y_{2}\\cdots y_{n}$的非规范化概率$\\prod_{i=1}^{n+1} M_{i} \\left(y_{i-1},y_{i}|x\\right)$之和。\n", "\\begin{align*} \\\\& Z_{w}\\left(x\\right) = \\left(M_{1}\\left(x\\right),M_{2}\\left(x\\right), \\cdots,M_{n+1}\\left(x\\right)\\right)_{start,stop}\\end{align*}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "对每个指标$i=0,1,\\cdots,n+1$,定义前向向量$\\alpha_{i}\\left(x\\right)$\n", "\\begin{align*} \\alpha_{0}\\left(y|x\\right) = \\left\\{\n", "\\begin{aligned} \n", "\\ & 1, \\quad y=start\n", "\\\\ & 0, \\quad 否则\n", "\\end{aligned}\n", "\\right.\\end{align*} \n", "递推公式\n", "\\begin{align*} \\\\& \\alpha_{i}^{T}\\left(y_{i}|x\\right) = \\alpha_{i-1}^{T}\\left(y_{i}|x\\right) \\left[M_{i}\\left(y_{i-1},y_{i}|x\\right)\\right], \\quad i=1,2,\\cdots,n+1\\end{align*}\n", "又表示为\n", "\\begin{align*} \\\\& \\alpha_{i}^{T} \\left(x\\right) = \\alpha_{i-1}^{T} \\left(x\\right) M_{i}\\left(x\\right)\\end{align*} \n", "$\\alpha_{i}\\left(y_{i}|x\\right)$表示在位置$i$的标记是$y_{i}$并且到位置$i$的前部分标记序列的非规范化概率,$y_{i}$可取的值又$m$个,所以$\\alpha_{i}\\left(x\\right)$是$m$维列向量。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "对每个指标$i=0,1,\\cdots,n+1$,定义后向向量$\\beta_{i}\\left(x\\right)$\n", "\\begin{align*} \\beta_{n+1}\\left(y_{n+1}|x\\right) = \\left\\{\n", "\\begin{aligned} \n", "\\ & 1, \\quad y_{n+1}=stop\n", "\\\\ & 0, \\quad 否则\n", "\\end{aligned}\n", "\\right.\\end{align*} \n", "递推公式\n", "\\begin{align*} \\\\& \\beta_{i}\\left(y_{i}|x\\right) = \\left[M_{i}\\left(y_{i},y_{i+1}|x\\right)\\right]\\beta_{i+1}\\left(y_{i+1}|x\\right) , \\quad i=1,2,\\cdots,n+1\\end{align*}\n", "又表示为\n", "\\begin{align*} \\\\& \\beta_{i} \\left(x\\right) = M_{i+1}\\left(x\\right) \\beta_{i+1} \\left(x\\right) \\end{align*} \n", "$\\beta_{i}\\left(y_{i}|x\\right)$表示在位置$i$的标记是$y_{i}$并且到位置$i+1$到$n$的后部分标记序列的非规范化概率。" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "由前向-后向向量,得\n", "\\begin{align*} \\\\& Z\\left(x\\right)=\\alpha_{n}^{T}\\left(x\\right) \\cdot \\mathbf{1} = \\mathbf{1}^{T} \\cdot \\beta_{i}\\left(x\\right) \\end{align*}\n", "其中,$\\mathbf{1}$是元素均为1的$m$维列向量。 " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "标记序列中,在位置$i$是标记$y_{i}$的条件概率\n", "\\begin{align*} \\\\& P\\left(Y_{i}=y_{i}|x\\right) = \\dfrac{\\alpha_{i}^{T}\\left(y_{i}|x\\right) \\beta_{i}\\left(y_{i}|x\\right)}{Z\\left(x\\right)}\\end{align*}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "标记序列中,在位置$i-1$是标记$y_{i-1}$,在位置$i$是标记$y_{i}$的条件概率\n", "\\begin{align*} \\\\& P\\left(Y_{i-1}=y_{i-1},Y_{i}=y_{i}|x\\right) = \\dfrac{\\alpha_{i-1}^{T}\\left(y_{i-1}|x\\right) M_{i}\\left(y_{i-1},y_{i}|x\\right)\\beta_{i}\\left(y_{i}|x\\right)}{Z\\left(x\\right)}\\end{align*}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "其中,\n", "\\begin{align*} \\\\& Z\\left(x\\right)=\\alpha_{n}^{T}\\left(x\\right) \\cdot \\mathbf{1} \\end{align*}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "特征函数$f_{k}$关于条件分布$P\\left(Y|X\\right)$的数学期望\n", "\\begin{align*} \\\\& E_{P\\left(Y|X\\right)} \\left[f_{k}\\right]=\\sum_{y} P\\left(y|x\\right) f_{k}\\left(y,x\\right)\n", "\\\\ & = \\sum_{i=1}^{n+1} \\sum_{y_{i-1},y_{i}} f_{k}\\left(y_{i-1},y_{i},x,i\\right) \\dfrac{\\alpha_{i-1}^{T}\\left(y_{i-1}|x\\right) M_{i}\\left(y_{i-1},y_{i}|x\\right)\\beta_{i}\\left(y_{i}|x\\right)}{Z\\left(x\\right)}\\end{align*}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "假设经验分布为$\\tilde{P}\\left(x\\right)$,特征函数$f_{k}$关于联合分布$P\\left(X,Y\\right)$的数学期望\n", "\\begin{align*} \\\\& E_{P\\left(X,Y\\right)} \\left[f_{k}\\right]=\\sum_{x,y} P\\left(x,y\\right) \\sum_{i=1}^{n+1} f_{k}\\left(y_{i-1},y_{i},x,i\\right)\n", "\\\\ & = \\sum_{x} \\tilde{P}\\left(x\\right) \\sum_{y} P\\left(y|x\\right) \\sum_{i=1}^{n+1} f_{k}\\left(y_{i-1},y_{i},x,i\\right) \n", "\\\\ & = \\sum_{x} \\tilde{P}\\left(x\\right) \\sum_{i=1}^{n+1} \\sum_{y_{i-1},y_{i}} f_{k}\\left(y_{i-1},y_{i},x,i\\right) \\dfrac{\\alpha_{i-1}^{T}\\left(y_{i-1}|x\\right) M_{i}\\left(y_{i-1},y_{i}|x\\right)\\beta_{i}\\left(y_{i}|x\\right)}{Z\\left(x\\right)} \\quad \\quad k=1,2,\\cdots,K\n", "\\end{align*}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "其中,\n", "\\begin{align*} \\\\& Z\\left(x\\right)=\\alpha_{n}^{T}\\left(x\\right) \\cdot \\mathbf{1} \\end{align*}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "由训练数据集,得经验概率分布$\\tilde{P}\\left(X,Y\\right)$。 \n", "训练数据的对数似然函数\n", "\\begin{align*} \\\\& L\\left(w\\right)= L_{\\tilde{P}}\\left(P_{w}\\right)=\\log \\prod_{x,y} P_{w}\\left(y|x\\right)^{\\tilde{P}\\left(x,y\\right)} = \\sum_{x,y} \\tilde{P}\\left(x,y\\right) \\log P_{w}\\left(y|x\\right)\\end{align*}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "当$P_{w}$是条件随机场模型时,对数似然函数\n", "\\begin{align*} \\\\& L\\left(w\\right)= \\sum_{x,y} \\tilde{P}\\left(x,y\\right) \\log P_{w}\\left(y|x\\right)\n", "\\\\ & = \\sum_{x,y} \\left[\\tilde{P}\\left(x,y\\right) \\sum_{k=1}^{K} w_{k} f_{k}\\left(y,x\\right) - \\tilde{P}\\left(x,y\\right) \\log Z_{w}\\left(x\\right) \\right]\n", "\\\\ & = \\sum_{j=1}^{N} \\sum_{k=1}^{K} w_{k} f_{k} \\left(y_{j},x_{j}\\right) - \\sum_{j=1}^{N} \\log Z_{w} \\left(x_{j}\\right) \\end{align*}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "设模型当前参数向量\n", "\\begin{align*} \\\\& w = \\left(w_{1},w_{2},\\cdots,w_{K}\\right)^{T}\\end{align*}\n", "向量的增量\n", "\\begin{align*} \\\\& \\delta = \\left(\\delta_{1},\\delta_{2},\\cdots,\\delta_{K}\\right)^{T}\\end{align*}\n", "更新参数向量\n", "\\begin{align*} \\\\& w+\\delta = \\left(w_{1}+\\delta_{1},w_{2}+\\delta_{2},\\cdots,w_{K}+\\delta_{K}\\right)^{T}\\end{align*}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "关于转移特征$t_{k}$的更新方程\n", "\\begin{align*} \\\\& E_{\\tilde{P}} \\left[t_{k}\\right]=\\sum_{x,y} \\tilde{P} \\left(x,y\\right) \\sum_{i=1}^{n+1} t_{k}\\left(y_{i-1},y_{i},x,i\\right)\n", "\\\\ & = \\sum_{x,y} \\tilde{P} \\left(x\\right) P\\left(y|x\\right) \\sum_{i=1}^{n+1} t_{k}\\left(y_{i-1},y_{i},x,i\\right)\\exp\\left(\\delta_{k}T\\left(x,y\\right)\\right) \\quad \\quad k=1,2,\\cdots,K\n", "\\end{align*}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "关于转移特征$s_{l}$的更新方程\n", "\\begin{align*} \\\\& E_{\\tilde{P}} \\left[s_{l}\\right]=\\sum_{x,y} \\tilde{P} \\left(x,y\\right) \\sum_{i=1}^{n} s_{l}\\left(y_{i},x,i\\right)\n", "\\\\ & = \\sum_{x,y} \\tilde{P} \\left(x\\right) P\\left(y|x\\right) \\sum_{i=1}^{n} s_{l}\\left(y_{i},x,i\\right)\\exp\\left(\\delta_{K_{1}+l}T\\left(x,y\\right)\\right) \\quad \\quad l=1,2,\\cdots,K_{2}\n", "\\end{align*}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "其中,$T\\left(x,y\\right)$是在数据$\\left(x,y\\right)$中出现的所有特征函数的总和\n", "\\begin{align*} \\\\& T\\left(x,y\\right)=\\sum_{k} f_{k}\\left(y,x\\right)=\\sum_{k=1}^{K} \\sum_{i=1}^{n+1} f_{k} \\left(y_{i-1},y_{i},x,i\\right)\\end{align*}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "条件随机场模型学习的改进的迭代尺度法: \n", "输入:特征函数$t_{1},t_{2},\\cdots,t_{K_{1}}$,$s_{1},s_{2},\\cdots,s_{K_{2}}$,经验分布$\\tilde{P}\\left(x,y\\right)$ \n", "输出:参数估计值$\\hat{w}\\quad$ 模型$P_{\\hat{w}}$ \n", "1. 对所有$k \\in \\left\\{1,2,\\cdots,K\\right\\}$,取初值$w_{k}=0$ \n", "2. 对每一个$k \\in \\left\\{1,2,\\cdots,K\\right\\}$ \n", "2.1当$k=1,2,\\cdots,K_{1}$时,令$\\delta_{k}$是方程\n", "\\begin{align*} \\\\& \\sum_{x,y} \\tilde{P} \\left(x\\right) P\\left(y|x\\right) \\sum_{i=1}^{n+1} t_{k}\\left(y_{i-1},y_{i},x,i\\right)\\exp\\left(\\delta_{k}T\\left(x,y\\right)\\right)=E_{\\tilde{P}} \\left[t_{k}\\right]\n", "\\end{align*}\n", "的解 \n", "当$k=K_{1}+l,l=1,2,\\cdots,K_{2}$时,令$\\delta_{K_{1}+l}$是方程\n", "\\begin{align*} \\\\& \\sum_{x,y} \\tilde{P} \\left(x\\right) P\\left(y|x\\right) \\sum_{i=1}^{n} s_{l}\\left(y_{i},x,i\\right)\\exp\\left(\\delta_{K_{1}+l}T\\left(x,y\\right)\\right)=E_{\\tilde{P}} \\left[s_{l}\\right]\n", "\\end{align*}\n", "的解 \n", "2.2更新$w_{k}$的值:$w_{k} \\leftarrow w_{k}+\\delta_{k}$ \n", "3. 如果不是所有的$w_{k}$都收敛,重复步骤2. \n", "其中,\n", "\\begin{align*} \\\\& T\\left(x,y\\right)=\\sum_{k} f_{k}\\left(y,x\\right)=\\sum_{k=1}^{K} \\sum_{i=1}^{n+1} f_{k} \\left(y_{i-1},y_{i},x,i\\right)\\end{align*}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "算法$S$是解决$T\\left(x,y\\right)$表示数据$\\left(x,y\\right)$中特征总数时,对不同数据$\\left(x,y\\right)$取值可能不同的问题。引入松弛特征\n", "\\begin{align*} \\\\& s\\left(x,y\\right) = S - \\sum_{i=1}^{n+1} \\sum_{k=1}^{K} f_{k} \\left(y_{i-1},y_{i},x,i\\right)\\end{align*}\n", "其中,$S$是一个常数。选取足够大的常数$S$使得对训练数据集的所有数据$\\left(x,y\\right)$,$s\\left(x,y\\right) \\geq 0$成立。这时特征总数可取$S$。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "对转移特征$t_{k}$,$\\delta_{k}$的更新方程\n", "\\begin{align*} \\\\& \\sum_{x,y} \\tilde{P} \\left(x\\right) P\\left(y|x\\right) \\sum_{i=1}^{n+1} t_{k}\\left(y_{i-1},y_{i},x,i\\right)\\exp\\left(\\delta_{k}S\\right)=E_{\\tilde{P}} \\left[t_{k}\\right]\n", "\\\\ & \\delta_{k}=\\dfrac{1}{S} \\log \\dfrac{E_{\\tilde{P}}\\left[t_{k}\\right]}{E_{P}\\left[t_{k}\\right]}\\end{align*}\n", "其中,\n", "\\begin{align*} \\\\& E_{P} \\left[t_{k}\\right]=\\sum_{x} \\tilde{P}\\left(x\\right) \\sum_{i=1}^{n+1} \\sum_{y_{i-1},y_{i}} t_{k}\\left(y_{i-1},y_{i},x,i\\right) \\dfrac{\\alpha_{i-1}^{T}\\left(y_{i-1}|x\\right) M_{i}\\left(y_{i-1},y_{i}|x\\right)\\beta_{i}\\left(y_{i}|x\\right)}{Z\\left(x\\right)} \n", "\\end{align*}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "对状态特征$s_{l}$,$\\delta_{k}$的更新方程\n", "\\begin{align*} \\\\& \\sum_{x,y} \\tilde{P} \\left(x\\right) P\\left(y|x\\right) \\sum_{i=1}^{n} s_{l}\\left(y_{i},x,i\\right)\\exp\\left(\\delta_{K_{1}}S\\right)=E_{\\tilde{P}} \\left[s_{l}\\right]\n", "\\\\ & \\delta_{K_{1}+l}=\\dfrac{1}{S} \\log \\dfrac{E_{\\tilde{P}}\\left[s_{l}\\right]}{E_{P}\\left[s_{l}\\right]}\\end{align*}\n", "其中,\n", "\\begin{align*} \\\\& E_{P} \\left[s_{l}\\right]=\\sum_{x} \\tilde{P}\\left(x\\right) \\sum_{i=1}^{n} \\sum_{y_{i-1},y_{i}} s_{l}\\left(y_{i},x,i\\right) \\dfrac{\\alpha_{i}^{T}\\left(y_{i-1}|x\\right) \\beta_{i}\\left(y_{i}|x\\right)}{Z\\left(x\\right)} \n", "\\end{align*}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "算法$T$是解决算法$S$每步迭代增量向量变大,算法收敛变慢问题。对每个观测序列$x$计算其特征总数最大值$T\\left(x\\right)$\n", "\\begin{align*} \\\\& T\\left(x\\right) = \\max_{y} T\\left(x,y\\right)=t\\end{align*}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "关于转移特征参数的更新方程\n", "\\begin{align*} \\\\& E_{\\tilde{P}} \\left[t_{k}\\right]=\\sum_{x,y} \\tilde{P} \\left(x\\right) P\\left(y|x\\right) \\sum_{i=1}^{n+1} t_{k}\\left(y_{i-1},y_{i},x,i\\right)\\exp\\left(\\delta_{k}T\\left(x,y\\right)\\right) \n", "\\\\ & = \\sum_{x} \\tilde{P} \\left(x\\right) \\sum_{y} P\\left(y|x\\right) \\sum_{i=1}^{n+1} t_{k}\\left(y_{i-1},y_{i},x,i\\right)\\exp\\left(\\delta_{k}T\\left(x,y\\right)\\right) \n", "\\\\ & = \\sum_{x} \\tilde{P}\\left(x\\right) a_{k,t} \\exp\\left(\\delta_k \\cdot t\\right)\n", "\\\\ & = \\sum_{t=0}^{T_{max}} a_{k,t} \\beta_{k}^{t}\n", "\\end{align*}\n", "其中,$a_{k,t}$是特征$t_{k}$的期望,$\\delta_{k}=\\log \\beta_{k}$,$\\beta_{k}$是多项式方程唯一的实根,可以用牛顿法求得,从而得$\\delta_{k}$。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "关于状态特征的参数更新方程\n", "\\begin{align*} \\\\& E_{\\tilde{P}} \\left[s_{l}\\right]= \\sum_{x,y} \\tilde{P} \\left(x\\right) P\\left(y|x\\right) \\sum_{i=1}^{n} s_{l}\\left(y_{i},x,i\\right)\\exp\\left(\\delta_{K_{1}+l}T\\left(x,y\\right)\\right) \n", "\\\\ & = \\sum_{x} \\tilde{P} \\left(x\\right) \\sum_{y} P\\left(y|x\\right) \\sum_{i=1}^{n} s_{l}\\left(y_{i},x,i\\right)\\exp\\left(\\delta_{K_{1}+l}T\\left(x,y\\right)\\right) \n", "\\\\ & = \\sum_{x} \\tilde{P} \\left(x\\right) b_{l,t} \\exp \\left(\\delta_{k} \\cdot t\\right)\n", "\\\\ & = \\sum_{t=0}^{T_{max}} b_{l,t} \\gamma_{l}^{t}\n", "\\end{align*}\n", "其中,$b_{l,t}$是特征$s_{l}$的期望,$\\delta_{l}=\\log \\gamma_{l}$,$\\gamma_{l}$是多项式方程唯一的实根,可以用牛顿法求得。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "条件随机场的预测问题是给定条件随机场$P\\left(Y|X\\right)$和输入序列(观测序列)$x$,求条件概率最大的输出序列(标记序列)$y^{*}$,即对观测序列进行标注。\n", "即,\n", "\\begin{align*} \\\\& y^{*}=\\arg\\max_{y}P_{w}\\left(y|x\\right)\n", "\\\\ & = \\arg\\max_{y} \\dfrac{\\exp \\left(w \\cdot F\\left(y,x\\right)\\right)}{Z_{w}\\left(x\\right)}\n", "\\\\ & = \\arg\\max_{y} \\exp \\left(w \\cdot F\\left(y,x\\right)\\right)\n", "\\\\ & = \\arg\\max_{y} \\left(w \\cdot F\\left(y,x\\right)\\right) \\end{align*}\n", "其中,\n", "\\begin{align*} \\\\& w=\\left(w_{1},w_{2},\\cdots,w_{K}\\right)^{T} \n", "\\\\ & F\\left(y,x\\right)=\\left(f_{1}\\left(y,x\\right),f_{2}\\left(y,x\\right),\\cdots,f_{K}\\left(y,x\\right)\\right)^{T}\n", "\\\\ & f_{k}\\left(y,x\\right)=\\sum_{i=1}^{n} f_{k} \\left(y_{i-1},y_{i},x,i\\right),\\quad \\quad k=1,2,\\cdots,K\n", "\\end{align*}\n", "等价的\n", "\\begin{align*} \\\\& \\max_{y} \\sum_{i=1}^{n} w \\cdot F_{i} \\left(y_{i-1},y_{i},x\\right)\\end{align*}\n", "其中,\n", "\\begin{align*} \\\\& F_{i}\\left(y_{i-1},y_{i},x\\right) = \\left(f_{1}\\left(y_{i-1},y_{i},x,i\\right),f_{2}\\left(y_{i-1},y_{i},x,i\\right),\\cdots,f_{K}\\left(y_{i-1},y_{i},x,i\\right)\\right)^{T}\\end{align*}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "条件随机场预测的维特比算法: \n", "输入:模型特征函数$F\\left(y,x\\right)$和权值向量$w$,观测序列$x=\\left(x_{1},x_{2},\\cdots,x_{n}\\right)$ \n", "输出:最优序列$y^{*}=\\left(y_{1}^{*},y_{2}^{*},\\cdots,y_{n}^{*}\\right)$ \n", "1. 初始化\n", "\\begin{align*} \\\\& \\delta_{1}\\left(j\\right)=w \\cdot F_{1}\\left(y_{0}=start,y_{1}=j,x\\right),\\quad j=1,2,\\cdots,m\\end{align*}\n", "2. 递推,对$i=2,3,\\cdots,n$ \n", "\\begin{align*} \\\\& \\delta_{i}\\left(l\\right)=\\max_{1 \\leq j \\leq m}\\left\\{\\delta_{i-1}\\left(j\\right)+w \\cdot F_{i}\\left(y_{i-1}=j,y_{i}=l,x\\right)\\right\\},\\quad l=1,2,\\cdots,m\n", "\\\\ & \\varPsi_{i}\\left(l\\right) = \\arg \\max_{1 \\leq j \\leq m}\\left\\{\\delta_{i-1}\\left(j\\right)+w \\cdot F_{i}\\left(y_{i-1}=j,y_{i}=l,x\\right)\\right\\},\\quad l=1,2,\\cdots,m\\end{align*}\n", "3. 终止\n", "\\begin{align*} \\\\& \\max_{y} \\left(w \\cdot F\\left(y,x\\right)\\right)=\\max_{1 \\leq j \\leq m} \\delta_{n}\\left(j\\right)\n", "\\\\ & y_{n}^{*}=\\arg \\max_{1 \\leq j \\leq m}\\delta_{n}\\left(j\\right) \\end{align*}\n", "4. 返回路径\n", "\\begin{align*} \\\\& y_{i}^{*}=\\varPsi_{i+1}\\left(y_{i+1}^{*}\\right),\\quad i=n-1,n-2,\\cdots,1\\end{align*}\n", "最优路径\n", "\\begin{align*} \\\\& y_{i}^{*}=\\left(y_{1}^{*},y_{2}^{*},\\cdots,y_{n}^{*}\\right)\\end{align*}" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.11" } }, "nbformat": 4, "nbformat_minor": 0 }