{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "这一节聊一下HMM与CRF之间的区别与联系,内容主要参考自李宏毅老师的视频:[《HMM/CRF by 李宏毅》](https://www.bilibili.com/video/BV1zJ411575b?from=search&seid=11892221500659184153)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 一.联系\n", "\n", "先看看它们有什么相同的地方,我们知道HMM模型可以表示如下: \n", "\n", "$$\n", "P(x,y)=\\prod_{i=1}^np(y_{i}\\mid y_{i-1})p(x_i\\mid y_i)\n", "$$ \n", "\n", "这里,起始点看做$p(y_1\\mid y_0)=p(y_1)$,而CRF模型可以写作: \n", "\n", "$$\n", "P(y\\mid x)=\\frac{exp(w^TF(y,x))}{Z(x)}\n", "$$ \n", "\n", "似乎看起来风马牛不相及,但如果HMM做一点变换,它也可以写作CRF的形式,假设$x_i,i=1,2,..,n$可能取值的情况有$N$种:$o_1,o_2,...,o_N$,而$y_i,i=1,2,...,n$可能取值的情况有$M$种:$q_1,q_2,...,q_M$,那么: \n", "\n", "$$\n", "P(x,y)=\\prod_{i=1}^np(y_{i}\\mid y_{i-1})p(x_i\\mid y_i)\\\\\n", "=\\prod_{i=1}^n\\prod_{k_1=1}^M\\prod_{k_2=1}^M\\prod_{t=1}^Np(y_{i}=q_{k_1}\\mid y_{i-1}=q_{k_2})^{I(y_{i}=q_{k_1} , y_{i-1}=q_{k_2})}p(x_i=o_t\\mid y_i=q_{k_1})^{I(x_i=o_t,y_i=q_{k_1})}\\\\\n", "=\\prod_{k_1=1}^M\\prod_{k_2=1}^M\\prod_{t=1}^Np(q_{k_1}\\mid q_{k_2})^{N_1(q_{k_1},q_{k_2})}p(o_t\\mid q_{k_1})^{N_2(o_t,q_{k_1})}\\\\\n", "=exp(\\sum_{k_1=1}^M\\sum_{k_2=1}^MN_1(q_{k_1},q_{k_2})logp(q_{k_1}\\mid q_{k_2})+\\sum_{k_1=1}^M\\sum_{t=1}^NN_2(o_t,q_{k_1})logp(o_t\\mid q_{k_1}))\\\\\n", "=exp(logp(q_1\\mid q_1)\\cdot N_1(q_1,q_2)+\\cdots+logp(q_M\\mid q_M)\\cdot N_1(q_M,q_M)+logp(o_1,\\mid q_1)\\cdot N_2(o_1,q_2)+\\cdots logp(o_N\\mid q_M)\\cdot N_2(o_N,q_M) )\\\\\n", "=exp([logp(q_1\\mid q_1),...,logp(o_N\\mid q_M)][N_1(q_1,q_2),...,N_2(o_N,q_M)]^T)\\\\\n", "=exp(w^TF(y,x))\n", "$$ \n", "\n", "这里,$I(\\cdot)$表示指示函数,$I(x_i=o_t,y_i=q_{k_1})$表示如果$x_i=o_t$且$y_i=q_{k_1}$,那么返回结果1,否则返回结果0,$I(y_{i}=q_{k_1},y_{i-1}=q_{k_2})$表示,如果$y_{i}=q_{k_1}$且$y_i=q_{k_2}$那么返回结果1,否则返回结果0,而: \n", "\n", "$$\n", "N_1(q_{k_1},q_{k_2})=\\sum_{i=1}^nI(y_{i}=q_{k_1},y_{i-1}=q_{k_2})\\\\\n", "N_2(o_t,q_{k_1})=\\sum_{i=1}^nI(x_i=o_t,y_i=q_{k_1})\n", "$$ \n", "\n", "显然,最后有: \n", "\n", "$$\n", "w=[logp(q_1\\mid q_1),...,logp(o_N\\mid q_M)]^T\\\\\n", "F(y,x)=[N_1(q_1,q_2),...,N_2(o_N,q_M)]^T\n", "$$ \n", "\n", "大家看一下,这是不是就和CRF一一对应了呢,上面函数$I(\\cdot)$对应了特征函数$f_k(y_{i-1},y_i,x,i)$,$N1(\\cdot),N_2(\\cdot)$对应了特征函数出现的次数的累计$f_k(y,x)$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 二.区别\n", "虽然它们具有一样的表现形式,但是训练它们的“驱动力”不一样,HMM是最大化的联合概率分布的似然估计,而CRF是最大化的条件概率分布的似然估计,虽然它们都在最大化已观测数据的概率: \n", "\n", "(1)但是,CRF对于未观测到的模式具有一定抑制的作用(对于$Z(x)$,有缩小它的趋势,而$Z(x)$中包含了我们未能观测数据的概率),所以,如果数据量足够可以考虑采用CRF(数据量太少,那些未出现的“好模式”也会被抑制) \n", "\n", "(2)那么,反过来,HMM对于未知模式就具有一定的“创造性”了,首先它对于未知模式没有抑制作用,其次,由于条件独立性假设,它还会提高某些未知模式的概率,例如,训练集中只有两种模式“A->B-D”和“H->B->C”,它们的训练样本各占一半,那么对于“A->B->C”这种从未出现过的模式,它也能给出50%的概率值,但是这也是它的弊端,因为对于未知模式,也包含了“坏模式”" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 小结一下\n", "\n", "(1)HMM和CRF对于已观测数据都起着积极作用,即提升这部分数据出现的概率值; \n", "\n", "(2)对于未观测数据,CRF对于“好坏模式”都会抑制,而HMM对于“好坏模式”都会提升,所以对于算法学习来说两者不分伯仲,难分好坏; **但是呢**,目前可用于训练的数据量往往很大,而且逐年累增,所以CRF可能更胜一筹" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 2 }