{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "训练数据集\n", "\\begin{align*} \\\\& T = \\left\\{ \\left( x_{1}, y_{1} \\right), \\left( x_{2}, y_{2} \\right), \\cdots, \\left( x_{N}, y_{N} \\right) \\right\\} \\end{align*} \n", "假设分类模型是条件概率分布$P \\left( Y | X \\right), X \\in \\mathcal{X} \\subseteq R^{n}$表示输入,$Y \\in \\mathcal{Y}$表示输出。给定输入$X$,以条件概率$P \\left( Y | X \\right)$输出$Y$。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "特征函数$f \\left( x, y \\right)$描述输入$x$和输出$y$之间的某一事实,\n", "\\begin{align*} f \\left( x, y \\right) = \\left\\{\n", "\\begin{aligned} \n", "\\ & 1, x与y满足某一事实\n", "\\\\ & 0, 否则\n", "\\end{aligned}\n", "\\right.\\end{align*} " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "特征函数$f \\left( x, y \\right)$关于经验分布$ \\tilde{P} \\left( X, Y \\right) $的期望\n", "\\begin{align*} \\\\& E_{ \\tilde{P} } \\left( f \\right) = \\sum_{x, y} \\tilde{P} \\left( x, y \\right) f \\left( x, y \\right) \\end{align*} " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "特征函数$f \\left( x, y \\right)$关于模型$ P \\left( Y | X \\right) $与经验分布$ \\tilde{P} \\left( X \\right) $的期望\n", "\\begin{align*} \\\\& E_{ P } \\left( f \\right) = \\sum_{x, y} \\tilde{P} \\left( x \\right) P \\left( y | x \\right) f \\left( x, y \\right) \\end{align*} " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "最大熵模型:假设满足所有约束条件的模型集合为\n", "\\begin{align*} \\\\& \\mathcal{C} \\equiv \\left\\{ P \\in \\mathcal{P} | E_{ P } \\left( f_{i} \\right) = E_{ \\tilde{P} } \\left( f_{i} \\right), i = 1,2, \\cdots, n \\right\\}\\end{align*} \n", "定义在条件概率分布$P \\left( Y | X \\right)$上的条件熵为 \n", "\\begin{align*} \\\\& H \\left( P \\right) = - \\sum_{x,y} \\tilde{P} \\left( x \\right) P \\left( y | x \\right) \\log P \\left( y | x \\right) \\end{align*} \n", "则模型集合$\\mathcal{C}$中条件熵$H \\left( P \\right)$最大的模型称为最大熵模型。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "给定训练数据集$T = \\left\\{ \\left( x_{1}, y_{1} \\right), \\left( x_{2}, y_{2} \\right), \\cdots, \\left( x_{N}, y_{N} \\right) \\right\\}$以及特征函数$f_{i} \\left( x, y \\right), i = 1, 2, \\cdots, n$,最大熵模型的学习等价于最优化问题: \n", "\\begin{align*} \\\\& \\max_{P \\in \\mathcal{C} } \\quad H \\left( P \\right) = - \\sum_{x,y} \\tilde{P} \\left( x \\right) P \\left( y | x \\right) \\log P \\left( y | x \\right) \n", "\\\\ & s.t.\\quad E_{ P } \\left( f_{i} \\right) = E_{ \\tilde{P} } \\left( f_{i} \\right), i = 1,2, \\cdots, n \n", "\\\\ & \\sum_{y} P \\left( y | x \\right) = 1 \\end{align*} \n", "等价的 \n", "\\begin{align*} \\\\& \\min_{P \\in \\mathcal{C} } \\quad -H \\left( P \\right) = \\sum_{x,y} \\tilde{P} \\left( x \\right) P \\left( y | x \\right) \\log P \\left( y | x \\right) \n", "\\\\ & s.t.\\quad E_{ P } \\left( f_{i} \\right) - E_{ \\tilde{P} } \\left( f_{i} \\right) = 0, i = 1,2, \\cdots, n \n", "\\\\ & \\sum_{y} P \\left( y | x \\right) = 1 \\end{align*} " ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "最优化问题的求解: \n", "1. 引入拉格朗日乘子$w_{i}, i = 0,1, \\cdots, n$,定义拉格朗日函数$L \\left( P, w \\right)$\n", "\\begin{align*} \\\\& L \\left( P, w \\right) = - H \\left( P \\right) + w_{0} \\left( 1 - \\sum_{y} P \\left( y | x \\right) \\right) + \\sum_{i=1}^{n} w_{i} \\left( E_{P} \\left( f_{i} \\right) - E_{\\tilde{P}} \\left( f_{i} \\right) \\right) \n", "\\\\ & = \\sum_{x,y} \\tilde{P} \\left( x \\right) P \\left( y | x \\right) \\log P \\left( y | x \\right) + w_{0} \\left( 1 - \\sum_{y} P \\left( y | x \\right) \\right) \n", "\\\\ & \\quad + \\sum_{i=1}^{n} w_{i} \\left( \\sum_{x, y} \\tilde{P} \\left( x \\right) P \\left( y | x \\right) f_{i} \\left( x, y \\right) - \\sum_{x, y} \\tilde{P} \\left( x, y \\right) f_{i} \\left( x, y \\right) \\right) \\end{align*} \n", "2. 求$\\min_{P \\in \\mathcal{C} } L \\left( P, w \\right)$: \n", "记对偶函数$\\Psi \\left( w \\right) = min_{P \\in \\mathcal{C} } L \\left( P, w \\right) = L \\left( P_{w}, w \\right)$,其解记$P_{w} = \\arg \\min_{P \\in \\mathcal{C} } L \\left( P, w \\right) = P_{w} \\left( y | x \\right)$\n", "\\begin{align*} \\\\& \\dfrac {\\partial L \\left( P, w \\right)} {\\partial P \\left( y | x \\right)} = \\sum_{x,y} \\tilde{P} \\left( x \\right) \\left( \\log P \\left( y | x \\right) + 1 \\right) - \\sum_{y} w_{0} - \\sum_{x,y} \\left( \\tilde{P} \\left( x \\right) \\sum_{i=1}^{n} w_{i} f_{i} \\left( x, y \\right) \\right) \n", "\\\\ & \\quad = \\sum_{x,y} \\tilde{P} \\left( x \\right) \\left( \\log P \\left( y | x \\right) + 1 \\right) - \\sum_{x,y} P \\left( x \\right) w_{0} - \\sum_{x,y} \\left( \\tilde{P} \\left( x \\right) \\sum_{i=1}^{n} w_{i} f_{i} \\left( x, y \\right) \\right) \n", "\\\\ & \\quad = \\sum_{x,y} \\tilde{P} \\left( x \\right) \\left( \\log P \\left( y | x \\right) + 1 - w_{0} - \\sum_{i=1}^{n} w_{i} f_{i} \\left( x, y \\right) \\right) = 0\\end{align*} \n", "由于$\\tilde{P} \\left( x \\right) > 0 $,得\n", "\\begin{align*} \\\\ & \\log P \\left( y | x \\right) + 1 - w_{0} - \\sum_{i=1}^{n} w_{i} f_{i} \\left( x, y \\right)=0\n", "\\\\ & P \\left( y | x \\right) = \\exp \\left( \\sum_{i=1}^{n} w_{i} f_{i} \\left( x, y \\right) + w_{0} -1 \\right) = \\dfrac{ \\exp \\left( \\sum_{i=1}^{n} w_{i} f_{i} \\left( x, y \\right) \\right) }{ \\exp \\left( 1 - w_{0} \\right)}\\end{align*} \n", "由于 \n", "\\begin{align*} \\\\& \\sum_{y} P \\left( y | x \\right) = 1 \\end{align*} \n", "则\n", "\\begin{align*} \\\\ & \\sum_{y} P \\left( y | x \\right) = \\sum_{y} \\dfrac{ \\exp \\left( \\sum_{i=1}^{n} w_{i} f_{i} \\left( x, y \\right) \\right) }{ \\exp \\left( 1 - w_{0} \\right)} = 1 \n", "\\\\ & \\sum_{y} \\exp \\left( \\sum_{i=1}^{n} w_{i} f_{i} \\left( x, y \\right) \\right) = \\exp \\left( 1 - w_{0} \\right)\\end{align*} \n", "代入,得\n", "\\begin{align*} \\\\ & P \\left( y | x \\right) = \\dfrac{1 }{Z_{w} \\left( x \\right)}\\exp \\left( \\sum_{i=1}^{n} w_{i} f_{i} \\left( x, y \\right) \\right) \\end{align*} \n", "其中\n", "\\begin{align*} Z_{w} = \\sum_{y} \\exp \\left( \\sum_{i=1}^{n} w_{i} f_{i} \\left( x, y \\right) \\right) \\end{align*} \n", "$Z_{w}$称为规范化因子;$f_{i} \\left( x, y \\right)$是特征函数;$w_{i}$是特征的权值。\n", "3. 求$\\max_{w} \\Psi \\left( w \\right)$ \n", "将其解记为$w^{*}$,即\n", "\\begin{align*} w^{*} = \\arg \\max_{w} \\Psi \\left( w \\right) \\end{align*} " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "已知训练数据的经验概率分布$\\tilde{P} \\left( X, Y \\right)$,则条件概率分布$P \\left( X | Y \\right)$的对数似然函数\n", "\\begin{align*} \\\\ & L_{\\tilde{P}} \\left( P_{w} \\right) = \\log \\prod_{x,y} P \\left( y | x \\right)^{\\tilde{P} \\left( x, y \\right)} \n", "\\\\ & = \\sum_{x,y} \\tilde{P} \\left( x, y \\right) \\log P \\left( y | x \\right)\n", "\\\\ & = \\sum_{x,y} \\tilde{P} \\left( x, y \\right) \\log \\dfrac{\\exp \\left( \\sum_{i=1}^{n} w_{i} f_{i} \\left( x, y \\right) \\right)}{Z_{w} \\left( x \\right) }\n", "\\\\ & = \\sum_{x,y} \\tilde{P} \\left( x, y \\right) \\sum_{i=1}^{n} w_{i} f_{i} \\left( x, y \\right) - \\sum_{x,y} \\tilde{P} \\left( x, y \\right) \\log Z_{w} \\left( x \\right) \n", "\\\\ & = \\sum_{x,y} \\tilde{P} \\left( x, y \\right) \\sum_{i=1}^{n} w_{i} f_{i} \\left( x, y \\right) - \\sum_{x} \\tilde{P} \\left( x \\right) \\log Z_{w} \\left( x \\right)\\end{align*} \n", "对偶函数\n", "\\begin{align*} \\\\ & \\Psi \\left( w \\right) = min_{P \\in \\mathcal{C} } L \\left( P, w \\right) = L \\left( P_{w}, w \\right) \\\\ & = - H \\left( P_{w} \\right) + w_{0} \\left( 1 - \\sum_{y} P_{w} \\left( y | x \\right) \\right) + \\sum_{i=1}^{n} w_{i} \\left( E_{\\tilde{P}} \\left( f_{i} \\right) - E_{P_{w}} \\left( f_{i} \\right) \\right) \n", "\\\\ & = \\sum_{x,y} \\tilde{P} \\left( x \\right) P_{w} \\left( y | x \\right) \\log P_{w} \\left( y | x \\right)\n", "\\\\& \\quad\\quad\\quad + w_{0} \\left( 1 - \\sum_{y} \\dfrac{1 }{Z_{w} \\left( x \\right)}\\exp \\left( \\sum_{i=1}^{n} w_{i} f_{i} \\left( x, y \\right) \\right) \\right)\n", "\\\\ & \\quad\\quad\\quad + \\sum_{i=1}^{n} w_{i} \\left( \\sum_{x, y} \\tilde{P} \\left( x, y \\right) f_{i} \\left( x, y \\right) - \\sum_{x, y} \\tilde{P} \\left( x \\right) P_{w} \\left( y | x \\right) f_{i} \\left( x, y \\right) \\right) \n", "\\\\ & = \\sum_{x, y} \\tilde{P} \\left( x, y \\right) \\sum_{i=1}^{n} w_{i} f_{i} \\left( x, y \\right) + \\sum_{x,y} \\tilde{P} \\left( x \\right) P_{w} \\left( y | x \\right) \\left( \\log P_{w} \\left( y | x \\right) - \\sum_{i=1}^{n} w_{i} f_{i} \\left(x, y \\right) \\right) \n", "\\\\ & = \\sum_{x,y} \\tilde{P} \\left( x, y \\right) \\sum_{i=1}^{n} w_{i} f_{i} \\left( x, y \\right) - \\sum_{x,y} \\tilde{P} \\left( x, y \\right) \\log Z_{w} \\left( x \\right) \n", "\\\\ & = \\sum_{x,y} \\tilde{P} \\left( x, y \\right) \\sum_{i=1}^{n} w_{i} f_{i} \\left( x, y \\right) - \\sum_{x} \\tilde{P} \\left( x \\right) \\log Z_{w} \\left( x \\right)\\end{align*} \n", "得\n", "\\begin{align*} \\\\ & L_{\\tilde{P}} \\left( P_{w} \\right) = \\Psi \\left( w \\right)\\end{align*} \n", "即,最大熵模型的极大似然估计等价于对偶函数极大化。" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "已知最大熵模型\n", "\\begin{align*} \\\\ & P_{w} \\left( y | x \\right) = \\dfrac{1 }{Z_{w} \\left( x \\right)}\\exp \\left( \\sum_{i=1}^{n} w_{i} f_{i} \\left( x, y \\right) \\right) \\end{align*} \n", "其中\n", "\\begin{align*} Z_{w} = \\sum_{y} \\exp \\left( \\sum_{i=1}^{n} w_{i} f_{i} \\left( x, y \\right) \\right) \\end{align*} \n", "$Z_{w}$称为规范化因子;$f_{i} \\left( x, y \\right)$是特征函数;$w_{i}$是特征的权值。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "对数似然函数\n", "\\begin{align*} \\\\ & L \\left( w \\right) = \\sum_{x,y} \\tilde{P} \\left( x, y \\right) \\log P_{w} \\left( y | x \\right)\n", "\\\\ & = \\sum_{x,y} \\tilde{P} \\left( x, y \\right) \\sum_{i=1}^{n} w_{i} f_{i} \\left( x, y \\right) - \\sum_{x} \\tilde{P} \\left( x \\right) \\log Z_{w} \\left( x \\right) \\end{align*} " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "对于给定的经验分布$\\tilde{P}$,模型参数从$w$到$w + \\delta$,对数似然函数的改变量\n", "\\begin{align*} \\\\ & L \\left( w + \\delta \\right) - L \\left( w \\right) = \\sum_{x,y} \\tilde{P} \\left( x, y \\right) \\log P_{w + \\delta} \\left( y | x \\right) - \\sum_{x,y} \\tilde{P} \\left( x, y \\right) \\log P_{w} \\left( y | x \\right)\n", "\\\\ & = \\left( \\sum_{x,y} \\tilde{P} \\left( x, y \\right) \\sum_{i=1}^{n} \\left( w_{i} + \\delta_{i} \\right) f_{i} \\left( x, y \\right) - \\sum_{x} \\tilde{P} \\left( x \\right) \\log Z_{w + \\delta} \\left( x \\right) \\right) \n", "\\\\ & \\quad\\quad\\quad\\quad\\quad\\quad - \\left( \\sum_{x,y} \\tilde{P} \\left( x, y \\right) \\sum_{i=1}^{n} w_{i} f_{i} \\left( x, y \\right) - \\sum_{x} \\tilde{P} \\left( x \\right) \\log Z_{w} \\left( x \\right) \\right)\n", "\\\\ & = \\sum_{x,y} \\tilde{P} \\left( x, y \\right) \\sum_{i=1}^{n} \\delta_{i} f_{i} \\left( x, y \\right) - \\sum_{x} \\tilde{P} \\left( x \\right) \\log \\dfrac{Z_{w + \\delta} \\left( x \\right)}{Z_{w} \\left( x \\right)}\\end{align*} " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "由\n", "\\begin{align*} - \\log \\alpha \\geq 1 - \\alpha, \\alpha > 0 \\end{align*} \n", "得\n", "\\begin{align*} \\\\ & L \\left( w + \\delta \\right) - L \\left( w \\right) \\geq \\sum_{x,y} \\tilde{P} \\left( x, y \\right) \\sum_{i=1}^{n} \\delta_{i} f_{i} \\left( x, y \\right) + 1 - \\sum_{x} \\tilde{P} \\left( x \\right) \\dfrac{Z_{w + \\delta} \\left( x \\right)}{Z_{w} \\left( x \\right)}\n", "\\\\ & = \\sum_{x,y} \\tilde{P} \\left( x, y \\right) \\sum_{i=1}^{n} \\delta_{i} f_{i} \\left( x, y \\right) + 1 - \\sum_{x} \\tilde{P} \\left( x \\right) \\dfrac{\\sum_{y} \\exp \\left( \\sum_{i=1}^{n} \\left( w_{i} + \\delta_{i} \\right) f_{i} \\left( x, y \\right) \\right)}{\\sum_{y} \\exp \\left( \\sum_{i=1}^{n} w_{i} f_{i} \\left( x, y \\right) \\right)} \n", "\\\\ & = \\sum_{x,y} \\tilde{P} \\left( x, y \\right) \\sum_{i=1}^{n} \\delta_{i} f_{i} \\left( x, y \\right) + 1 - \\sum_{x} \\tilde{P} \\left( x \\right) \\sum_{y} \\dfrac{ \\exp \\left( \\sum_{i=1}^{n} w_{i} f_{i} \\left( x, y \\right) \\right)}{\\sum_{y} \\exp \\left( \\sum_{i=1}^{n} w_{i} f_{i} \\left( x, y \\right) \\right)} \\exp \\left( \\sum_{i=1}^{n} \\delta_{i} f_{i} \\left( x, y \\right) \\right) \n", "\\\\ & = \\sum_{x,y} \\tilde{P} \\left( x, y \\right) \\sum_{i=1}^{n} \\delta_{i} f_{i} \\left( x, y \\right) + 1 - \\sum_{x} \\tilde{P} \\left( x \\right) \\sum_{y} P_{w} \\left( y | x \\right) \\exp \\left( \\sum_{i=1}^{n} \\delta_{i} f_{i} \\left( x, y \\right) \\right)\\end{align*} " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "记\n", "\\begin{align*} \\\\ & A \\left( \\delta | w \\right) = \\sum_{x,y} \\tilde{P} \\left( x, y \\right) \\sum_{i=1}^{n} \\delta_{i} f_{i} \\left( x, y \\right) + 1 - \\sum_{x} \\tilde{P} \\left( x \\right) \\sum_{y} P_{w} \\left( y | x \\right) \\exp \\left( \\sum_{i=1}^{n} \\delta_{i} f_{i} \\left( x, y \\right) \\right)\\end{align*} \n", "则 \n", "\\begin{align*} \\\\ & L \\left( w + \\delta \\right) - L \\left( w \\right) \\geq A \\left( \\delta | w \\right)\\end{align*} \n", "即$ A \\left( \\delta | w \\right)$是对数似然函数改变量的一个下届。" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "引入\\begin{align*} \\\\ & f^{\\#} \\left( x, y \\right) = \\sum_{i} f_{i} \\left( x, y \\right) \\end{align*} \n", "表示所有特征在$\\left( x, y \\right)$出现的次数 \n", "则\n", "\\begin{align*} \\\\ & A \\left( \\delta | w \\right) = \\sum_{x,y} \\tilde{P} \\left( x, y \\right) \\sum_{i=1}^{n} \\delta_{i} f_{i} \\left( x, y \\right) + 1 - \\sum_{x} \\tilde{P} \\left( x \\right) \\sum_{y} P_{w} \\left( y | x \\right) \\exp \\left( f^{\\#} \\left( x, y \\right) \\sum_{i=1}^{n} \\dfrac{\\delta_{i} f_{i} \\left( x, y \\right) }{f^{\\#} \\left( x, y \\right) } \\right)\\end{align*} " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "对任意$i$,有$\\dfrac{f_{i} \\left( x, y \\right)}{f^{\\#} \\left( x, y \\right)} \\geq 0$且$\\sum_{i=1}^{n} \\dfrac{f_{i} \\left( x, y \\right)}{f^{\\#} \\left( x, y \\right)} = 1$, \n", "根据Jensen不等式,得 \n", "\\begin{align*} \\\\ & \\exp \\left( \\sum_{i=1}^{n} \\dfrac{f_{i} \\left( x, y \\right)}{f^{\\#} \\left( x, y \\right)} \\delta_{i} f_{\\#} \\left( x, y \\right) ) \\right) \\leq \\sum_{i=1}^{n} \\dfrac{f_{i} \\left( x, y \\right)}{f^{\\#} \\left( x, y \\right)} \\exp \\left( \\delta_{i} f^{\\#} \\left(x, y\\right) \\right)\\end{align*} \n", "则\n", "\\begin{align*} \\\\ & A \\left( \\delta | w \\right) \\geq \\sum_{x,y} \\tilde{P} \\left( x, y \\right) \\sum_{i=1}^{n} \\delta_{i} f_{i} \\left( x, y \\right) + 1 - \\sum_{x} \\tilde{P} \\left( x \\right) \\sum_{y} P_{w} \\left( y | x \\right) \\sum_{i=1}^{n} \\left( \\dfrac{f_{i} \\left( x, y \\right)}{f^{\\#} \\left( x, y \\right)} \\right) \\exp \\left( \\delta_{i} f^{\\#} \\left(x, y\\right) \\right)\\end{align*} " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "记\n", "\\begin{align*} \\\\ & B \\left( \\delta | w \\right) = \\sum_{x,y} \\tilde{P} \\left( x, y \\right) \\sum_{i=1}^{n} \\delta_{i} f_{i} \\left( x, y \\right) + 1 - \\sum_{x} \\tilde{P} \\left( x \\right) \\sum_{y} P_{w} \\left( y | x \\right) \\sum_{i=1}^{n} \\left( \\dfrac{f_{i} \\left( x, y \\right)}{f^{\\#} \\left( x, y \\right)} \\right) \\exp \\left( \\delta_{i} f^{\\#} \\left(x, y\\right) \\right)\\end{align*} \n", "则 \n", "\\begin{align*} \\\\ & L \\left( w + \\delta \\right) - L \\left( w \\right) \\geq A \\left( \\delta | w \\right) \\geq B \\left( \\delta | w \\right)\\end{align*} \n", "即$ B \\left( \\delta | w \\right)$是对数似然函数改变量的一个新的(相对不紧的)下届。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "求\n", "\\begin{align*} \\\\ & \\dfrac {\\partial B \\left( \\delta | w \\right) }{\\partial \\delta_{i}} = \\sum_{x,y} \\tilde{P} \\left( x, y \\right) f_{i} \\left( x, y \\right) - \\sum_{x} \\tilde{P} \\left( x \\right) \\sum_{y} P_{w} \\left( y | x \\right) f_{i} \\left( x, y \\right) \\exp \\left( \\delta_{i} f^{\\#} \\left(x, y\\right) \\right)\\end{align*} \n", "令$ \\dfrac {\\partial B \\left( \\delta | w \\right) }{\\partial \\delta_{i}} = 0 $ \n", "得 \n", "\\begin{align*} \\\\ & \\sum_{x,y} \\tilde{P} \\left( x, y \\right) f_{i} \\left( x, y \\right) = \\sum_{x, y} \\tilde{P} \\left( x \\right) P_{w} \\left( y | x \\right) f_{i} \\left( x, y \\right) \\exp \\left( \\delta_{i} f^{\\#} \\left(x, y\\right) \\right)\\end{align*} \n", "对$\\delta_{i}$求解可解得$\\delta$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "改进的迭代尺度算法(IIS): \n", "输入:特征函数$f_{i},i=1, 2, \\cdots, n$,经验分布$\\tilde{P} \\left( x, y \\right)$,模型$P_{w} \\left( y | x \\right)$ \n", "输出:最优参数值$w_{i}^{*}$;最优模型$P_{w^{*}}$ \n", "1. 对所有$i \\in \\left\\{ 1, 2, \\cdots, n \\right\\}$,取$w_{i} = 0$;\n", "2. 对每一$i \\in \\left\\{ 1, 2, \\cdots, n \\right\\}$ \n", "2.1 令$\\delta_{i}$是方程\n", "\\begin{align*} \\\\ & \\sum_{x,y} \\tilde{P} \\left( x, y \\right) f_{i} \\left( x, y \\right) = \\sum_{x, y} \\tilde{P} \\left( x \\right) P_{w} \\left( y | x \\right) f_{i} \\left( x, y \\right) \\exp \\left( \\delta_{i} f^{\\#} \\left(x, y\\right) \\right) \\end{align*} \n", "的解 \n", "2.2 更新$w_{i}$的值\n", "\\begin{align*} \\\\ & w_{i} \\leftarrow w_{i} + \\delta_{i}\\end{align*} \n", "3. 如果不是所有$w_{i}$都收敛,重复步骤2." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.11" } }, "nbformat": 4, "nbformat_minor": 0 }