{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 1.6 信息论" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 熵" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "考虑一个离散变量 $x$。\n", "\n", "对于该变量的一个观测值,我们需要一个量来衡量从这个观测量中得到了多少关于 $x$ 的信息。\n", "从主观意义上来说,一个不常见的观测值所提供的新信息要比常见的观测值要多。\n", "\n", "我们的信息量 $h$ 依赖于一个概率分布 $p(x)$,$h(x)$ 需要满足这样的性质:\n", "- 关于 $p(x)$ 是单调(递减)的,高概率对应的信息量低,低概率对应的信息量高;\n", "- 假设两个事件 $x, y$ 是不相关的,即 $p(x, y)=p(x)p(y)$,那么从事件 $(x, y)$ 中获得的信息量应该等于单独从 $x$ 和 $y$ 事件中获得的信息量之和,即 $h(x, y)=h(x) + h(y)$;\n", "\n", "我们将 $h$ 和 $p$ 之间的关系表示为 $h(p)$,从上面的例子中,我们可以看到\n", "\n", "$$\n", "h(pq) = h(p) h(q)\n", "$$\n", "\n", "归纳法不难给出 $h(p^n) = nh(p)$,所以\n", "\n", "$$\n", "h(p^{n/m}) = nh(p^{1/m}) = \\frac n m h(p^{m/m}) = \\frac n m h(p)\n", "$$\n", "\n", "因此由连续性,我们有 $h(p^x) = x h(p)$。\n", "\n", "对任意正数 $p,q$,总有 $q^x = p$ 成立,则我们有\n", "\n", "$$\n", "\\frac{h(p)}{\\ln (p)} = \\frac{h(q^x)}{\\ln (p^x)} = \\frac{xh(q)}{x\\ln (q)} = \\frac{h(q)}{\\ln (q)}\n", "$$\n", "\n", "从而 $h(p) \\propto \\ln p$,即 $h$ 与 $p$ 的对数成正比。\n", "\n", "一个可能的定义为 \n", "\n", "$$\n", "h(x) = - \\log_2 p(x)\n", "$$\n", "\n", "负号保证信息量为正且满足单调递减的要求,在这里,我们选取了 2 作为对数的底,因此 $h(x)$ 的单位为比特(`bit`)。\n", "\n", "有了对于单一的 $x$ 的信息量,更一般的想法是衡量一组 $x$ 的信息量,因此,我们考虑对 x 求期望,得到熵(`entropy`):\n", "\n", "$$\n", "H[x]=-\\sum_x p(x) \\log_2 p(x)\n", "$$\n", "\n", "因为 $\\lim_{p\\to 0} plnp = 0$,所以如果 $p(x)=0$,那么 $p(x)\\ln p(x)=0$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 编码长度和熵" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "现在假设我们要传送状态变量给一个接收器。\n", "\n", "考虑一个有 8 个状态的变量 $x$,每个状态都是等概率的。为了表示这个状态,我们至少需要 3 比特的信息来表示状态,此时熵值为\n", "\n", "$$\n", "H[x] = - 8 \\times \\frac 1 8 \\log_2 \\frac 1 8 = 3~\\text{bits}\n", "$$\n", "\n", "再考虑不等概率的情况,假设状态和概率为\n", "\n", "|a|b|c|d|e|f|g|h|\n", "|---|---|---|---|---|---|---|---|\n", "|$\\frac{1}{2}$|$\\frac{1}{4}$|$\\frac{1}{8}$|$\\frac{1}{16}$|$\\frac{1}{64}$|$\\frac{1}{64}$|$\\frac{1}{64}$|$\\frac{1}{64}$|\n", "\n", "熵为\n", "\n", "$$\n", "H[x] = - \\frac{1}{2} \\log_2 \\frac{1}{2} - \\frac{1}{4}\\log_2 \\frac{1}{4} - \\frac{1}{8} \\log_2 \\frac{1}{8} - \\frac{1}{16} \\log_2 \\frac{1}{16} - \\frac{4}{64} \\log_2 \\frac{1}{64} = 2~\\text{bits}\n", "$$\n", "\n", "不等概率的熵要比等概率的熵要小。\n", "\n", "考虑字符的霍夫曼编码:\n", "\n", "|a|b|c|d|e|f|g|h|\n", "|---|---|---|---|---|---|---|---|\n", "|0|10|110|1110|111100|111101|111110|111111|\n", "\n", "其平均字符长度为:\n", "\n", "$$\n", "\\frac{1}{2} \\times 1 + \\frac{1}{4} \\times 2 + \\frac{1}{8} \\times 3 + \\frac{1}{16} \\times 4 + \\frac{4}{64} \\times 6 = 2~\\text{bits}\n", "$$\n", "\n", "因此,熵和霍夫曼编码长度之间存在一定的关系。\n", "\n", "`Shannon` 在 1948 年的 `noiseless coding theorem` 说明,这样的熵是编码长度的下界。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 自然对数熵" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "现在考虑自然对数表示的熵。此时熵的单位是 `nat` 而不是 `bit`。\n", "\n", "我们之前用平均信息量的方式引入了熵,事实上熵的概念是从物理上引入的。\n", "\n", "假设我们将 $N$ 个相同的小球放入一些容器中,使得第 $i$ 个容器中有 $n_i$ 个球,$\\sum_i n_i = N$。\n", "\n", "我们按顺序从第 1 个容器开始往里面放入小球,第 1 个有 $N$ 种选择,第 2 个有 $N-1$ 种选择……总共有 $N!$ 种选择。\n", "\n", "但是放入每个容器中的 $n_i$ 个球是不区分顺序的,有 $n_i!$ 种可能性放置这些小球,因此总的可能方法有\n", "\n", "$$\n", "W = \\frac{N!}{\\prod_i n_i!}\n", "$$\n", "\n", "定义熵为 \n", "\n", "$$\n", "H = \\frac{1}{N} \\ln W = \\frac 1 N\\ln N! - \\frac 1 N \\sum_{i} \\ln n_i !\n", "$$\n", "\n", "当 $N \\to \\infty$ 时,由 `Stirling` 近似公式:\n", "\n", "$$\n", "\\ln N! \\approx N \\ln N - N\n", "$$\n", "\n", "我们有\n", "\n", "$$\n", "H = \\lim_{N\\to \\infty} \\frac 1 N (N\\ln N - N) - \\frac 1 N \\sum_{i} (n_i \\ln n_i - n_i) \n", "= -\\lim_{N\\to \\infty} \\sum_i \\left(\\frac{n_i}{N}\\right) \\ln \\left(\\frac{n_i}{N}\\right)\n", "= -\\sum_i p_i \\ln p_i\n", "$$\n", "\n", "$p_i = \\lim_{N\\to\\infty} \\left(\\frac{n_i}{N}\\right)$ 是小球放入第 i 个容器的概率。\n", "\n", "在物理上,容器中小球的分布叫做微观状态,各个容器中的小球数目叫做宏观状态。\n", "\n", "对于一个离散随机变量 $X$ 的取值 $x_i$ 当作一个容器,即有 $p(X=x_i)=p_i$,则它的熵为:\n", "\n", "$$\n", "H[p] = - \\sum_i p(x_i) \\ln p(x_i)\n", "$$\n", "\n", "一般来说,一个离散分布的点越多,分布越均匀,那么对应的熵也越大。\n", "\n", "如果其中有一个 $p_i=1$,那么熵取到最小值 0,最大值可以用 Lagrange 乘子求解,Lagrange 函数为\n", "\n", "$$\n", "\\tilde H = - \\sum_i p(x_i) \\ln p(x_i) + \\lambda \\left(\\sum_i p(x_i) - 1\\right)\n", "$$\n", "\n", "最大化这个函数得到:\n", "\n", "$$\n", "\\frac{\\partial \\tilde H}{\\partial p(x_i)} = -1 -\\ln p(x_i) + \\lambda = 0\n", "$$\n", "\n", "得到最大熵应该满足一个均匀离散分布 $p(x_i)=p(x_j), \\forall i,j$。\n", "\n", "若状态总数为 $M$,则 $p(x_i) = \\frac 1 M$,此时,我们有 $H = \\ln M$。\n", "\n", "为了验证的确是个最大值,我们可以计算它的二阶导数来验证。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 连续变量的熵" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "对于连续变量,我们将 $x$ 划分为长度为 $\\Delta$ 的小段,积分中值定理告诉我们,在第 $i$ 小段 $(i\\Delta, (i+1)\\Delta)$ 中存在 $x_i$,使得:\n", "\n", "$$\n", "\\int_{i\\Delta}^{i+1\\Delta} p(x) dx = p(x_i) \\Delta \n", "$$\n", "\n", "我们可以将落在第 $i$ 小段的 $x$ 对应为 $x_i$,则观测到 $x_i$ 的概率为 $p(x_i) \\Delta$,这给出了一个关于 $x_i$ 的离散分布,其熵为:\n", "\n", "$$\n", "H_{\\Delta} = - \\sum_i p(x_i) \\Delta \\ln(p(x_i) \\Delta) = - \\sum_i p(x_i) \\Delta \\ln(p(x_i)) - \\ln\\Delta\n", "$$\n", "\n", "其中 $p(x_i) \\Delta= 1$,现在忽略最后的常数项,然后考虑 $\\Delta \\to0$ 的情况,我们有\n", "\n", "$$\n", "-\\lim_{\\Delta \\to 0} \\left\\{ \\sum_i p(x_i) \\Delta \\ln(p(x_i) \\Delta) \\right\\} = -\\int p(x) \\ln p(x) dx\n", "$$\n", "\n", "这一项叫做微分熵(`differential entropy`),它与离散熵的差别在于一个 $\\ln \\Delta$,在 $\\Delta \\to0$ 时会发散的部分。\n", "\n", "对于多元向量 $\\mathbf x$,微分熵定义为\n", "\n", "$$\n", "H[\\mathbf x]=-\\int p(\\mathbf x) \\ln p(\\mathbf x) d\\mathbf x\n", "$$\n", "\n", "\n", "在离散分布的例子中,我们看到最大化熵的分布是均匀离散分布。\n", "\n", "为了最大化连续分布的熵,我们需要加上 $p(x)$ 的一阶矩和二阶矩都固定的条件。\n", "\n", "考虑一维情况,我们的问题为\n", "\n", "$$\n", "\\begin{align}\n", "\\max_{p(x)}~& -\\int p(x) \\ln p(x) dx \\\\\n", "s.t.~& 1 = \\int_{-\\infty}^{\\infty} p(x) dx \\\\\n", "& \\mu = \\int_{-\\infty}^{\\infty} xp(x) dx\\\\\n", "& \\sigma^2 = \\int_{-\\infty}^{\\infty} (x-\\mu)^2p(x) dx\\\\\n", "\\end{align}\n", "$$\n", "\n", "\n", "`Lagrange` 函数为\n", "\n", "$$\n", "-\\int p(x) \\ln p(x) dx + \\lambda_1 \\left(\\int_{-\\infty}^{\\infty} p(x) dx - 1\\right) + \\lambda_2 \\left(\\int_{-\\infty}^{\\infty} xp(x) dx - \\mu\\right) + \\lambda_3 \\left(\\int_{-\\infty}^{\\infty} (x-\\mu)^2p(x) dx - \\sigma^2) \\right)\n", "$$\n", "\n", "令这个泛函对 $p(x)$ 的导数为 `0`,得到\n", "\n", "$$\n", "1-\\ln p(x)+\\lambda_1 + \\lambda_2 x + \\lambda_3 (x-\\mu)^2 = 0\n", "$$\n", "\n", "从而\n", "\n", "$$\n", "p(x) = \\exp\\left\\{1+\\lambda_1 + \\lambda_2 x + \\lambda_3 (x-\\mu)^2\\right\\}\n", "$$\n", "\n", "将这个形式带入约束条件,不难得到\n", "\n", "$$\n", "p(x) = \\frac{1}{(2\\pi\\sigma^2)^{1/2}} \\exp\\left\\{-\\frac{(x-\\mu)^2}{2\\sigma^2}\\right\\}\n", "$$\n", "\n", "即最大熵的分布是高斯分布。\n", "\n", "注意到,我们并没有对 $p(x)$ 做非负的约束,但是最大化给出的结果是一个非负的结果。\n", "\n", "高斯分布的熵为:\n", "\n", "$$\n", "\\frac{1}{2} \\left\\{1+\\ln(2\\pi\\sigma^2)\\right\\}\n", "$$\n", "\n", "从这个表达式我们可以看出,与离散熵不同,连续熵可以为负值:$\\sigma^2 < \\frac{1}{2\\pi e}, H(x) < 0$,原因是概率密度函数 p(x) 没有小于 1 的限制,因此 $-p(x)\\ln p(x)$ 不能保证非负。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 条件熵" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "假设我们有 $\\mathbf{x,y}$ 的联合分布,假设 $\\mathbf x$ 已知,那么 $\\mathbf y$ 的包含的信息量应当由 $-\\ln p(\\mathbf{y|x})$ 来衡量,因此我们可以定义其条件期望为给定 $\\mathbf x$ 下 $\\mathbf y$ 的条件熵(`conditional entropy`):\n", "\n", "$$\n", "H[\\mathbf{y|x}] = -\\iint p(\\mathbf{x,y}) \\ln(\\mathbf{y|x}) d\\mathbf xd\\mathbf y\n", "$$\n", "\n", "可以计算出,我们有:\n", "\n", "$$\n", "H[\\mathbf{x, y}] = H[\\mathbf{y|x}] + H[\\mathbf{x}]\n", "$$\n", "\n", "因此,用来描述 $\\mathbf{x,y}$ 的信息量等于描述 $\\mathbf x$ 的信息量加上给定 $\\mathbf x$ 的条件下描述 $\\mathbf y$ 所需的信息量。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1.6.1 相对熵和互信息" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "假设我们有一个未知分布 $p(\\mathbf x)$,然后对它建模得到了一个概率分布 $q(\\mathbf x)$。\n", "\n", "用 $q(\\mathbf x)$ 的分布来传送 $x$ 所用的信息量($q(\\mathbf x)$),比用真实的分布 $p(\\mathbf x)$ 传送所用的信息量($p(\\mathbf x)$)要多一些,多出来的部分为:\n", "\n", "$$\n", "\\begin{align}\n", "KL(p\\|q) \n", "&= - \\int p(\\mathbf x)\\ln(q(\\mathbf x)) dx - \\left(-\\int p(\\mathbf x)\\ln p(\\mathbf x)\\right)\\\\\n", "&= - \\int p(\\mathbf x)\\ln \\left\\{\\frac{q(\\mathbf x)}{p(\\mathbf x)}\\right\\}\n", "\\end{align}\n", "$$\n", "\n", "这叫做相对熵(`relative entropy`)或者 `Kullback-Leibler` 散度(`Kullback-Leibler divergence`)。\n", "\n", "`K-L` 散度不是对称的,即 $KL(p\\|q) \\neq KL(q\\|p)$。\n", "\n", "`K-L` 散度满足 $KL(p\\|q) \\geq 0$,当且仅当 $p(\\mathbf x)=q(\\mathbf x)$ 时取等号。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 凸函数" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "如果一个函数的弦(`chord`)始终在函数上或者函数上方,那么这个函数就是凸函数。\n", "\n", "一个凸函数的例子如图所示:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAV0AAAEDCAYAAACWDNcwAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAHbhJREFUeJzt3Xl8VNX9//HXYZFFFkFJxBWrIAmIbBYQlQBSXBCprbiC\nWxUSNQooWlEJrohgAREirlVRvli1VLRVYgkgiIgCCqioRcUFI6ABEtbk/P44yI8120zumTvzfj4e\neSB3xvBh5ubNnXPP+RxjrUVERIJRxXcBIiKJRKErIhIgha6ISIAUuiIiAVLoiogESKErIhKgaiU9\naIzRfDIRkQqw1pr9HS/1Stdaq68yfA0fPtx7DYn2pddcr3msfpVEwwsiIgFS6IqIBEihGyVpaWm+\nS0g4es2Dp9c8cqak8QdjjC1tfEJERPZkjMFW9EaaiIhEj0JXRCRACl0RkQApdEVEAqTQFREJkEJX\nRCRACl0RkQApdEVEAqTQFREJkEJXRCRACl0RkQApdEVEAqTQFREJkEJXRCRACl0RkQApdEVEAqTQ\nFREJkEJXRCRACl0RkQApdEVEAqTQFREJkEJXRCRACl0RCa/Vq6GoyHcV5aLQFZFw2rIFuneH2bN9\nV1IuCl0RCaf774dWraBbN9+VlIux1h74QWNsSY+LiHixbBl07QpLl8IRR/iuZh/GGKy1Zn+P6UpX\nRMKluBgGDIB77onJwC2NQldEwuXxx8FaF7whpOEFEQmP776D1q3dzbMWLXxXc0AaXhCR8LMWMjLg\nhhtiOnBLU813ASIiZfKPf8BXX8HLL/uuJCIaXhCR2Ld+PbRs6YL31FN9V1OqkoYXFLoiEvuuuAIO\nOQTGjfNdSZmUFLoaXhCR2PbmmzB3Lnz8se9KokKhKyKxa8MGGDgQnnkG6tTxXU1UaHhBRGLXwIGu\noc0TT/iupFw0vCAi4ZOT44YWPvnEdyVRpXm6IhJ7Nm6Ev/wFJk+G+vV9VxNVGl4QkdgzcCBs3w5P\nPeW7kgrR8IKIhEdODrzxhuskFofidnghLS2No48+2ncZADRp0oSrrrrKdxkisS8/H66+Gp58Mu6G\nFX4Tt6EL7hI/FhhjYqYWkZh2881wzjnQs6fvSiqNhhdEJDa8/rrrHrZ0qe9KKlVcX+kGwVrL1q1b\nfZchEm5r17r+uM88A3Xr+q6mUoUydAsKCrj77rtp3rw5tWrVIjk5mZ49ezJv3rx9nrt69Wr69OlD\nvXr1OPTQQ0lPT99vSP773/+mc+fO1KlTh/r163PWWWexcOHCPZ7z9ddfU6VKFe666y6effZZWrRo\nQc2aNZk6dSoAmzZtIiMjg0aNGlG3bl169uzJ559/Xjkvgki8+K0h+aWXQpcuvqupdKEbXti8eTNd\nu3Zl0aJFXHTRRWRmZrJ582bmz5/PnDlz6Ny5867nFhYW0r17d9LS0hg9ejTz58/n8ccfp1GjRtxz\nzz27njdt2jQuvvhiUlJSGDFiBFu3biU7O5suXbrwzjvvcOpeXY2mT5/OunXrdgVsSkoKABdccAE5\nOTn069ePTp06MX/+fHr06MGWLVuCeXFEwuj552HlSpgyxXclwbDWHvDLPRxb7r33XmuMsRMnTizx\neV26dLHGGDtu3Lg9jvfu3dsmJSXt+v327dtt48aN7THHHGPz8/N3Hf/uu+9s3bp1bfv27XcdW7Vq\nlTXG2Fq1atlvv/12j+87Y8YMa4yxt99++x7Hb7vtNmuMsVdddVW5/64ice+bb6xt1MjaxYt9VxJV\nO7Nzv7kauuGFadOm0aRJE9LT00t9btWqVRmw1z5KaWlp/PzzzxQUFACwaNEi1qxZw4ABA6hXr96u\n5x155JFceumlfPjhh6xZs2aP73HOOefsMx3t9ddfB+Dmm2/e4/iQIUPK/pcTSSRFRdC/Pwwe7Lbg\nSRChC90vvviCFmXcqiMpKYkaNWrscaxBgwYArF+/HnDjtMCuIYLd/XZs1apVexw/7rjj9nnu119/\nTb169UhOTt7jeKNGjTjkkEPKVK9IQhk92o3n3nqr70oCFbox3fLMd61ateoBH7MRLG+uVatWuZ4f\nyZ8lEpc++gjGjIEPPoASfk7jUeiudJs2bcqyZcuiFmRNmjQBYMWKFfs89umnnwL7v7Ld3/fZsGHD\nPkMReXl55OfnR16oSLwoLITLLoOxY+HYY31XE7jQhW7fvn355ptvmDx5clS+3ymnnELjxo2ZPHky\nGzdu3HX8hx9+YMqUKbRr147DDz+81O9z3nnnATB27Ng9jo8ZMyYqdYrEjcGDoW1bN0UsAYVueGHI\nkCG89tprpKenM3v2bDp37sy2bduYP38+bdu25a9//euu55blarhq1aqMHTuWiy++mI4dO3LVVVex\nbds2srOzKSoqYlwZ92Q699xz6d69O6NGjeLHH3+kQ4cOLFiwgNzcXA477DANMYgAvPoqzJwJixf7\nrsSb0F3p1qxZk9zcXG677TYWLlzI4MGDGTlyJAUFBaSlpe16Xkn9DvY+fuGFFzJjxgwaNGjA8OHD\nGTlyJKmpqeTm5u4zR7ckr732GgMGDOCNN95g6NCh5OXlkZOTw8EHH6zeCyKrV0N6upuPu9tMoUSj\nfroiUvmKiqB7d+jRA4YN811NpSupn27ornRFJITuv9/NUrj9dt+VeBe6MV0RCZk5c2DSJPjww4Sb\nHrY/utIVkcqzbh1cfjk8/TQccYTvamKCxnRFpHIUF0Pv3pCSAg8/7LuaQGlM14Pi4mIWLlyoqWKS\nuEaPhvXr4YEHfFcSUxS6laSwsJArr7ySXr167dO7QSTuzZvnlvlOnQrVq/uuJqYodCtJnTp1WLJk\nCaeddhqnnHIKDzzwANu2bfNdlkjl+/lnuOQSt336Mcf4ribmaEw3AKtWrSIzM5Mvv/ySSZMm7bGI\nQySuFBW5jSVbt4aHHvJdjTcljekqdANirWX69OlkZmbu2skiKSnJd1ki0TViBMyaBTk5UC1xZ6Tq\nRloMMMbQp08fVqxYQXJyMi1btiQ7O5vi4mLfpYlEx9tvw+TJbhw3gQO3NLrS9eSTTz4hPT2d7du3\nk52dTZs2bXyXJFJx33wDHTrA//1fQmwuWRpd6cagk046iTlz5jBgwADOOussbrrpJjZs2OC7LJHy\n27IF/vQnGDpUgVsGCl2PqlSpwtVXX83y5cspKCggNTWVadOmaW6vhIe1cP31cMIJMGiQ72pCQcML\nMWTevHkMHDiQI444gscee4wTTjjBd0kiJXv8cRg/Ht5/H+rU8V1NzNDwQkh07tyZjz76iB49etCx\nY0dGjBjBli1bfJclsn/z5sFdd8E//6nALQeFboypXr06t9xyC4sXL2bp0qW0atWKmTNn+i5LZE/f\nfw99+8Lf/w5Nm/quJlQ0vBDjZsyYwY033kjHjh155JFHaNy4se+SJNFt2QJpaa6ZzR13+K4mJml4\nIcR69erF8uXLOe6442jVqhUTJkygqKjId1mSqKyF665zy3t3249Qyk5XuiGyYsUKMjIy2LRpE9nZ\n2bRv3953SZJoRo+GF1+EuXPh4IN9VxOz4utKd9063xV4k5qayqxZs8jMzOS8887j+uuv59dff/Vd\nliSI/Jffxo55xN04U+BWWLhC98cfoUULWL7cdyXeGGPo378/y5cvp6ioiNTUVKZMmaK5vRJV+fmu\nhcLo0a5hWNNjt3Jk387kPTFdncMiFL7hheefh+HD3bzARo18V+PdggULSE9Pp2HDhkycOJETTzzR\nd0kSMvn58NFHbguzDz+ERYvc9c3JJ0O7dtC+2QbaPfhnmj94BVX7X+a73FCIvy5jw4bB7NnwzjtQ\no4bvarzbsWMHEyZM4L777iM9PZ077riDWrVq+S5LYlCpAdve/dq8+c49JLdudVunp6XBfff5Lj80\n4i90i4vdHMGaNd2Vr9nv3y3hfP/99wwaNIgPP/yQCRMmcPbZZ/suSTwqd8DuzVq3qeS2ba6RTZVw\njUb6FH+hC1BYCF27wllnuR6esst//vMfbrjhBlq3bs3YsWM56qijfJcklSzigN2fu+927RpnzYKd\nn5yycrPISsuqtL9HvIjP0AX46Sfo1MmN8V5xhe9qYsrmzZsZOXIkjz32GMOGDePGG2+kmnqcxoVK\nCdi9/f3v7mJmwQLYrdm+GWGww2M4E2JE/IYuwKefuvGmF190Y0+yh5UrV5KRkcHatWuZNGkSnTp1\n8l2SlEMgAbu3mTPdsMKsWZCausdDCt2yie/QBXdT7cIL3RYhrVr5ribmWGuZOnUqQ4YMoVevXowc\nOZKGDRv6Lkv24iVg97Z0KfToAf/4B5xxxj4PK3TLJv5DF9xA/y23uM5Hmke4X/n5+dx55528/PLL\nPPTQQ/Tv3x+jm5BexETA7u3bb6FzZ7d1et+++32KQrdsEiN0AR55xG37PHcu6ErugBYtWsTAgQM5\n+OCDmThxIi1atPBdUlyLyYDd27p1cPrp8Je/wODBB3yaQrdsEid0wV3tvveeG5eqXdt3NTGrqKiI\n7OxssrKyuOaaa7jrrrs4WEs7IxaKgN1bYaG7H3L66TBqVIlP1eyFskms0C0uhiuvhF9+gVdfherV\nfVcU09asWcOQIUOYN28e48ePp3fv3r5LCo1QBuzetm+HCy6ABg3g2Wc1FzdKEit0wZ1I55/vlgk/\n84xOpDJ45513yMjIICUlhfHjx3OMxsX3EBcBu7ffLlDWroXp03WBEkWJF7rgPjL94Q/up+Fvf9Oq\ntTLYunUro0aNYty4cQwdOpRBgwZRPQF/EOMyYPdmLdx0EyxeDG+9paG4KEvM0AX49Vc3h/ePf3QL\nKKRMvvrqK2644QZWr17NpEmTOP300/d4PCsri5SUFC666CJPFUZPQgTs/owY4Vo0zpoFhxziu5q4\nk7ihC27V2umnQ3q6toguB2str7zyCoMGDeLMM89k1KhRNNrZ1W3RokW7drQ49NBDPVdadgkbsHsb\nMwYmT4Y5cyA52Xc1cSmxQxdg9Wo30fv222HAAN/VhMrGjRsZPnw4L7zwAg888ABXX301VapUITMz\nk8LCQp588knfJe6XAvYAsrPdDIU5c6ACPTk0e6FsFLoAX33lhhruvx/69/ddTegsWbKE9PR0jDFk\nZ2fTpEkTUlNTeemll/YZfgiaAraMnnvOtUXNzYXjj6/Qt9A83bJR6P7ms8/cfMSHH4ZLL/VdTegU\nFxfz5JNPcuedd9KvXz9at27Ngw8+yJIlSzjooIMCqWH3gF20yP2qgC2DF1+EW291PaibN6/wt1Ho\nlo1Cd3fLl8OZZ8LYsRAHN4KCsmPHjl1dyvLy8hg6dCg5OTkkJydzwQUXMGzYsKj/mSUFbPv2ML6h\nYdmfrQK2NNOmuZkKOTluu6sIKHTLRqG7t48/dtPJHn3UNcqREq1cuZKTTz6ZmjVrkpSURFJSEsnJ\nyezYsYPc3Fw2btzI+++/H9HuxKUFbLt2+17BKgDKYNo0yMx0fXGj0AxKr3nZlBS6idlgtVUrNzex\nZ083QVxXvCVq1qwZhYWF/PLLL+Tl5ZGXl8dPP/1EXl4eLVu2JDc3l59++qnM36+0gD3nHLjrLg0R\nRGzqVDdjJ0qBK9GRmKEL7if87bdd8BYVaYy3FMYYGjZsSMOGDWlejjFBBawnL77o+pDMnAktW0bt\n2w7vovnukUrM4YXdLVvmhhoeeMAtiZQKq8gQQST0UfcAnn7a/Uv29tsRj+FKxWh4oSQtW7pVOWee\n6ZYOZ2T4rigUdAUboyZOhJEj3TndrJnvamQ/FLoAJ57odp/o3t0F7y23+K4opihgQ+Lhh2HSJHcu\nH3ec72rkABS6v/nd71zz8x49XFvI++5LyCY5CtgQshbuuMN1Cps7F4480ndFUgKF7u6OOsotjzz7\nbFi/HiZMiOtkUcDGgaIiuOEG+OADd+4edpjviqQUupG2Pxs2/P9+vM8/DzVq+K4oYkHf5ApCwt9I\n27rV7dq7bp3rGFavXqX/keq9UDZaHFERW7a4E3r9enjtNahf33dFZRaPAbs/CR26GzZAnz5w6KHw\nwguBXRgk9GteDpq9UBE1a7odhjMzoUsXePNNOOII31XtQ0MECej77+Hcc+HUU92qSr2xoaLQLUnV\nqm5c96GHoFMneOONqE40Ly8FrLBsmQvcjAwYOjQhb/aGnUK3NMa4PrzHHAPdusFLL7mpZZVMASv7\nyMmByy5z209pBWVoKXTL6tJL3VScvn3ddLJrr43at1bASqkmT4a773YNbLp08V2NREChWx5dusC7\n77qPd5995jrwlzMFFbBSLkVFbhhhxgw3B7dpU6/lqPdC5DR7oSLWr3ctIWvUcI1FDrCxX6LMIvAl\n7u+k//orXHIJbNsGL78MDRv6rkjKSFPGKsP27TB4sOvi9K9/kZ/cTAEbsLgO3ZUroXdv14xpzBio\nXt13RVIOCt3K9MQT/D6jHSuqteLkttUUsAGK29B9/XW45hq3n18U7x1IcDRPtzJdey3TDlvM0Tce\nT9Uzr4Thw6FKFd9VSRgVF8M998BTT8G//gUdO/quSCqB0iEKmvyxDVUXve/a6Z17Lqxd67skCZu1\na91d1P/+1/VRUODGLYVulGR9lu12Wj3pJDe28N57vkuSsFiwwJ0zJ5/sQvfww31XdEBZuVm+Swg9\nhW6UjJg9wt3sGDXKLc08/3zX37S42HdpEquKi9350rs3jB/vVj5Wi+0RvxGzR/guIfQUupWhd2/3\nEfGf/3QfGcuxaaMkiJ9+cufG9OnuXDn/fN8VSUAUupXl2GNdB//27aFNG9e3QQTcudC6tRtSyM11\n54okjNj+LBN21aq5JcN/+AP07+9+2EaPhtq1fVcmPhQUuNVlb7zhOtidcYbvisQDXekG4YwzYOlS\n2LjRXeHMn++7IgnavHnuRtnGjbBkiQI3gSl0o6TUNen167tdKB56CP70J3fFs3lzMMWJP4WFcOut\n8Oc/uxurzz13wGXjYaDeC5HTijQffv7Z7Wu1eDE88YS6RlVQzK9ImzPHrSxr187NaGnUyHdFEpCS\nVqTpSteHRo3cmN6oUa4/6oABbgdiiQ+//ALXXeea1YweDVOnKnBlF4WuT336uJ0AqlWD1FSYMsVt\npy3hZK17D1NT3ZztFSs0FUz2oeGFWPH++zBwoBv7ffRRt7JNShRTwwuffOKGjDZsgEmTtIw3wWl4\nIQw6dHA9IS+6yG0HlJnpttaW2LZuHdx4o3vPLr7YvYcKXCmBQjdKorImvWpVSE93H0uLiiAlBcaO\ndU2sJbZs3erem5QUN6ywYoV77+K8l6d6L0ROoRslUV2Tfthh8NhjbrXSzJnuB/vFF9XHIRYUF7v3\nIiUF3n7bvUcTJrj3LAGo90LkFLqxLDXVrV566ikYN85NPXr9dd1s88Fa1+O2bVv3Xjz9NLz5pnuP\nRMpBoRsGaWmu/d/w4XDnnfD737swVvhWPmvdppAdOrgdQ7Oy3HuRlua7MgkphW5YGOOmmC1e7Faz\n3XGHa6QzbZob/5XoKipym0G2bQvDhrlVZYsXu/fA7PemtEiZKHTDpkoVtxPxkiWumc7YsXDiiW5c\nsaDAd3XhV1DgxtObNYO//Q3uvde91hdeqG2YJCp0FkVJ4GvSjYFevVzznOeeczsOHHusuyJbtSrY\nWuLBqlVwyy3uNczJcX0y5s93r7GubHdR74XIaXFEPPnf/2DiRHj2WTcGee21bs+2ON2+O+LFEdu3\nu7HxyZNh4UK48krIyIDf/S5qNUpi0hbsiaaw0I31PvmkC+LLL4d+/eJulVuFQ/eTT9yngylTXMBe\nd53rAqY+xxIlWpGWaGrXdldt777rNsusVs1d8bZuDQ8+CF995bvC4P3vf+7vfvLJbpuc6tXdkMy7\n77oG8wpcCYiudBNFcTHMneuugF95xe04e/757qtNm1COW5Z4pWutuwE2fbrbq+7HH93V7EUXwWmn\n6aaYVCoNL8ieiorcTaLp093Xpk3Qs6fbVqhrV2jc2HeFZbJP6P7wA8ya5VbxvfUW1K3rNgnt0wc6\ndYr7JboSOxS6AcjKzSIrLct3GRXz5ZcupGbOdI23k5Lc1eCpp7qwatYs9gKrqAhzXzXs0U+5rXDm\nzYO8PLdooVs3OPtsOP5431XGnVCf5wFS6AYgptoMRqK4GD7+2IXYe++5r7w8NxbaujW0aOGWvp54\nIiQnV/6whLWwZg188QUsX+76Dy9dCkuXYm7ZhP38Eujc2X2ddFLs/eMQZ+LmPK9kCt0AxPXJ+Msv\nbjXWxx+74Fu+3F0dFxa6u/9HHw1HHeWGJZKS3C4J9etDvXpQpw7UqAEHHeQC0VoX7Nu2wZYtbjFC\nfr77WrvWBfyaNbB6NXz7rZs/W7s2NG3qwr5lSxeubdpgxjeM39c8RsX1eR5FJYWutmCX0jVo4D6y\nd+u25/H8fDcr4LvvXEiuWeOmY/38s2vmvWGD2/1261b39ds/4MZAzZoujGvXdgFdv74L66QkaNXK\nzbY45hi3WCHEGzmK7E2hKxVXv76b+dCmje9KREJD82ZERAKk0I0SrUmXRKDzPHK6kSahpZs6Equ0\nDFhEJEYodEVEAqTQFREJkEJXRCRACt0oycrN8l2CSKXTeR45hW6UjJg9wncJIpVO53nkFLoiIgFS\n6IqIBEihKyISIIWuiEiAFLpRojXpkgh0nkdOvRcktNR7QWKVei+IiMQIha6ISIAUuiIiAVLoiogE\nSKEbJVqTLolA53nkFLpRojXpkgh0nkdOoSsiEiCFrohIgBS6IiIBUuiKiARIoRslWpMuiUDneeTU\ne0FCS70XJFap94KISIxQ6IqIBEihKyISIIWuiEiAFLpRojXpkgh0nkdOoRslWpMuiUDneeQUuiIi\nAVLoiogESKErIhIgha6ISIAUulGiNemSCHSeR069FyS01HtBYpV6L4iIxAiFrohIgBS6IiIBUuiK\niARIoRslWpMuiUDneeQUulGiNemSCHSeR06hKyISIIWuiEiAFLoiIgFS6IqIBEihGyVaky6JQOd5\n5NR7QUJLvRckVqn3gohIjFDoiogESKErIhIgha6ISIAUulGiNemSCHSeR06hGyVaky6JQOd55BS6\nIiIBUuiKiARIoSsiEiCFrohIgBS6UaI16ZIIdJ5HTr0XJLTUe0FilXoviIjECIWuiEiAFLoiIgFS\n6IqIBEihGyVaky6JQOd55BS6UaI16ZIIdJ5HTqEbLat8F5CA9JoHT695xBS60fK17wIS0Ne+C0hA\nX/suIPwUuiIiAVLoiogEqNRlwAHWIiISNw60DLjE0BURkejS8IKISIAUuiIiAVLoiogESKErIiUy\nxhxkjOlvjFlkjDnWdz1hV813ASIS8y4AbgJOAn7xXEvoKXQlphljbgJqA22Bu3EBYIAG1tohPmtL\nFNbaqcaYw4B+1toNvusJOw0vSMwyxmQC/7HWPgi8DeQCzwA1gMs9lpaIugH/9V1EPFDoSiyrZq39\nfOd/HwUsttb+ADwOnO6vrMRijDHAGSh0o0LDCxEwxiQB9wA/AAcDvwKH6WNvdFhrH9ntt2cAb+08\n/p2fihJWa6Ae0NwY0xLoBEy21ub4LSucFLoVZIxpCMwGhltrpxlj6gLf4cYdJYqMMTWADsAdB3h8\nANDcWjso0MISRzegAPivtXa5MWYeMN0Yc6S1tthzbaGj4YWKGwX8aK2dtvP3m4AdwLv+Soofxpjq\nxphuO3/bEXfz7IOdjzUyxuwa07XWPg6cpelMlaYr8Ki1dvnO328CkoET/ZUUXgrdCjDGHAr0A17e\n7fBJuBs8i70UFX+uA940xtQCegNrrbU7dj52PTBjr+c/D1wZXHmJwRhTFTgNmLvb4ZSdvxYEX1H4\naXihYjoC1YFZux07A1igj1tRMxt4FTek8CpQYIwZA2wGpllrf93r+c/j3g/tJxNdTYG6wILdjqUB\n31hrv/VSUcgpdCumJlBkrf1st2NnAPOMMQcB11lrJ/gpLT5Ya5cBl+52aF4p/0sSUGSM6WqtnVXK\nc6Xs6gB51tqNAMaYasAfccNrUgEaXqiYBcB2Y0wdAGNMb6A78DnupsMSj7UlHGPMmUBn3E3MqzyX\nE29WADt23swEGI47z3VRUUHqp1tBxph+uI9Z3wIrga3AJcCX1tq/eiwtoRhj+gKnWGtvNcbUBL4E\nUn67MpPIGWPOB87B3UCzwJ3W2i1+qwovha6EljGmAzDAWnv1bsceAlZreEdilUJXRCRAGtMVEQmQ\nQldEJEAKXRGRACl0RUQCpNAVEQmQQldEJEAKXRGRACl0RUQCpNAVEQnQ/wMvDyGhLaSDiAAAAABJ\nRU5ErkJggg==\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import scipy as sp\n", "\n", "%matplotlib inline\n", "\n", "fig, ax = plt.subplots()\n", "\n", "xx = np.linspace(-3, 4, 100)\n", "yy = xx ** 2\n", "\n", "ax.plot(xx, yy, 'r')\n", "ax.set_xlim(-4, 5)\n", "ax.set_yticks([])\n", "ax.set_ylim(-10, 20)\n", "\n", "ax.set_xticks([-2, 0.5, 3])\n", "ax.set_xticklabels([r\"$a$\", r\"$x_\\lambda$\", r\"$b$\"], fontsize=\"xx-large\")\n", "\n", "for i in [-2, 3]:\n", " ax.plot([i, i], [-10, i ** 2], 'g--')\n", "\n", "ax.plot([-2, 3], [4, 9], 'b') \n", "ax.plot([0.5, 0.5], [-10, 6.5], 'g')\n", "ax.annotate(\n", " \"chord\",\n", " fontsize=\"xx-large\",\n", " xytext=(-3, 12),\n", " xy=(-0.2, 6),\n", " arrowprops=dict(arrowstyle=\"->\",\n", " connectionstyle=\"arc3\")\n", " )\n", "\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "在 $x=a$ 和 $x=b$ 的区间内的任意点 $x$ 都可以写成 $\\lambda a +(1-\\lambda)b, 0\\leq\\lambda \\leq 1$,对应的弦上的值为 $\\lambda f(a) +(1-\\lambda) f(b)$,函数值为 $f(\\lambda a +(1-\\lambda)b)$。凸性等价于\n", "\n", "$$\n", "f(\\lambda a +(1-\\lambda)b) \\leq \\lambda f(a) +(1-\\lambda) f(b)\n", "$$\n", "\n", "这也等价于函数的二阶导数非负(如果二阶导数存在)。\n", "\n", "证明如下:\n", "\n", "- 必要性\n", "\n", "假设凸性的式子成立,令 $\\lambda = \\frac 1 2, b = a + 2\\epsilon$,则有:\n", "\n", "$$\n", "\\begin{align}\n", "\\frac 1 2f(a)+\\frac 1 2f(b) & \\geq f\\left(\\frac 1 2a + \\frac 1 2b\\right) \\\\\n", "& = \\frac 1 2 f\\left((\\frac 1 2a + \\frac 1 2b\\right) + \\frac 1 2 f\\left((\\frac 1 2a + \\frac 1 2b\\right) \\\\\n", "& = \\frac 1 2 f(a+\\epsilon) + \\frac{1}{2} f(b-\\epsilon)\n", "\\end{align}\n", "$$\n", "\n", "从而\n", "\n", "$$\n", "f(b) - f(b-\\epsilon) \\geq f(a+\\epsilon) - f(a)\n", "$$\n", "\n", "取 $\\epsilon\\to 0$,我们有 \n", "\n", "$$\n", "f'(b) \\geq f'(a)\n", "$$\n", "\n", "从而\n", "\n", "$$\n", "f''(x) \\geq 0\n", "$$\n", "\n", "- 充分性\n", "\n", "假设二阶导数非负:$f''(x) \\geq 0$,那么由泰勒展开,存在一个 $x^\\star$ 使得\n", "\n", "$$\n", "f(x) = f(x_0) + f'(x_0)(x-x_0) + \\frac 1 2 f''(x^\\star) (x-x_0)^2 \\geq f(x_0) + f'(x_0)(x-x_0)\n", "$$\n", "\n", "令 $x_0 = \\lambda a +(1-\\lambda b), x = a$,则有\n", "\n", "$$\n", "f(a) \\geq f(x_0) + f'(x_0)((1-\\lambda)(a-b))\n", "$$\n", "\n", "令 $x_0 = \\lambda a +(1-\\lambda b), x = b$,则有\n", "\n", "$$\n", "f(b) \\geq f(x_0) + f'(x_0)(\\lambda(a-b))\n", "$$\n", "\n", "合并两个不等式,去掉 $f'(x_0)$ 的项:\n", "\n", "$$\n", "\\lambda f(a) +(1-\\lambda) \\geq f(b) f(\\lambda a +(1-\\lambda)b)\n", "$$\n", "\n", "凸函数的例子:\n", "\n", "- $x\\ln x$\n", "- $x^2$\n", "\n", "如果等式只在 $\\lambda = 1$ 和 $\\lambda = 0$ 时成立,则称 $f(x)$ 是严格凸函数。\n", "\n", "如果 $f(x)$ 是凸函数,那么 $-f(x)$ 是凹函数。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Jensen 不等式 " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "使用数学归纳法不难证明,对于任意凸函数 $f(x)$,`Jensen` 不等式成立:\n", "\n", "$$\n", "f\\left(\\sum_{i=1}^M \\lambda _i x_i\\right) \\leq \\sum_{i=1}^M\\lambda_i f(x_i)\n", "$$\n", "\n", "其中 $\\lambda_i \\geq 0, \\sum_i \\lambda_i = 1$。等式成立的条件为 $f$ 是线性,或者所有的 $x_i$ 都相等。\n", "\n", "证明如下:\n", "\n", "$M = 1$ 时显然成立;假设对于某个 $M$ 不等式成立,那么对于 $M+1$ 来说:\n", "\n", "$$\n", "f\\left(\\sum_{i=1}^{M+1} \\lambda_i x_i\\right) = f\\left(\\lambda_{M+1} x_{M+1} + \\sum_{i=1}^M \\lambda _i x_i\\right) = f\\left(\\lambda_{M+1} x_{M+1} + (1-\\lambda_{M+1})\\sum_{i=1}^M \\eta_i x_i\\right) \n", "$$\n", "\n", "其中 \n", "\n", "$$\n", "\\eta_i = \\frac{\\lambda_i}{1-\\lambda_{M+1}}\n", "$$\n", "\n", "考虑凸函数性质,我们有\n", "\n", "$$\n", "f\\left(\\sum_{i=1}^{M+1} \\lambda_i x_i\\right) = f\\left(\\lambda_{M+1} x_{M+1} + (1-\\lambda_{M+1})\\sum_{i=1}^M \\eta_i x_i\\right) \\leq \\lambda_{M+1} f(x_{M+1}) + (1-\\lambda_{M+1}) f\\left(\\sum_{i=1}^M \\eta_i x_i\\right)\n", "$$\n", "\n", "再应用假设条件,我们有\n", "$$\n", "f\\left(\\sum_{i=1}^{M+1} \\lambda_i x_i\\right) \\leq \\lambda_{M+1} f(x_{M+1}) + (1-\\lambda_{M+1}) f\\left(\\sum_{i=1}^M \\eta_i x_i\\right) \\leq \\lambda_{M+1} f(x_{M+1}) + \\sum_{i=1}^M (1-\\lambda_{M+1})\\eta_i f\\left(x_i\\right) = \\sum_{i=1}^{M+1} \\lambda_i f(x_i)\n", "$$\n", "\n", "如果我们认为 $\\lambda_i$ 是某个离散分布取 $x_i$ 的概率,那 `Jenson` 不等式可以表示为\n", "\n", "$$\n", "f\\left(\\mathbb E[\\mathbf x]\\right) \\leq \\mathbb E[f(\\mathbf x)]\n", "$$\n", "\n", "不等式成立的条件为 $x$ 为常数,或者 $f$ 是线性的。\n", "\n", "更一般的,我们有:\n", "\n", "$$\n", "f\\left(\\mathbb E[y(\\mathbf x)]\\right) \\leq \\mathbb E[f(y(\\mathbf x))]\n", "$$\n", "\n", "对于连续形式,我们有\n", "\n", "$$\n", "f\\left(\\int y(\\mathbf x)p(\\mathbf x)d\\mathbf x\\right) \\leq \\int f(y(\\mathbf x))p(\\mathbf x)d\\mathbf x\n", "$$" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "### 应用 Jensen 不等式" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "将 `Jensen` 不等式应用到 `K-L` 散度($-\\ln x$ 是凸函数)上:\n", "\n", "$$\n", "KL(p||q)=-\\int p(\\mathbf x)\\ln\\left\\{\\frac{q(\\mathbf x)}{p(\\mathbf x)}\\right\\} d\\mathbf x\n", "\\geq -\\ln \\int \\left\\{p(\\mathbf x) \\frac{q(\\mathbf x)}{p(\\mathbf x)}\\right\\} \n", "= -\\ln \\int q(\\mathbf x)d\\mathbf x = 0\n", "$$\n", "\n", "因为 $-\\ln x$ 是严格凸的,因此等式只能在 $\\frac{q(\\mathbf x)}{p(\\mathbf x)}$ 为常数时成立,考虑到两者都是概率分布(积分都为 1),所以只能有 $p(\\mathbf x) = q(\\mathbf x)$ 成立。\n", "\n", "因此,我们可以将 `K-L` 散度来衡量两个分布的差异。\n", "\n", "另一方面,我们看到,当我们使用真实的数据分布传送数据,所用信息量的期望是最小的。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 最小化相对熵" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "假设我们有一组从未知分布 $p(\\mathbf x)$ 中产生的数据,我们的模型为 $q(\\mathbf{x|\\theta})$,一个决定参数 $\\mathbf \\theta$ 的方法就是最小化 $p(\\mathbf x)$ 和 $q(\\mathbf{x|\\theta})$ 的 `K-L` 散度。\n", "\n", "然而我们并不知道 $p(x)$ 的真实分布,一个近似的方法是使用已观测的数据进行模拟:\n", "\n", "$$\n", "KL(p|q) \\approx \\sum_{n=1}^N (-\\ln q(\\mathbf{x}_n | \\mathbf{\\theta}) + \\ln p(\\mathbf x_n))\n", "$$\n", "\n", "由于右边的部分与 $\\mathbf\\theta$ 无关,因此,最小化 `K-L` 散度就相当于最大化似然函数。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 互信息" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "考虑两个变量 $\\mathbf{x,y}$ 的分布,如果它们独立,则 $p(\\mathbf{x,y}) = p(\\mathbf{x})p(\\mathbf{y})$,如果他们不独立,则我们用 $p(\\mathbf{x,y})$ 和 $p(\\mathbf{x})p(\\mathbf{y})$ 的 `K-L` 散度来衡量它们接近独立的程度:\n", "\n", "$$\n", "I[\\mathbf{x,y}] \\equiv KL(p(\\mathbf{x,y})||p(\\mathbf{x})p(\\mathbf{y}))\n", "= - \\iint p(\\mathbf{x,y}) \\ln \\frac{p(\\mathbf{x})p(\\mathbf{y})}{p(\\mathbf{x,y})} d\\mathbf xd\\mathbf y\n", "$$\n", "\n", "这叫做 $\\mathbf x$ 和 $\\mathbf y$ 的互信息(`mutual information`)。从 `K-L` 散度的性质中我们可以看出,$I[\\mathbf{x,y}] \\geq 0$,当且仅当 $\\mathbf{x,y}$ 是独立的时候等号成立。\n", "\n", "另外,我们从定义式中可以得到:\n", "\n", "$$\n", "I[\\mathbf{x,y}] = H[\\mathbf x] - H[\\mathbf{x|y}] = H[\\mathbf y] - H[\\mathbf{y|x}]\n", "$$\n", "\n", "所以我们可以将互信息看成是给定 $\\mathbf y$ 后 $\\mathbf x$ 信息量的减少值。" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.6" } }, "nbformat": 4, "nbformat_minor": 0 }