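{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Appendix D Calculus of Variations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A function $y(x)$ can be viewed as an operator that, for any input $x$, returns an output value $y$. In the same way, we can define a functional $F[y]$: an operator that takes a function $y(x)$ as input and returns an output value $F$.\n", "\n", "A common example in machine learning is the entropy $H[x]$: for any probability distribution $p(x)$ it returns the entropy of $x$ under that distribution, so the entropy of $p(x)$ can equally be written as $H[p]$.\n", "\n", "A common problem in ordinary calculus is to find a value of $x$ that maximizes (or minimizes) a function $y(x)$. The calculus of variations instead seeks a function $y(x)$ that maximizes (or minimizes) a functional $F[y]$.\n", "\n", "For example, the calculus of variations can be used to show that the shortest path between two points is a straight line, or that the maximum entropy distribution (for a given mean and variance) is a Gaussian.\n", "\n", "Ordinary calculus gives the Taylor expansion:\n", "\n", "$$\n", "y(x+\\epsilon) = y(x) + \\frac{dy}{dx} \\epsilon + O(\\epsilon^2)\n", "$$\n", "\n", "and, for a function of several variables,\n", "\n", "$$\n", "y(x_1+\\epsilon_1, \\dots, x_D+\\epsilon_D) = y(x_1, \\dots, x_D) + \\sum_{i=1}^D \\frac{\\partial y}{\\partial x_i} \\epsilon_i + O(\\epsilon^2)\n", "$$\n", "\n", "Analogously, for a functional we consider a small change $\\epsilon\\eta(x)$ to $y(x)$, where $\\eta(x)$ is an arbitrary function of $x$. Denoting the derivative of the functional $F[y]$ with respect to the function $y(x)$ by $\\frac{\\delta F[y]}{\\delta y(x)}$, we define it through:\n", "\n", "$$\n", "F[y(x)+\\epsilon\\eta(x)] = F[y(x)] + \\epsilon \\int \\frac{\\delta F[y]}{\\delta y(x)} \\eta(x) dx + O(\\epsilon^2)\n", "$$\n", "\n", "The functional is stationary when, for every choice of $\\eta(x)$:\n", "\n", "$$\n", "\\int \\frac{\\delta F[y]}{\\delta y(x)} \\eta(x) dx = 0\n", "$$\n", "\n", "For this to hold for arbitrary $\\eta(x)$, the functional derivative $\\frac{\\delta F[y]}{\\delta y(x)}$ must be 0 for all $x$.\n", "\n", "Consider a functional of the form:\n", "\n", "$$\n", "F[y] = \\int G(y(x), y'(x), x) dx\n", "$$\n", "\n", "where $y'(x)$ is the derivative of $y(x)$ with respect to $x$, and **the value of $y(x)$ is fixed at the boundary of the region of integration**. Considering a variation of $y(x)$ and expanding, we have:\n", "\n", "$$\n", "\\begin{align}\n", "F[y(x)+\\epsilon\\eta(x)] \n", "& = \\int G(y+\\epsilon\\eta, y'+\\epsilon\\eta', x) dx \\\\\n", "& = \\int G(y, y', x) dx + \\epsilon \\int \n", "\\left\\{ \\frac{\\partial G}{\\partial y} \\eta(x) + \n", "\\frac{\\partial G}{\\partial y'} \\eta'(x)\\right\\} dx + O(\\epsilon^2) \\\\\n", "& = F[y(x)] + \\epsilon \\int \n", "\\left\\{ \\frac{\\partial G}{\\partial y} \\eta(x) + \n", "\\frac{\\partial G}{\\partial y'} \\eta'(x)\\right\\} dx + O(\\epsilon^2)\n", "\\end{align}\n", "$$\n", "\n", 
"To bring this into the form of the definition, we integrate the second term of the integral by parts:\n", "\n", "$$\n", "\\int \\frac{\\partial G}{\\partial y'} \\eta'(x) dx = \\left.\\frac{\\partial G}{\\partial y'} \\eta(x) \\right|_{\\text{boundary}} - \\int \\frac{d}{dx} \\left(\\frac{\\partial G}{\\partial y'}\\right) \\eta(x) dx\n", "$$\n", "\n", "Because the value of $y(x)$ is fixed at the boundary, $\\eta(x)$ must be 0 there, so the boundary term vanishes and we obtain\n", "\n", "$$\n", "F[y(x)+\\epsilon\\eta(x)] \n", "= F[y(x)] + \\epsilon \\int \n", "\\left\\{ \\frac{\\partial G}{\\partial y} - \n", "\\frac{d}{dx} \\left(\\frac{\\partial G}{\\partial y'}\\right) \\right\\}\\eta(x) dx + O(\\epsilon^2)\n", "$$\n", "\n", "Comparing with the definition, the term in braces is the functional derivative $\\frac{\\delta F[y]}{\\delta y(x)}$. Setting it to 0 gives\n", "\n", "$$\n", "\\frac{\\partial G}{\\partial y} - \n", "\\frac{d}{dx} \\left(\\frac{\\partial G}{\\partial y'}\\right) = 0\n", "$$\n", "\n", "This is the well-known Euler-Lagrange equation.\n", "\n", "For example, if $G=y(x)^2+\\left(y'(x)\\right)^2$, then $\\frac{\\partial G}{\\partial y} = 2y$ and $\\frac{\\partial G}{\\partial y'} = 2y'$, and after dividing by 2 the Euler-Lagrange equation becomes:\n", "\n", "$$\n", "y(x) - \\frac{d^2y}{dx^2} = 0\n", "$$\n", "\n", "This second-order differential equation has the general solution $y(x) = A e^{x} + B e^{-x}$, and the constants $A$ and $B$ are determined by the boundary conditions on $y(x)$.\n", "\n", "In many problems the integrand has the form $G(y,x)$ and does not depend on the derivative $y'(x)$. In that case stationarity simply requires, for all $x$,\n", "\n", "$$\n", "\\frac{\\partial G}{\\partial y(x)}= 0\n", "$$" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.6" } }, "nbformat": 4, "nbformat_minor": 0 }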