{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 8 范数 和 迹\n", "## 8.1 向量范数\n", "\n", "### 8.1.1 $L^p$ 范数 \n", "通常我们用范数(`norm`)来衡量向量,向量的 $L_p$ 范数定义为:\n", "\n", "$$\n", "\\|\\mathbf x\\|_{p} = \\left(\\sum_{i} |x_i|^p \\right)^{\\frac{1}{p}}, p \\in \\mathbb R, p \\geq 1\n", "$$\n", "\n", "除了 $L_p$ 范数,范数还可以有很多定义方式,事实上,任何将向量映射为实数的函数 $f(\\mathbf x)$ 只要满足以下条件,都是合法的范数:\n", "\n", "- 非负性:$f(\\mathbf x) \\geq 0$,且 $f(\\mathbf x) = 0 \\Rightarrow \\mathbf{x = 0}$\n", "- 三角不等式:$f(\\mathbf{x+y}) \\leq f(\\mathbf x) + f(\\mathbf y)$\n", "- 数乘:$\\forall \\alpha \\in \\mathbb R, f(\\alpha\\mathbf x) = |\\alpha|f(\\mathbf x)$ \n", "\n", "### 8.1.2 $L^2$ 范数 \n", "$L^2$ 范数,即通常所说的欧几里得范数(`Euclidean norm`),是向量 $\\mathbf x$ 到坐标原点的欧几里得距离。因为它用的太广泛,所以我们通常省略其下标 2,将其记为 $\\|\\mathbf x\\|$。\n", "\n", "有时候也用 $L^2$ 范数的平方来衡量向量:$\\bf x^\\top x$。事实上,$L^2$ 范数的平方在计算上更为便利,例如它的对 $\\mathbf x$ 梯度的各个分量只依赖于 $\\mathbf x$ 的对应的各个分量,而 $L^2$ 范数对 $\\mathbf x$ 梯度的各个分量要依赖于整个 $\\mathbf x$ 向量。 \n", "\n", "### 8.1.3 其他范数 \n", "$L^2$ 范数并不一定适用于所有的情况,它在原点附近的增长就十分缓慢,因此不适用于需要区别 0 和非常小但是非 0 值的情况。在这种情况下,$L_1$ 范数就是一个比较好的选择,它在所有方向上的增长速率都是一样的,定义为:\n", "\n", "$$\\|\\mathbf x\\|_1 = \\sum_{i} |x_i|$$\n", "\n", "它经常使用在需要区分 0 和非零元素的情形中。\n", "\n", "有时候我们需要衡量向量中非零元素的个数,有些人将其称为“$L^0$ 范数”,但是这并不严谨,因为**它并不是一个范数**(不满足三角不等式和数乘)。$L^1$ 范数可以作为它的一个替代。\n", "\n", "另一个常使用的范数是 $L^{\\infty}$ 范数,它在数学上是向量元素绝对值的最大值,因此也被叫做 `max norm`:\n", "\n", "$$\n", "\\|\\mathbf x\\|_{\\infty} = \\max_i |x_i|\n", "$$\n", "\n", "### 8.1.4内积的范数形式 \n", "内积可以写成范数的形式,其中 $\\theta$ 是两个向量的夹角:\n", "\n", "$$\n", "\\mathbf{x^\\top y} = \\mathbf{\\|x\\|\\|y\\|} \\cos \\theta\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 8.2 矩阵范数\n", "有时候我们想衡量一个矩阵,机器学习中通常使用的是 `F` 范数(`Frobenius norm`),其定义为:\n", "\n", "$$\n", "\\|\\mathbf A\\|_F = \\sqrt{\\sum_{i,j} A_{i,j}^2}\n", "$$\n", "\n", "或者叫做矩阵的 $L^2$ 范数。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 8.3 矩阵迹 \n", "矩阵的迹定义为所有对角元的和:\n", "\n", "$$\n", "\\mathrm{Tr}(\\mathbf A)=\\sum_{i} A_{i,i}\n", "$$\n", "\n", "矩阵的 F 范数可以用迹来表示:\n", "\n", "$$\n", "\\|\\mathbf A\\|_F = \\sqrt{\\mathrm{Tr}(\\mathbf{AA^\\top})}\n", "$$\n", "\n", "根据定义,矩阵的迹在转置操作下保持不变:\n", "\n", "$$\n", "\\mathrm{Tr}(\\mathbf A) = \\mathrm{Tr}(\\mathbf A^\\top)\n", "$$\n", "\n", "只要定义合法,乘法的迹满足如下的循环性质:\n", "\n", "$$\n", "\\mathrm{Tr}(\\mathbf{ABC}) = \\mathrm{Tr}(\\mathbf{CAB}) = \\mathrm{Tr}(\\mathbf{BCA})\n", "$$\n", "\n", "或者更一般的有:\n", "\n", "$$\n", "\\mathrm{Tr}(\\prod_{i=1}^n \\mathbf{F}^{(i)}) = \\mathrm{Tr}(\\mathbf{F}^{(n)}\\prod_{i=1}^{n-1} \\mathbf{F}^{(i)})\n", "$$\n", "\n", "一个重要的性质是,对于标量来说,其迹等于它本身:\n", "\n", "$$\\mathrm{Tr}(a)=a$$" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python [default]", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.12" } }, "nbformat": 4, "nbformat_minor": 1 }