{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 2.12 例子:主成分分析" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "假设我们有 $m$ 个数据点 $x^{(1)}, \\dots, x^{(m)} \\in \\mathbb R^{n}$,对于每个数据点 $x^{(i)}$,我们希望找到一个对应的点 $c^{(i)}\\in\\mathbb R^{l}, l < n$ 去表示它(相当于对它进行降维),并且让损失的信息量尽可能少。\n", "\n", "我们可以将这个过程看作是一个编码解码的过程,设编码和解码函数分别为 $f,g$,则有 $f(\\mathbf x)=\\mathbf c, x\\approx g(f(\\mathbf x))$。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 主成分分析" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "考虑一个线性解码函数 $g(\\mathbf c)=\\mathbf{Dc}, \\mathbf D\\in \\mathbb R^{n\\times l}$,为了计算方便,我们将这个矩阵的列向量约束为相互正交的。另一方面,考虑到存在尺度放缩的问题,我们将这个矩阵的列向量约束为单位矩阵。\n", "\n", "对于给定的 $x$,我们需要找到信息损失最小的 $\\mathbf c^\\star$,即求解:\n", "\n", "$$\n", "\\mathbf c^\\star=\\arg\\min_{\\mathbf c} \\|\\mathbf x - g(\\mathbf c)\\|_2=\\arg\\min_{\\mathbf c} \\|\\mathbf x - g(\\mathbf c)\\|_2^2 \n", "$$\n", "\n", "这里我们用二范数来衡量信息的损失。展开之后我们有:\n", "\n", "$$\n", "\\|\\mathbf x - g(\\mathbf c)\\|_2^2 \n", "=(\\mathbf x - g(\\mathbf c))^\\top(\\mathbf x - g(\\mathbf c))\n", "= \\mathbf{x^\\top x} - 2\\mathbf x^\\top g(\\mathbf c) + g(\\mathbf c)^\\top g(\\mathbf c)\n", "$$\n", "\n", "结合 $g(\\mathbf c)$ 的表达式,忽略$\\mathbf{x^\\top x}$ 项,我们有:\n", "\n", "$$\n", "\\begin{aligned}\n", "\\mathbf c^\\star & \n", "= \\arg\\min_{\\mathbf c} -2\\mathbf{x^\\top Dc}+\\mathbf{c^\\top D^\\top Dc} \\\\\n", "& = \\arg\\min_{\\mathbf c} -2\\mathbf{x^\\top Dc}+\\mathbf{c}^\\top \\mathbf{I}_{l} \\mathbf c \\\\\n", "& = \\arg\\min_{\\mathbf c} -2\\mathbf{x^\\top Dc}+\\mathbf{c}^\\top\\mathbf c \\\\\n", "\\end{aligned}\n", "$$\n", "\n", "这里用到了 $\\bf D$ 的单位正交性。\n", "\n", "对 $\\bf c$ 求梯度,并令其为零,我们有\n", "\n", "$$\n", "\\begin{aligned}\n", "\\mathbf \\triangledown_{\\mathbf c}(-2\\mathbf{x^\\top Dc}+\\mathbf{c}^\\top\\mathbf c) & = \\mathbf 0 \\\\\n", "-2\\mathbf{D^\\top x}+2\\mathbf{c} & = \\mathbf 0\\\\\n", "\\mathbf c & = \\mathbf{D^\\top x}\n", "\\end{aligned}\n", "$$\n", "\n", "因此,我们的编码函数为\n", "\n", "$$\n", "f({\\bf x})={\\bf D^\\top x}\n", "$$\n", "\n", "此时通过编码解码得到的重构为:\n", "\n", "$$\n", "r({\\bf x})=g(f(\\bf x))={\\bf DD^{\\top}x}\n", "$$\n", "\n", "接下来求解最优的变换 $\\bf D$。由于我们需要将 $\\bf D$ 应用到所有的 $\\bf x_i$ 上,所以我们需要最优化:\n", "\n", "$$\n", "\\begin{aligned}\n", "{\\bf D}^{\\star} & =\\arg\\min_{\\mathbf D} \n", "\\sqrt{\\sum_{i,j} ({\\bf x}_{j}^{(i)}-r({\\bf x}^{(i)})_j)^2} \\\\\n", "s.t. &\\ \\mathbf{D^\\top D = I}_l \\\\\n", "\\end{aligned}\n", "$$\n", "\n", "为了方便,我们考虑 $l=1$ 的情况,此时问题简化为:\n", "\n", "$$\n", "\\begin{aligned}\n", "{\\bf d}^{\\star} & =\\arg\\min_{\\mathbf d} \n", "\\sum_{i} ({\\bf x}_{j}^{(i)}-{\\bf dd^\\top x}^{(i)}))^2 \\\\\n", "s.t. &\\ {\\bf d^\\top d}=1 \\\\\n", "\\end{aligned}\n", "$$" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.11" } }, "nbformat": 4, "nbformat_minor": 0 }