{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "训练数据集\n", "\\begin{align*} \\\\& T = \\left\\{ \\left( x_{1}, y_{1} \\right), \\left( x_{2}, y_{2} \\right), \\cdots, \\left( x_{N}, y_{N} \\right) \\right\\} \\end{align*} \n", "其中,$x_{i} \\in \\mathcal{X} = R^{n}, y_{i} \\in \\mathcal{Y} = \\left\\{ +1, -1 \\right\\}, i = 1, 2, \\cdots, N$,$x_{i}$为第$i$个特征向量(实例),$y_{i}$为第$x_{i}$的类标记,当$y_{i}=+1$时,称$x_{i}$为正例;当$y_{i}= -1$时,称$x_{i}$为负例,$\\left( x_{i}, y_{i} \\right)$称为样本点。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "线性可分支持向量机(硬间隔支持向量机):给定线性可分训练数据集,通过间隔最大化或等价地求解相应地凸二次规划问题学习得到分离超平面为\n", "\\begin{align*} \\\\& w^{*} \\cdot x + b^{*} = 0 \\end{align*} \n", "以及相应的分类决策函数\n", "\\begin{align*} \\\\& f \\left( x \\right) = sign \\left( w^{*} \\cdot x + b^{*} \\right) \\end{align*} \n", "称为线型可分支持向量机。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "超平面$\\left( w, b \\right)$关于样本点$\\left( x_{i}, y_{i} \\right)$的函数间隔为\n", "\\begin{align*} \\\\& \\hat \\gamma_{i} = y_{i} \\left( w \\cdot x_{i} + b \\right) \\end{align*} " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "超平面$\\left( w, b \\right)$关于训练集$T$的函数间隔\n", "\\begin{align*} \\\\& \\hat \\gamma = \\min_{i = 1, 2, \\cdots, N} \\hat \\gamma_{i} \\end{align*} \n", "即超平面$\\left( w, b \\right)$关于训练集$T$中所有样本点$\\left( x_{i}, y_{i} \\right)$的函数间隔的最小值。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "超平面$\\left( w, b \\right)$关于样本点$\\left( x_{i}, y_{i} \\right)$的几何间隔为\n", " \\begin{align*} \\\\& \\gamma_{i} = y_{i} \\left( \\dfrac{w}{\\| w \\|} \\cdot x_{i} + \\dfrac{b}{\\| w \\|} \\right) \\end{align*} " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "超平面$\\left( w, b \\right)$关于训练集$T$的几何间隔\n", "\\begin{align*} \\\\& \\gamma = \\min_{i = 1, 2, \\cdots, N} \\gamma_{i} \\end{align*} \n", "即超平面$\\left( w, b \\right)$关于训练集$T$中所有样本点$\\left( x_{i}, y_{i} \\right)$的几何间隔的最小值。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "函数间隔和几何间隔的关系\n", "\\begin{align*} \\\\& \\gamma_{i} = \\dfrac{\\hat \\gamma_{i}}{\\| w \\|} \n", "\\\\& \\gamma = \\dfrac{\\hat \\gamma}{\\| w \\|} \\end{align*} " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "最大间隔分离超平面等价为求解\n", "\\begin{align*} \\\\& \\max_{w,b} \\quad \\gamma\n", "\\\\ & s.t. \\quad y_{i} \\left( \\dfrac{w}{\\| w \\|} \\cdot x_{i} + \\dfrac{b}{\\| w \\|} \\right) \\geq \\gamma, \\quad i=1,2, \\cdots, N \\end{align*} \n", "等价的\n", "\\begin{align*} \\\\ & \\max_{w,b} \\quad \\dfrac{\\hat \\gamma}{\\| w \\|}\n", "\\\\ & s.t. \\quad y_{i} \\left( w \\cdot x_{i} + b \\right) \\geq \\hat \\gamma, \\quad i=1,2, \\cdots, N \\end{align*} \n", "等价的\n", "\\begin{align*} \\\\ & \\min_{w,b} \\quad \\dfrac{1}{2} \\| w \\|^{2}\n", "\\\\ & s.t. \\quad y_{i} \\left( w \\cdot x_{i} + b \\right) -1 \\geq 0, \\quad i=1,2, \\cdots, N \\end{align*} " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "线性可分支持向量机学习算法(最大间隔法): \n", "输入:线性可分训练数据集$T = \\left\\{ \\left( x_{1}, y_{1} \\right), \\left( x_{2}, y_{2} \\right), \\cdots, \\left( x_{N}, y_{N} \\right) \\right\\}$,其中$x_{i} \\in \\mathcal{X} = R^{n}, y_{i} \\in \\mathcal{Y} = \\left\\{ +1, -1 \\right\\}, i = 1, 2, \\cdots, N$ \n", "输出:最大间隔分离超平面和分类决策函数 \n", "1. 构建并求解约束最优化问题\n", "\\begin{align*} \\\\ & \\min_{w,b} \\quad \\dfrac{1}{2} \\| w \\|^{2}\n", "\\\\ & s.t. \\quad y_{i} \\left( w \\cdot x_{i} + b \\right) -1 \\geq 0, \\quad i=1,2, \\cdots, N \\end{align*} \n", "求得最优解$w^{*}, b^{*}$。 \n", "2. 
{ "cell_type": "markdown", "metadata": {}, "source": [ "Learning algorithm for the linearly separable support vector machine (maximum margin method): \n", "Input: a linearly separable training dataset $T = \\left\\{ \\left( x_{1}, y_{1} \\right), \\left( x_{2}, y_{2} \\right), \\cdots, \\left( x_{N}, y_{N} \\right) \\right\\}$, where $x_{i} \\in \\mathcal{X} = R^{n}, y_{i} \\in \\mathcal{Y} = \\left\\{ +1, -1 \\right\\}, i = 1, 2, \\cdots, N$ \n", "Output: the maximum-margin separating hyperplane and the classification decision function \n", "1. Construct and solve the constrained optimization problem\n", "\\begin{align*} \\\\ & \\min_{w,b} \\quad \\dfrac{1}{2} \\| w \\|^{2}\n", "\\\\ & s.t. \\quad y_{i} \\left( w \\cdot x_{i} + b \\right) -1 \\geq 0, \\quad i=1,2, \\cdots, N \\end{align*} \n", "obtaining the optimal solution $w^{*}, b^{*}$. \n", "2. Obtain the separating hyperplane\n", "\\begin{align*} \\\\ & w^{*} \\cdot x + b^{*} = 0 \\end{align*} \n", "and the classification decision function \n", "\\begin{align*} \\\\& f \\left( x \\right) = sign \\left( w^{*} \\cdot x + b^{*} \\right) \\end{align*} " ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "(Hard-margin) support vectors: the instances of the sample points in the training dataset that are closest to the separating hyperplane, i.e., the sample points for which the constraint holds with equality:\n", "\\begin{align*} \\\\ & y_{i} \\left( w \\cdot x_{i} + b \\right) -1 = 0 \\end{align*} \n", "For positive points with $y_{i} = +1$, the support vectors lie on the hyperplane \n", "\\begin{align*} \\\\ & H_{1}:w \\cdot x + b = 1 \\end{align*} \n", "For negative points with $y_{i} = -1$, the support vectors lie on the hyperplane \n", "\\begin{align*} \\\\ & H_{2}:w \\cdot x + b = -1 \\end{align*} \n", "$H_{1}$ and $H_{2}$ are called the margin boundaries. \n", "The distance between $H_{1}$ and $H_{2}$ is called the margin, and $|H_{1}H_{2}| = \\dfrac{1}{\\| w \\|} + \\dfrac{1}{\\| w \\|} = \\dfrac{2}{\\| w \\|}$." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Solving the optimization problem: \n", "1. Introduce Lagrange multipliers $\\alpha_{i} \\geq 0, i = 1, 2, \\cdots, N$ and construct the Lagrangian\n", "\\begin{align*} \\\\ & L \\left( w, b, \\alpha \\right) = \\dfrac{1}{2} \\| w \\|^{2} + \\sum_{i=1}^{N} \\alpha_{i} \\left[- y_{i} \\left( w \\cdot x_{i} + b \\right) + 1 \\right] \n", "\\\\ & = \\dfrac{1}{2} \\| w \\|^{2} - \\sum_{i=1}^{N} \\alpha_{i} y_{i} \\left( w \\cdot x_{i} + b \\right) + \\sum_{i=1}^{N} \\alpha_{i} \\end{align*} \n", "where $\\alpha = \\left( \\alpha_{1}, \\alpha_{2}, \\cdots, \\alpha_{N} \\right)^{T}$ is the vector of Lagrange multipliers.\n", "2. Compute $\\min_{w,b}L \\left( w, b, \\alpha \\right)$:\n", "\\begin{align*} \\\\ & \\nabla _{w} L \\left( w, b, \\alpha \\right) = w - \\sum_{i=1}^{N} \\alpha_{i} y_{i} x_{i} = 0 \n", "\\\\ & \\nabla _{b} L \\left( w, b, \\alpha \\right) = -\\sum_{i=1}^{N} \\alpha_{i} y_{i} = 0 \\end{align*} \n", "which gives \n", "\\begin{align*} \\\\ & w = \\sum_{i=1}^{N} \\alpha_{i} y_{i} x_{i} \n", "\\\\ & \\sum_{i=1}^{N} \\alpha_{i} y_{i} = 0 \\end{align*} \n", "Substituting into the Lagrangian gives\n", "\\begin{align*} \\\\ & L \\left( w, b, \\alpha \\right) = \\dfrac{1}{2} \\sum_{i=1}^{N} \\sum_{j=1}^{N} \\alpha_{i} \\alpha_{j} y_{i} y_{j} \\left( x_{i} \\cdot x_{j} \\right) - \\sum_{i=1}^{N} \\alpha_{i} y_{i} \\left[ \\left( \\sum_{j=1}^{N} \\alpha_{j} y_{j} x_{j} \\right) \\cdot x_{i} + b \\right] + \\sum_{i=1}^{N} \\alpha_{i} \n", "\\\\ & = - \\dfrac{1}{2} \\sum_{i=1}^{N} \\sum_{j=1}^{N} \\alpha_{i} \\alpha_{j} y_{i} y_{j} \\left( x_{i} \\cdot x_{j} \\right) - \\sum_{i=1}^{N} \\alpha_{i} y_{i} b + \\sum_{i=1}^{N} \\alpha_{i} \n", "\\\\ & = - \\dfrac{1}{2} \\sum_{i=1}^{N} \\sum_{j=1}^{N} \\alpha_{i} \\alpha_{j} y_{i} y_{j} \\left( x_{i} \\cdot x_{j} \\right) + \\sum_{i=1}^{N} \\alpha_{i} \\end{align*} \n", "that is,\n", "\\begin{align*} \\\\ & \\min_{w,b}L \\left( w, b, \\alpha \\right) = - \\dfrac{1}{2} \\sum_{i=1}^{N} \\sum_{j=1}^{N} \\alpha_{i} \\alpha_{j} y_{i} y_{j} \\left( x_{i} \\cdot x_{j} \\right) + \\sum_{i=1}^{N} \\alpha_{i} \\end{align*} \n", "3. Compute $\\max_{\\alpha} \\min_{w,b}L \\left( w, b, \\alpha \\right)$:\n", "\\begin{align*} \\\\ & \\max_{\\alpha} - \\dfrac{1}{2} \\sum_{i=1}^{N} \\sum_{j=1}^{N} \\alpha_{i} \\alpha_{j} y_{i} y_{j} \\left( x_{i} \\cdot x_{j} \\right) + \\sum_{i=1}^{N} \\alpha_{i} \n", "\\\\ & s.t. \\sum_{i=1}^{N} \\alpha_{i} y_{i} = 0\n", "\\\\ & \\alpha_{i} \\geq 0, \\quad i=1,2, \\cdots, N \\end{align*} \n", "or equivalently\n", "\\begin{align*} \\\\ & \\min_{\\alpha} \\dfrac{1}{2} \\sum_{i=1}^{N} \\sum_{j=1}^{N} \\alpha_{i} \\alpha_{j} y_{i} y_{j} \\left( x_{i} \\cdot x_{j} \\right) - \\sum_{i=1}^{N} \\alpha_{i} \n", "\\\\ & s.t. \\sum_{i=1}^{N} \\alpha_{i} y_{i} = 0\n", "\\\\ & \\alpha_{i} \\geq 0, \\quad i=1,2, \\cdots, N \\end{align*} " ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Learning algorithm for the linearly separable support vector machine (hard-margin support vector machine): \n", "Input: a linearly separable training dataset $T = \\left\\{ \\left( x_{1}, y_{1} \\right), \\left( x_{2}, y_{2} \\right), \\cdots, \\left( x_{N}, y_{N} \\right) \\right\\}$, where $x_{i} \\in \\mathcal{X} = R^{n}, y_{i} \\in \\mathcal{Y} = \\left\\{ +1, -1 \\right\\}, i = 1, 2, \\cdots, N$ \n", "Output: the maximum-margin separating hyperplane and the classification decision function \n", "1. Construct and solve the constrained optimization problem\n", "\\begin{align*} \\\\ & \\min_{\\alpha} \\dfrac{1}{2} \\sum_{i=1}^{N} \\sum_{j=1}^{N} \\alpha_{i} \\alpha_{j} y_{i} y_{j} \\left( x_{i} \\cdot x_{j} \\right) - \\sum_{i=1}^{N} \\alpha_{i} \n", "\\\\ & s.t. \\sum_{i=1}^{N} \\alpha_{i} y_{i} = 0\n", "\\\\ & \\alpha_{i} \\geq 0, \\quad i=1,2, \\cdots, N \\end{align*} \n", "obtaining the optimal solution $\\alpha^{*} = \\left( \\alpha_{1}^{*}, \\alpha_{2}^{*}, \\cdots, \\alpha_{N}^{*} \\right)^{T} $. \n", "2. Compute\n", "\\begin{align*} \\\\ & w^{*} = \\sum_{i=1}^{N} \\alpha_{i}^{*} y_{i} x_{i} \\end{align*} \n", "and choose a positive component $\\alpha_{j}^{*} > 0$ of $\\alpha^{*}$ to compute\n", "\\begin{align*} \\\\ & b^{*} = y_{j} - \\sum_{i=1}^{N} \\alpha_{i}^{*} y_{i} \\left( x_{i} \\cdot x_{j} \\right) \\end{align*} \n", "3. Obtain the separating hyperplane\n", "\\begin{align*} \\\\ & w^{*} \\cdot x + b^{*} = 0 \\end{align*} \n", "and the classification decision function \n", "\\begin{align*} \\\\& f \\left( x \\right) = sign \\left( w^{*} \\cdot x + b^{*} \\right) \\end{align*} " ] },
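{ "cell_type": "markdown", "metadata": {}, "source": [ "A minimal sketch of the dual algorithm above, assuming SciPy is available: it solves the dual problem for a hypothetical linearly separable toy set with a general-purpose solver (`scipy.optimize.minimize`, SLSQP) rather than a dedicated QP solver, then recovers $w^{*}$ and $b^{*}$ from the formulas in step 2." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from scipy.optimize import minimize\n", "\n", "# hypothetical linearly separable toy data\n", "X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])\n", "y = np.array([1.0, 1.0, -1.0])\n", "N = len(y)\n", "\n", "# matrix with entries y_i y_j (x_i . x_j)\n", "Q = (y[:, None] * y[None, :]) * X.dot(X.T)\n", "\n", "def dual_objective(alpha):\n", "    # 1/2 sum_ij alpha_i alpha_j y_i y_j (x_i . x_j) - sum_i alpha_i\n", "    return 0.5 * alpha.dot(Q).dot(alpha) - alpha.sum()\n", "\n", "constraints = {'type': 'eq', 'fun': lambda alpha: alpha.dot(y)}  # sum_i alpha_i y_i = 0\n", "bounds = [(0.0, None)] * N                                       # alpha_i >= 0\n", "\n", "res = minimize(dual_objective, np.zeros(N), method='SLSQP',\n", "               bounds=bounds, constraints=constraints)\n", "alpha = res.x\n", "\n", "w = ((alpha * y)[:, None] * X).sum(axis=0)   # w* = sum_i alpha_i* y_i x_i\n", "j = int(np.argmax(alpha))                    # index of a positive component alpha_j* > 0\n", "b = y[j] - (alpha * y * X.dot(X[j])).sum()   # b* = y_j - sum_i alpha_i* y_i (x_i . x_j)\n", "\n", "print(w)   # expected to be roughly [0.5, 0.5] for this toy set\n", "print(b)   # expected to be roughly -2\n" ] },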
{ "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "Linear support vector machine (soft-margin support vector machine): given a training dataset that is not linearly separable, solving the convex quadratic programming problem \n", "\\begin{align*} \\\\ & \\min_{w,b,\\xi} \\quad \\dfrac{1}{2} \\| w \\|^{2} + C \\sum_{i=1}^{N} \\xi_{i}\n", "\\\\ & s.t. \\quad y_{i} \\left( w \\cdot x_{i} + b \\right) \\geq 1 - \\xi_{i}\n", "\\\\ & \\xi_{i} \\geq 0, \\quad i=1,2, \\cdots, N \\end{align*} \n", "yields the separating hyperplane\n", "\\begin{align*} \\\\& w^{*} \\cdot x + b^{*} = 0 \\end{align*} \n", "together with the corresponding classification decision function\n", "\\begin{align*} \\\\& f \\left( x \\right) = sign \\left( w^{*} \\cdot x + b^{*} \\right) \\end{align*} \n", "This model is called the linear support vector machine." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Solving the optimization problem: \n", "1. Introduce Lagrange multipliers $\\alpha_{i} \\geq 0, \\mu_{i} \\geq 0, i = 1, 2, \\cdots, N$ and construct the Lagrangian\n", "\\begin{align*} \\\\ & L \\left( w, b, \\xi, \\alpha, \\mu \\right) = \\dfrac{1}{2} \\| w \\|^{2} + C \\sum_{i=1}^{N} \\xi_{i} + \\sum_{i=1}^{N} \\alpha_{i} \\left[- y_{i} \\left( w \\cdot x_{i} + b \\right) + 1 - \\xi_{i} \\right] + \\sum_{i=1}^{N} \\mu_{i} \\left( -\\xi_{i} \\right)\n", "\\\\ & = \\dfrac{1}{2} \\| w \\|^{2} + C \\sum_{i=1}^{N} \\xi_{i} - \\sum_{i=1}^{N} \\alpha_{i} \\left[ y_{i} \\left( w \\cdot x_{i} + b \\right) -1 + \\xi_{i} \\right] - \\sum_{i=1}^{N} \\mu_{i} \\xi_{i} \\end{align*} \n", "where $\\alpha = \\left( \\alpha_{1}, \\alpha_{2}, \\cdots, \\alpha_{N} \\right)^{T}$ and $\\mu = \\left( \\mu_{1}, \\mu_{2}, \\cdots, \\mu_{N} \\right)^{T}$ are the vectors of Lagrange multipliers. \n", "2. Compute $\\min_{w,b,\\xi}L \\left( w, b, \\xi, \\alpha, \\mu \\right)$:\n", "\\begin{align*} \\\\ & \\nabla_{w} L \\left( w, b, \\xi, \\alpha, \\mu \\right) = w - \\sum_{i=1}^{N} \\alpha_{i} y_{i} x_{i} = 0 \n", "\\\\ & \\nabla_{b} L \\left( w, b, \\xi, \\alpha, \\mu \\right) = -\\sum_{i=1}^{N} \\alpha_{i} y_{i} = 0 \n", "\\\\ & \\nabla_{\\xi_{i}} L \\left( w, b, \\xi, \\alpha, \\mu \\right) = C - \\alpha_{i} - \\mu_{i} = 0 \\end{align*} \n", "which gives \n", "\\begin{align*} \\\\ & w = \\sum_{i=1}^{N} \\alpha_{i} y_{i} x_{i} \n", "\\\\ & \\sum_{i=1}^{N} \\alpha_{i} y_{i} = 0 \n", "\\\\ & C - \\alpha_{i} - \\mu_{i} = 0\\end{align*} \n", "Substituting into the Lagrangian gives\n", "\\begin{align*} \\\\ & L \\left( w, b, \\xi, \\alpha, \\mu \\right) = \\dfrac{1}{2} \\sum_{i=1}^{N} \\sum_{j=1}^{N} \\alpha_{i} \\alpha_{j} y_{i} y_{j} \\left( x_{i} \\cdot x_{j} \\right) + C \\sum_{i=1}^{N} \\xi_{i} - \\sum_{i=1}^{N} \\alpha_{i} y_{i} \\left[ \\left( \\sum_{j=1}^{N} \\alpha_{j} y_{j} x_{j} \\right) \\cdot x_{i} + b \\right] \n", "\\\\ & \\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad + \\sum_{i=1}^{N} \\alpha_{i} - \\sum_{i=1}^{N} \\alpha_{i} \\xi_{i} - \\sum_{i=1}^{N} \\mu_{i} \\xi_{i}\n", "\\\\ & = - \\dfrac{1}{2} \\sum_{i=1}^{N} \\sum_{j=1}^{N} \\alpha_{i} \\alpha_{j} y_{i} y_{j} \\left( x_{i} \\cdot x_{j} \\right) - \\sum_{i=1}^{N} \\alpha_{i} y_{i} b + \\sum_{i=1}^{N} \\alpha_{i} + \\sum_{i=1}^{N} \\xi_{i} \\left( C - \\alpha_{i} - \\mu_{i} \\right)\n", "\\\\ & = - \\dfrac{1}{2} \\sum_{i=1}^{N} \\sum_{j=1}^{N} \\alpha_{i} \\alpha_{j} y_{i} y_{j} \\left( x_{i} \\cdot x_{j} \\right) + \\sum_{i=1}^{N} \\alpha_{i} \\end{align*} \n", "that is,\n", "\\begin{align*} \\\\ & \\min_{w,b,\\xi}L \\left( w, b, \\xi, \\alpha, \\mu \\right) = - \\dfrac{1}{2} \\sum_{i=1}^{N} \\sum_{j=1}^{N} \\alpha_{i} \\alpha_{j} y_{i} y_{j} \\left( x_{i} \\cdot x_{j} \\right) + \\sum_{i=1}^{N} \\alpha_{i} \\end{align*} \n", "3. Compute $\\max_{\\alpha} \\min_{w,b, \\xi}L \\left( w, b, \\xi, \\alpha, \\mu \\right)$:\n", "\\begin{align*} \\\\ & \\max_{\\alpha} - \\dfrac{1}{2} \\sum_{i=1}^{N} \\sum_{j=1}^{N} \\alpha_{i} \\alpha_{j} y_{i} y_{j} \\left( x_{i} \\cdot x_{j} \\right) + \\sum_{i=1}^{N} \\alpha_{i} \n", "\\\\ & s.t. \\sum_{i=1}^{N} \\alpha_{i} y_{i} = 0\n", "\\\\ & C - \\alpha_{i} - \\mu_{i} = 0\n", "\\\\ & \\alpha_{i} \\geq 0\n", "\\\\ & \\mu_{i} \\geq 0, \\quad i=1,2, \\cdots, N \\end{align*} \n", "or equivalently\n", "\\begin{align*} \\\\ & \\min_{\\alpha} \\dfrac{1}{2} \\sum_{i=1}^{N} \\sum_{j=1}^{N} \\alpha_{i} \\alpha_{j} y_{i} y_{j} \\left( x_{i} \\cdot x_{j} \\right) - \\sum_{i=1}^{N} \\alpha_{i} \n", "\\\\ & s.t. \\sum_{i=1}^{N} \\alpha_{i} y_{i} = 0\n", "\\\\ & 0 \\leq \\alpha_{i} \\leq C , \\quad i=1,2, \\cdots, N \\end{align*} " ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Learning algorithm for the linear support vector machine (soft-margin support vector machine): \n", "Input: a training dataset $T = \\left\\{ \\left( x_{1}, y_{1} \\right), \\left( x_{2}, y_{2} \\right), \\cdots, \\left( x_{N}, y_{N} \\right) \\right\\}$, where $x_{i} \\in \\mathcal{X} = R^{n}, y_{i} \\in \\mathcal{Y} = \\left\\{ +1, -1 \\right\\}, i = 1, 2, \\cdots, N$ \n", "Output: the separating hyperplane and the classification decision function \n", "1. Choose a penalty parameter $C > 0$, then construct and solve the constrained optimization problem\n", "\\begin{align*} \\\\ & \\min_{\\alpha} \\dfrac{1}{2} \\sum_{i=1}^{N} \\sum_{j=1}^{N} \\alpha_{i} \\alpha_{j} y_{i} y_{j} \\left( x_{i} \\cdot x_{j} \\right) - \\sum_{i=1}^{N} \\alpha_{i} \n", "\\\\ & s.t. \\sum_{i=1}^{N} \\alpha_{i} y_{i} = 0\n", "\\\\ & 0 \\leq \\alpha_{i} \\leq C , \\quad i=1,2, \\cdots, N \\end{align*} \n", "obtaining the optimal solution $\\alpha^{*} = \\left( \\alpha_{1}^{*}, \\alpha_{2}^{*}, \\cdots, \\alpha_{N}^{*} \\right)^{T} $. \n", "2. Compute\n", "\\begin{align*} \\\\ & w^{*} = \\sum_{i=1}^{N} \\alpha_{i}^{*} y_{i} x_{i} \\end{align*} \n", "and choose a component $\\alpha_{j}^{*}$ of $\\alpha^{*}$ with $0 < \\alpha_{j}^{*} < C$ to compute\n", "\\begin{align*} \\\\ & b^{*} = y_{j} - \\sum_{i=1}^{N} \\alpha_{i}^{*} y_{i} \\left( x_{i} \\cdot x_{j} \\right) \\end{align*} \n", "3. Obtain the separating hyperplane\n", "\\begin{align*} \\\\ & w^{*} \\cdot x + b^{*} = 0 \\end{align*} \n", "and the classification decision function \n", "\\begin{align*} \\\\& f \\left( x \\right) = sign \\left( w^{*} \\cdot x + b^{*} \\right) \\end{align*} " ] },
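{ "cell_type": "markdown", "metadata": {}, "source": [ "In practice the soft-margin dual above is usually solved by a library. A sketch assuming scikit-learn is installed: `sklearn.svm.SVC` with a linear kernel solves this dual with an SMO-type solver, and its attributes correspond to $w^{*}$, $b^{*}$ and the support vectors. The toy data (with one deliberately noisy point) is hypothetical." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from sklearn.svm import SVC\n", "\n", "# hypothetical toy data with one noisy point, so the slack variables matter\n", "X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0], [2.2, 2.2]])\n", "y = np.array([1, 1, -1, -1])\n", "\n", "clf = SVC(kernel='linear', C=1.0)   # C is the penalty parameter above\n", "clf.fit(X, y)\n", "\n", "print(clf.coef_)        # w*\n", "print(clf.intercept_)   # b*\n", "print(clf.support_)     # indices of the support vectors (alpha_i* > 0)\n", "print(clf.dual_coef_)   # y_i * alpha_i* for the support vectors\n" ] },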
{ "cell_type": "markdown", "metadata": {}, "source": [ "(Soft-margin) support vectors: in the linearly non-separable case, the instances $x_{i}$ of the sample points $\\left( x_{i}, y_{i} \\right)$ whose components satisfy $\\alpha_{i}^{*} > 0$ in the solution $\\alpha^{*} = \\left( \\alpha_{1}^{*}, \\alpha_{2}^{*}, \\cdots, \\alpha_{N}^{*} \\right)^{T}$ of the optimization problem. " ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "The geometric margin of the instance $x_{i}$ is\n", "\\begin{align*} \\\\& \\gamma_{i} = \\dfrac{y_{i} \\left( w \\cdot x_{i} + b \\right)}{ \\| w \\|} = \\dfrac{| 1 - \\xi_{i} |}{\\| w \\|} \\end{align*} \n", "and $\\dfrac{1}{2} | H_{1}H_{2} | = \\dfrac{1}{\\| w \\|}$, \n", "so the distance from the instance $x_{i}$ to the margin boundary is\n", "\\begin{align*} \\\\& \\left| \\gamma_{i} - \\dfrac{1}{\\| w \\|} \\right| = \\left| \\dfrac{| 1 - \\xi_{i} |}{\\| w \\|} - \\dfrac{1}{\\| w \\|} \\right| \n", "\\\\ & = \\dfrac{\\xi_{i}}{\\| w \\|}\\end{align*} \n", "\\begin{align*} \\xi_{i} \\geq 0 \\Leftrightarrow \\left\\{\n", "\\begin{aligned} \n", "\\ & \\xi_{i}=0, \\ x_{i} \\text{ lies on the margin boundary;}\n", "\\\\ & 0 < \\xi_{i} < 1, \\ x_{i} \\text{ lies between the margin boundary and the separating hyperplane;}\n", "\\\\ & \\xi_{i}=1, \\ x_{i} \\text{ lies on the separating hyperplane;}\n", "\\\\ & \\xi_{i}>1, \\ x_{i} \\text{ lies on the misclassified side of the separating hyperplane.}\n", "\\end{aligned}\n", "\\right.\\end{align*} " ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "The hinge loss function of the linear support vector machine (soft margin) is\n", "\\begin{align*} \\\\& L \\left( y \\left( w \\cdot x + b \\right) \\right) = \\left[ 1 - y \\left(w \\cdot x + b \\right) \\right]_{+} \\end{align*} \n", "where the subscript “+” denotes the positive part function\n", "\\begin{align*} \\left[ z \\right]_{+} = \\left\\{\n", "\\begin{aligned} \n", "\\ & z, z > 0\n", "\\\\ & 0, z \\leq 0\n", "\\end{aligned}\n", "\\right.\\end{align*} " ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Kernel function \n", "Let $\\mathcal{X}$ be the input space (a subset of the Euclidean space $R^{n}$, or a discrete set) and $\\mathcal{H}$ a feature space (a Hilbert space). If there exists a map from $\\mathcal{X}$ to $\\mathcal{H}$\n", "\\begin{align*} \\\\& \\phi \\left( x \\right) : \\mathcal{X} \\to \\mathcal{H} \\end{align*} \n", "such that for all $x,z \\in \\mathcal{X}$ the function $K \\left(x, z \\right)$ satisfies \n", "\\begin{align*} \\\\ & K \\left(x, z \\right) = \\phi \\left( x \\right) \\cdot \\phi \\left( z \\right) \\end{align*} \n", "then $K \\left(x, z \\right)$ is called a kernel function and $\\phi \\left( x \\right)$ the mapping function, where $\\phi \\left( x \\right) \\cdot \\phi \\left( z \\right)$ denotes the inner product of $\\phi \\left( x \\right)$ and $\\phi \\left( z \\right)$. " ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Commonly used kernel functions: \n", "1. Polynomial kernel\n", "\\begin{align*} \\\\& K \\left( x, z \\right) = \\left( x \\cdot z + 1 \\right)^{p} \\end{align*} \n", "2. Gaussian kernel \n", "\\begin{align*} \\\\& K \\left( x, z \\right) = \\exp \\left( - \\dfrac{\\| x - z \\|^{2}}{2 \\sigma^{2}} \\right) \\end{align*} " ] },
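{ "cell_type": "markdown", "metadata": {}, "source": [ "A short NumPy sketch of the two kernels above as Gram-matrix functions; the parameter names `p` and `sigma` follow the formulas, and the toy array is hypothetical." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "def polynomial_kernel(X, Z, p=2):\n", "    # K(x, z) = (x . z + 1)^p\n", "    return (X.dot(Z.T) + 1.0) ** p\n", "\n", "def gaussian_kernel(X, Z, sigma=1.0):\n", "    # K(x, z) = exp(-||x - z||^2 / (2 sigma^2))\n", "    sq_dists = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)\n", "    return np.exp(-sq_dists / (2.0 * sigma ** 2))\n", "\n", "# hypothetical toy data\n", "X = np.array([[1.0, 2.0], [0.0, -1.0], [2.0, 0.5]])\n", "\n", "print(polynomial_kernel(X, X))   # 3x3 Gram matrix of the polynomial kernel\n", "print(gaussian_kernel(X, X))     # 3x3 Gram matrix of the Gaussian kernel\n" ] },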
{ "cell_type": "markdown", "metadata": {}, "source": [ "Nonlinear support vector machine: from a training set that requires a nonlinear classifier, learning via a kernel function and soft-margin maximization yields the classification decision function \n", "\\begin{align*} \\\\& f \\left( x \\right) = sign \\left( \\sum_{i=1}^{N} \\alpha_{i}^{*} y_{i} K \\left(x, x_{i} \\right) + b^{*} \\right) \\end{align*} \n", "This model is called the nonlinear support vector machine, where $K \\left( x, z \\right)$ is a positive definite kernel function." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Learning algorithm for the nonlinear support vector machine: \n", "Input: a training dataset $T = \\left\\{ \\left( x_{1}, y_{1} \\right), \\left( x_{2}, y_{2} \\right), \\cdots, \\left( x_{N}, y_{N} \\right) \\right\\}$, where $x_{i} \\in \\mathcal{X} = R^{n}, y_{i} \\in \\mathcal{Y} = \\left\\{ +1, -1 \\right\\}, i = 1, 2, \\cdots, N$ \n", "Output: the classification decision function \n", "1. Choose an appropriate kernel function $K \\left( x, z \\right)$ and a penalty parameter $C > 0$, then construct and solve the constrained optimization problem\n", "\\begin{align*} \\\\ & \\min_{\\alpha} \\dfrac{1}{2} \\sum_{i=1}^{N} \\sum_{j=1}^{N} \\alpha_{i} \\alpha_{j} y_{i} y_{j} K \\left( x_{i}, x_{j} \\right) - \\sum_{i=1}^{N} \\alpha_{i} \n", "\\\\ & s.t. \\sum_{i=1}^{N} \\alpha_{i} y_{i} = 0\n", "\\\\ & 0 \\leq \\alpha_{i} \\leq C , \\quad i=1,2, \\cdots, N \\end{align*} \n", "obtaining the optimal solution $\\alpha^{*} = \\left( \\alpha_{1}^{*}, \\alpha_{2}^{*}, \\cdots, \\alpha_{N}^{*} \\right)^{T} $. \n", "2. Choose a component $\\alpha_{j}^{*}$ of $\\alpha^{*}$ with $0 < \\alpha_{j}^{*} < C$ and compute\n", "\\begin{align*} \\\\ & b^{*} = y_{j} - \\sum_{i=1}^{N} \\alpha_{i}^{*} y_{i} K \\left( x_{i}, x_{j} \\right) \\end{align*} \n", "3. Obtain the classification decision function \n", "\\begin{align*} \\\\& f \\left( x \\right) = sign \\left( \\sum_{i=1}^{N} \\alpha_{i}^{*} y_{i} K \\left( x, x_{i} \\right) + b^{*} \\right) \\end{align*} \n", "(With a general kernel the weight vector lives in the feature space $\\mathcal{H}$, so no explicit $w^{*}$ is computed; the decision function is evaluated directly through the kernel.)" ] },
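{ "cell_type": "markdown", "metadata": {}, "source": [ "A sketch of the nonlinear algorithm above on a hypothetical XOR-style toy set, again using `scipy.optimize.minimize` as a stand-in QP solver together with the Gaussian kernel; the decision function is evaluated directly through the kernel, without forming an explicit $w^{*}$." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from scipy.optimize import minimize\n", "\n", "def gaussian_kernel(X, Z, sigma=1.0):\n", "    sq_dists = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)\n", "    return np.exp(-sq_dists / (2.0 * sigma ** 2))\n", "\n", "# hypothetical XOR-style data, not linearly separable in the input space\n", "X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])\n", "y = np.array([1.0, 1.0, -1.0, -1.0])\n", "N, C = len(y), 10.0\n", "\n", "K = gaussian_kernel(X, X)\n", "Q = (y[:, None] * y[None, :]) * K\n", "\n", "res = minimize(lambda a: 0.5 * a.dot(Q).dot(a) - a.sum(),\n", "               np.zeros(N), method='SLSQP',\n", "               bounds=[(0.0, C)] * N,\n", "               constraints={'type': 'eq', 'fun': lambda a: a.dot(y)})\n", "alpha = res.x\n", "\n", "# b* from a component with 0 < alpha_j* < C\n", "j = int(np.argmax((alpha > 1e-6) & (alpha < C - 1e-6)))\n", "b = y[j] - (alpha * y * K[:, j]).sum()\n", "\n", "def decision(x_new):\n", "    # f(x) = sign(sum_i alpha_i* y_i K(x, x_i) + b*)\n", "    k = gaussian_kernel(np.atleast_2d(x_new), X)[0]\n", "    return np.sign((alpha * y * k).sum() + b)\n", "\n", "print([decision(x) for x in X])   # expected to reproduce the training labels\n" ] },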
{ "cell_type": "markdown", "metadata": {}, "source": [ "The sequential minimal optimization (SMO) algorithm solves the following dual convex quadratic programming problem: \n", "\\begin{align*} \\\\ & \\min_{\\alpha} \\dfrac{1}{2} \\sum_{i=1}^{N} \\sum_{j=1}^{N} \\alpha_{i} \\alpha_{j} y_{i} y_{j} K \\left( x_{i}, x_{j} \\right) - \\sum_{i=1}^{N} \\alpha_{i} \n", "\\\\ & s.t. \\sum_{i=1}^{N} \\alpha_{i} y_{i} = 0\n", "\\\\ & 0 \\leq \\alpha_{i} \\leq C , \\quad i=1,2, \\cdots, N \\end{align*} " ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Choosing the two variables $\\alpha_{1}, \\alpha_{2}$ and keeping the other variables $\\alpha_{i} \\left( i = 3, 4, \\cdots, N \\right)$ fixed, the SMO subproblem is \n", "\\begin{align*} \\\\ & \\min_{\\alpha_{1}, \\alpha_{2}} W \\left( \\alpha_{1}, \\alpha_{2} \\right) = \\dfrac{1}{2} K_{11} \\alpha_{1}^{2} + \\dfrac{1}{2} K_{22} \\alpha_{2}^{2} + y_{1} y_{2} K_{12} \\alpha_{1} \\alpha_{2} \n", "\\\\ & \\quad\\quad\\quad\\quad\\quad\\quad - \\left( \\alpha_{1} + \\alpha_{2} \\right) + y_{1} \\alpha_{1} \\sum_{i=3}^{N} y_{i} \\alpha_{i} K_{i1} + y_{2} \\alpha_{2} \\sum_{i=3}^{N} y_{i} \\alpha_i K_{i2}\n", "\\\\ & s.t. \\quad \\alpha_{1} y_{1} + \\alpha_{2} y_{2} = -\\sum_{i=3}^{N} \\alpha_{i} y_{i} = \\varsigma\n", "\\\\ & 0 \\leq \\alpha_{i} \\leq C , \\quad i=1,2 \\end{align*} \n", "where $K_{ij} = K \\left( x_{i}, x_{j} \\right), i,j = 1,2, \\cdots, N$, $\\varsigma$ is a constant, and the constant terms not involving $\\alpha_{1}, \\alpha_{2}$ have been omitted." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Let the initial feasible solution of the dual convex quadratic programming problem be $\\alpha_{1}^{old}, \\alpha_{2}^{old}$, the optimal solution be $\\alpha_{1}^{new}, \\alpha_{2}^{new}$, and the unclipped optimal solution of $\\alpha_{2}$ along the constraint direction be $ \\alpha_{2}^{new,unc}$." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Since $\\alpha_{2}^{new}$ must satisfy $0 \\leq \\alpha_{i} \\leq C$, the optimal solution $\\alpha_{2}^{new}$ must lie in the range\n", "\\begin{align*} \\\\ & L \\leq \\alpha_{2}^{new} \\leq H \\end{align*} \n", "where $L$ and $H$ are the bounds given by the endpoints of the diagonal line segment on which $\\alpha_{2}^{new}$ lies. \n", "If $y_{1} \\neq y_{2}$, then \n", "\\begin{align*} \\\\ & L = \\max \\left( 0, \\alpha_{2}^{old} - \\alpha_{1}^{old} \\right), H = \\min \\left( C, C + \\alpha_{2}^{old} - \\alpha_{1}^{old} \\right) \\end{align*} \n", "If $y_{1} = y_{2}$, then \n", "\\begin{align*} \\\\ & L = \\max \\left( 0, \\alpha_{2}^{old} + \\alpha_{1}^{old} - C \\right), H = \\min \\left( C, \\alpha_{2}^{old} + \\alpha_{1}^{old} \\right) \\end{align*} " ] },
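{ "cell_type": "markdown", "metadata": {}, "source": [ "A small helper mirroring the case analysis above; the function name `clip_bounds` is only illustrative." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def clip_bounds(alpha1_old, alpha2_old, y1, y2, C):\n", "    # bounds L, H of the feasible segment for alpha2_new\n", "    if y1 != y2:\n", "        L = max(0.0, alpha2_old - alpha1_old)\n", "        H = min(C, C + alpha2_old - alpha1_old)\n", "    else:\n", "        L = max(0.0, alpha2_old + alpha1_old - C)\n", "        H = min(C, alpha2_old + alpha1_old)\n", "    return L, H\n", "\n", "print(clip_bounds(0.4, 0.6, 1, -1, C=1.0))   # approx (0.2, 1.0)\n", "print(clip_bounds(0.4, 0.6, 1, 1, C=1.0))    # (0.0, 1.0)\n" ] },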
{ "cell_type": "markdown", "metadata": {}, "source": [ "Write\n", "\\begin{align*} \\\\ & g \\left( x \\right) = \\sum_{i=1}^{N} \\alpha_{i} y_{i} K \\left( x_{i}, x \\right) + b \\end{align*} \n", "and let\n", "\\begin{align*} \\\\ & E_{i} = g \\left( x_{i} \\right) - y_{i} = \\left( \\sum_{j=1}^{N} \\alpha_{j} y_{j} K \\left( x_{j}, x_{i} \\right) + b \\right) - y_{i}, \\quad i=1,2\n", "\\\\ & v_{i} = \\sum_{j=3}^{N} \\alpha_{j} y_{j} K \\left( x_{i}, x_{j} \\right) = g \\left( x_{i} \\right) - \\sum_{j=1}^{2}\\alpha_{j} y_{j} K \\left( x_{i}, x_{j} \\right) - b, \\quad i=1,2\\end{align*} \n", "Then\n", "\\begin{align*} \\\\ & W \\left( \\alpha_{1}, \\alpha_{2} \\right) = \\dfrac{1}{2} K_{11} \\alpha_{1}^{2} + \\dfrac{1}{2} K_{22} \\alpha_{2}^{2} + y_{1} y_{2} K_{12} \\alpha_{1} \\alpha_{2} \n", "\\\\ & \\quad\\quad\\quad\\quad\\quad\\quad - \\left( \\alpha_{1} + \\alpha_{2} \\right) + y_{1} v_{1} \\alpha_{1}+ y_{2} v_{2} \\alpha_{2} \n", "\\end{align*} " ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Since $\\alpha_{1} y_{1} + \\alpha_{2} y_{2} = \\varsigma$ and $y_{i}^{2} = 1$, $\\alpha_{1}$ can be expressed as\n", "\\begin{align*} \\\\ & \\alpha_{1} = \\left( \\varsigma - y_{2} \\alpha_{2} \\right) y_{1}\\end{align*} \n", "Substituting this gives\n", "\\begin{align*} \\\\ & W \\left( \\alpha_{2} \\right) = \\dfrac{1}{2} K_{11} \\left[ \\left( \\varsigma - y_{2} \\alpha_{2} \\right) y_{1} \\right]^{2} + \\dfrac{1}{2} K_{22} \\alpha_{2}^{2} + y_{1} y_{2} K_{12} \\left( \\varsigma - y_{2} \\alpha_{2} \\right) y_{1} \\alpha_{2} \n", "\\\\ & \\quad\\quad\\quad\\quad\\quad\\quad - \\left[ \\left( \\varsigma - y_{2} \\alpha_{2} \\right) y_{1} + \\alpha_{2} \\right] + y_{1} v_{1} \\left( \\varsigma - y_{2} \\alpha_{2} \\right) y_{1} + y_{2} v_{2} \\alpha_{2}\n", "\\\\ & = \\dfrac{1}{2} K_{11} \\left( \\varsigma - y_{2} \\alpha_{2} \\right)^{2} + \\dfrac{1}{2} K_{22} \\alpha_{2}^{2} + y_{2} K_{12} \\left( \\varsigma - y_{2} \\alpha_{2} \\right) \\alpha_{2} \n", "\\\\ & \\quad\\quad\\quad\\quad\\quad\\quad - \\left( \\varsigma - y_{2} \\alpha_{2} \\right) y_{1} - \\alpha_{2} + v_{1} \\left( \\varsigma - y_{2} \\alpha_{2} \\right) + y_{2} v_{2} \\alpha_{2}\n", "\\end{align*} " ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Differentiating with respect to $\\alpha_{2}$ gives\n", "\\begin{align*} \\\\ & \\dfrac {\\partial W}{\\partial \\alpha_{2}} = K_{11} \\alpha_{2} + K_{22} \\alpha_{2} -2 K_{12} \\alpha_{2}\n", "\\\\ & \\quad\\quad\\quad - K_{11} \\varsigma y_{2} + K_{12} \\varsigma y_{2} + y_{1} y_{2} -1 - v_{1} y_{2} + y_{2} v_{2} \\end{align*} \n", "Setting this to zero gives\n", "\\begin{align*} \\\\ & \\left( K_{11} + K_{22} - 2 K_{12} \\right) \\alpha_{2} = y_{2} \\left( y_{2} - y_{1} + \\varsigma K_{11} - \\varsigma K_{12} + v_{1} - v_{2} \\right)\n", "\\\\ & \\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad = y_{2} \\left[ y_{2} - y_{1} + \\varsigma K_{11} - \\varsigma K_{12} + \\left( g \\left( x_{1} \\right) - \\sum_{j=1}^{2}\\alpha_{j} y_{j} K_{1j} - b \\right) \n", "\\\\ & \\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad - \\left( g \\left( x_{2} \\right) - \\sum_{j=1}^{2}\\alpha_{j} y_{j} K_{2j} - b \\right) \\right]\\end{align*} " ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Substituting $\\varsigma = \\alpha_{1}^{old} y_{1} + \\alpha_{2}^{old} y_{2}$ gives\n", "\\begin{align*} \\\\ & \\left( K_{11} + K_{22} - 2 K_{12} \\right) \\alpha_{2}^{new,unc} = y_{2} \\left( \\left( K_{11} + K_{22} - 2 K_{12} \\right) \\alpha_{2}^{old} y_{2} + y_{2} - y_{1} + g \\left( x_{1} \\right) - g \\left( x_{2} \\right) \\right)\n", "\\\\ & \\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad = \\left( K_{11} + K_{22} - 2 K_{12} \\right) \\alpha_{2}^{old} + y_{2} \\left( E_{1} - E_{2} \\right) \\end{align*} " ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Letting $\\eta = K_{11} + K_{22} - 2 K_{12}$, we obtain\n", "\\begin{align*} \\\\ & \\alpha_{2}^{new,unc} = \\alpha_{2}^{old} + \\dfrac{y_{2} \\left( E_{1} - E_{2} \\right)}{\\eta}\\end{align*} " ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "After clipping,\n", "\\begin{align*} \\alpha_{2}^{new} = \\left\\{\n", "\\begin{aligned} \n", "\\ & H, \\alpha_{2}^{new,unc} > H\n", "\\\\ & \\alpha_{2}^{new,unc}, L \\leq \\alpha_{2}^{new,unc} \\leq H\n", "\\\\ & L, \\alpha_{2}^{new,unc} < L \n", "\\end{aligned}\n", "\\right.\\end{align*} " ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Since $\\varsigma = \\alpha_{1}^{old} y_{1} + \\alpha_{2}^{old} y_{2}$ and $\\varsigma = \\alpha_{1}^{new} y_{1} + \\alpha_{2}^{new} y_{2}$, \n", "we have\n", "\\begin{align*} \\\\ & \\alpha_{1}^{old} y_{1} + \\alpha_{2}^{old} y_{2} = \\alpha_{1}^{new} y_{1} + \\alpha_{2}^{new} y_{2}\n", "\\\\ & \\quad\\quad\\quad\\quad \\alpha_{1}^{new} = \\alpha_{1}^{old} + y_{1} y_{2} \\left( \\alpha_{2}^{old} - \\alpha_{2}^{new} \\right) \\end{align*} " ] },
{ "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "If the component satisfies $0 < \\alpha_{1}^{new} < C$, then\n", "\\begin{align*} \\\\ & b_1^{new} = y_{1} - \\sum_{i=3}^{N} \\alpha_{i} y_{i} K_{i1} - \\alpha_{1}^{new} y_{1} K_{11} - \\alpha_{2}^{new} y_{2} K_{21} \\end{align*} \n", "From\n", "\\begin{align*} \\\\ & E_{1} = g \\left( x_{1} \\right) - y_{1} = \\left( \\sum_{j=1}^{N} \\alpha_{j} y_{j} K_{j1} + b \\right) - y_{1}\n", "\\\\ & = \\sum_{i=3}^{N} \\alpha_{i} y_{i} K_{i1} + \\alpha_{1}^{old} y_{1} K_{11} + \\alpha_{2}^{old} y_{2} K_{21} + b^{old} - y_{1} \\end{align*} \n", "we get\n", "\\begin{align*} \\\\ & y_{1} - \\sum_{i=3}^{N} \\alpha_{i} y_{i} K_{i1} = -E_{1} + \\alpha_{1}^{old} y_{1} K_{11} + \\alpha_{2}^{old} y_{2} K_{21} + b^{old} \\end{align*} \n", "Substituting this gives\n", "\\begin{align*} \\\\ & b_1^{new} = -E_{1} - y_{1} K_{11} \\left( \\alpha_{1}^{new} - \\alpha_{1}^{old} \\right) - y_{2} K_{21} \\left( \\alpha_{2}^{new} - \\alpha_{2}^{old} \\right) + b^{old} \\end{align*} \n", "Similarly,\n", "\\begin{align*} \\\\ & b_2^{new} = -E_{2} - y_{1} K_{12} \\left( \\alpha_{1}^{new} - \\alpha_{1}^{old} \\right) - y_{2} K_{22} \\left( \\alpha_{2}^{new} - \\alpha_{2}^{old} \\right) + b^{old} \\end{align*} \n", "If $\\alpha_{1}^{new}, \\alpha_{2}^{new}$ satisfy $0 < \\alpha_{i}^{new} < C, i = 1, 2$, \n", "then \n", "\\begin{align*} \\\\ & b^{new} = b_{1}^{new} = b_{2}^{new}\\end{align*} \n", "otherwise\n", "\\begin{align*} \\\\ & b^{new} = \\dfrac{b_{1}^{new} + b_{2}^{new}}{2} \\end{align*} " ] },
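{ "cell_type": "markdown", "metadata": {}, "source": [ "A sketch of one SMO update for a chosen pair of multipliers, combining the formulas above (the unclipped update, clipping to $[L, H]$, the $\\alpha_{1}$ update, and the threshold $b$); the helper name `smo_step` and the toy call are illustrative, and a full solver would add the variable-selection and stopping rules summarized below." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "def smo_step(i, j, alpha, y, K, b, C, eps=1e-8):\n", "    # one SMO update of the pair (alpha_i, alpha_j); returns the new alphas and b\n", "    Ei = (alpha * y).dot(K[:, i]) + b - y[i]   # E_i = g(x_i) - y_i\n", "    Ej = (alpha * y).dot(K[:, j]) + b - y[j]\n", "\n", "    # feasible segment [L, H] for the new alpha_j\n", "    if y[i] != y[j]:\n", "        L, H = max(0.0, alpha[j] - alpha[i]), min(C, C + alpha[j] - alpha[i])\n", "    else:\n", "        L, H = max(0.0, alpha[j] + alpha[i] - C), min(C, alpha[j] + alpha[i])\n", "\n", "    eta = K[i, i] + K[j, j] - 2.0 * K[i, j]\n", "    if eta < eps or L >= H:\n", "        return alpha, b                         # no usable progress on this pair\n", "\n", "    aj_unc = alpha[j] + y[j] * (Ei - Ej) / eta  # unclipped update\n", "    aj_new = min(max(aj_unc, L), H)             # clip to [L, H]\n", "    ai_new = alpha[i] + y[i] * y[j] * (alpha[j] - aj_new)\n", "\n", "    b1 = -Ei - y[i] * K[i, i] * (ai_new - alpha[i]) - y[j] * K[j, i] * (aj_new - alpha[j]) + b\n", "    b2 = -Ej - y[i] * K[i, j] * (ai_new - alpha[i]) - y[j] * K[j, j] * (aj_new - alpha[j]) + b\n", "    if 0.0 < ai_new < C and 0.0 < aj_new < C:\n", "        b_new = b1                              # here b1 == b2\n", "    else:\n", "        b_new = 0.5 * (b1 + b2)\n", "\n", "    alpha = alpha.copy()\n", "    alpha[i], alpha[j] = ai_new, aj_new\n", "    return alpha, b_new\n", "\n", "# illustrative call on a hypothetical 4-point linear Gram matrix\n", "X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0], [2.0, 2.0]])\n", "y = np.array([1.0, 1.0, -1.0, -1.0])\n", "K = X.dot(X.T)\n", "alpha, b = smo_step(0, 2, np.zeros(4), y, K, b=0.0, C=1.0)\n", "print(alpha)\n", "print(b)\n" ] },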
{ "cell_type": "markdown", "metadata": {}, "source": [ "Updating $E_{i}$: \n", "\\begin{align*} \\\\ & E_{i}^{new} = \\sum_{S} y_{j} \\alpha_{j} K \\left( x_{i}, x_{j} \\right) + b^{new} - y_{i} \\end{align*} \n", "where $S$ is the set of all support vectors $x_{j}$." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "SMO algorithm: \n", "Input: a training dataset $T = \\left\\{ \\left( x_{1}, y_{1} \\right), \\left( x_{2}, y_{2} \\right), \\cdots, \\left( x_{N}, y_{N} \\right) \\right\\}$, where $x_{i} \\in \\mathcal{X} = R^{n}, y_{i} \\in \\mathcal{Y} = \\left\\{ +1, -1 \\right\\}, i = 1, 2, \\cdots, N$, and a tolerance $\\varepsilon$; \n", "Output: an approximate solution $\\hat \\alpha$ \n", "1. Take the initial value $\\alpha^{\\left( 0 \\right)} = 0$ and set $k = 0$;\n", "2. Select the optimization variables $\\alpha_{1}^{\\left( k \\right)},\\alpha_{2}^{\\left( k \\right)}$ and solve\n", "\\begin{align*} \\\\ & \\min_{\\alpha_{1}, \\alpha_{2}} W \\left( \\alpha_{1}, \\alpha_{2} \\right) = \\dfrac{1}{2} K_{11} \\alpha_{1}^{2} + \\dfrac{1}{2} K_{22} \\alpha_{2}^{2} + y_{1} y_{2} K_{12} \\alpha_{1} \\alpha_{2} \n", "\\\\ & \\quad\\quad\\quad\\quad\\quad\\quad - \\left( \\alpha_{1} + \\alpha_{2} \\right) + y_{1} \\alpha_{1} \\sum_{i=3}^{N} y_{i} \\alpha_{i} K_{i1} + y_{2} \\alpha_{2} \\sum_{i=3}^{N} y_{i} \\alpha_i K_{i2}\n", "\\\\ & s.t. \\quad \\alpha_{1} y_{1} + \\alpha_{2} y_{2} = -\\sum_{i=3}^{N} \\alpha_{i} y_{i} = \\varsigma\n", "\\\\ & 0 \\leq \\alpha_{i} \\leq C , \\quad i=1,2 \\end{align*} \n", "obtaining the optimal solution $\\alpha_{1}^{\\left( k+1 \\right)},\\alpha_{2}^{\\left( k+1 \\right)}$, and update $\\alpha$ to $\\alpha^{\\left( k+1 \\right)}$;\n", "3. If the stopping conditions are satisfied within the tolerance $\\varepsilon$, namely\n", "\\begin{align*} \\\\ & \\sum_{i=1}^{N} \\alpha_{i} y_{i} = 0\n", "\\\\ & 0 \\leq \\alpha_{i} \\leq C, i = 1, 2, \\cdots, N \\end{align*} \n", "\\begin{align*} y_{i} \\cdot g \\left( x_{i} \\right) = \\left\\{\n", "\\begin{aligned} \n", "\\ & \\geq 1, \\left\\{ x_{i} | \\alpha_{i} = 0 \\right\\}\n", "\\\\ & = 1, \\left\\{ x_{i} | 0 < \\alpha_{i} < C \\right\\}\n", "\\\\ & \\leq 1, \\left\\{ x_{i} | \\alpha_{i} = C \\right\\}\n", "\\end{aligned}\n", "\\right.\\end{align*} \n", "then go to step 4; otherwise set $k = k + 1$ and go to step 2; \n", "4. Take $\\hat \\alpha = \\alpha^{\\left( k + 1 \\right)}$." ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.11" } }, "nbformat": 4, "nbformat_minor": 0 }