{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 4.1 判别函数"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "第三章主要介绍了线性模型在回归问题中的应用。\n",
    "\n",
    "现在我们考虑分类问题：对于一个输入向量 $\\bf x$，我们的目标是将它划分为 $K$ 类中的一类 ${\\cal C}_k$，其中$k=1,\\dots,K$。\n",
    "\n",
    "通常这些类别都是不相交的，因此每个输入只对应于一个类别。这样，我们相当于将输入空间划分为多个决策区域，不同决策区域的边界叫做决策面（`decision surfaces`）。在本章，我们考虑线性分类模型。此时，决策面由输入向量 $\\bf x$ 的线性函数决定，因此是一个输入 $D$ 维空间中的一个 $D-1$ 维的超平面。\n",
    "\n",
    "能被线性决策面完全分开的数据称为线性可分的。\n",
    "\n",
    "对于回归问题，我们的目标值就是我们的预测值；对于分类问题，由于类别是离散的，不能直接预测，我们需要用一些其他方式来表示我们的类别。对于概率模型，最简单的方式是 0-1 表示。例如，对于二类问题，我们用单一目标值 $t\\in\\{0,1\\}$ 来表示类别，$t = 1$ 表示类别 ${\\cal C}_1$，$t = 0$ 表示类别 ${\\cal C}_2$，此时，$t$ 可以看成是样本属于类别 ${\\cal C}_1$ 的概率。\n",
    "\n",
    "对于 $K > 2$ 的情况，1-hot是一种常用的表示方法，即用一个 $K$ 维的 0-1 向量 $\\bf t$ 表示类别，当类别为 ${\\cal C}_j$ 时，除了 $t_j$ 为 1 之外，其他的 $t_k$ 都为 0。例如 $K = 5, j = 2$ 的情况对应于目标向量：\n",
    "\n",
    "$$\n",
    "\\mathbf t = (0,1,0,0,0)^\\top\n",
    "$$\n",
    "\n",
    "同样，我们也可以将 $t_k$ 看成是类别是 ${\\cal C}_k$ 的概率。\n",
    "\n",
    "在线性回归模型中，预测值 $y({\\bf x, w})$ 是一个关于参数 $\\bf w$ 的一个线性函数。最简单的情况为 $y({\\bf x})=f({\\bf w^\\top x} + w_0)$，此时 $y$ 可以为任意一个实数。而在线性分类的模型中，我们的预测值可以看成是概率，因此要限制在 $(0,1)$ 之间。为此，我们将上面的模型一般化，在参数 $\\bf w$ 的线性函数之上，加一个非线性函数 $f(·)$：\n",
    "\n",
    "$$\n",
    "y({\\bf x})=f({\\bf w^\\top x} + w_0)\n",
    "$$\n",
    "\n",
    "函数 $f$ 通常被叫做激活函数（`activation function`）。决策面对应与 $y({\\bf x})={\\rm constant}$，所以这相当于 ${\\bf w^\\top x} + w_0={\\rm constant}$，从而决策面对 $\\bf x$ 是线性的，即使激活函数 $f(·)$ 是非线性的。不过，由于函数对参数 $\\bf w$ 不是线性的，因此计算和优化与线性回归模型不同。\n",
    "\n",
    "在第一章，我们讨论了三种解决分类问题的方法：判别函数，判别式模型 $p({\\cal C}_k|{\\bf x})$，产生式模型 $p({\\bf x}|{\\cal C}_k)$。\n",
    "\n",
    "首先，讨论最简单的线性判别函数。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4.1.1 二类问题"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "对于二类问题，最简单的线性判别函数为：\n",
    "\n",
    "$$\n",
    "y({\\bf x})={\\bf w^\\top x} + w_0\n",
    "$$\n",
    "\n",
    "其中 $\\bf w$ 是权重（`weight vector`），$w_0$ 是偏置项（`bias`）。偏置项的相反数也叫阈值（`threshold`）。\n",
    "\n",
    "如果 $y({\\bf x})\\geq 0$，那么 $\\bf x$ 属于 ${\\cal C}_1$ 类；否则属于 ${\\cal C}_2$ 类。此时，我们的决策面为 $y(\\mathbf x)=0$，对应与输入 $D$ 维空间中的一个 $DD-1$ 维的超平面。\n",
    "\n",
    "对于这个超平面上的两个点 $\\mathbf x_A,\\mathbf x_B$，由于 $y({\\bf x}_A)=y({\\bf x}_B)=0$，我们有 $\\mathbf w^\\top(\\mathbf x_A-\\mathbf x_B)=0$，从而 $\\bf w$ 与超平面中的任意向量垂直，即是超平面的法向量。\n",
    "\n",
    "根据法向量的性质，设超平面上的某点为 $\\bf x$（意味着 $y({\\bf x})=0$），超平面到原点的距离为\n",
    "\n",
    "$$\n",
    "\\frac{\\bf w^\\top x}{\\|\\bf w\\|}=-\\frac{w_0}{\\|\\bf w\\|}\n",
    "$$\n",
    "\n",
    "这样我们就看到，决策面的位置由偏置项 $w_0$ 来决定。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}