{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Model Assessment\n", "\n", "Chapter 7. Model Assessment and Selection\n", "\n", "前面章节介绍了各种模型,本章讲解如何选择模型,重点是如何评估一个模型的好坏。\n", "\n", "引入损失函数 $L(Y,\\hat{f}(x))$, 通常有\n", "\\begin{align}\n", "L(Y,\\hat{f}(x)) = \\begin{cases}(Y -\\hat{f}(x))^2 & \\textrm{squared error} \\\\ \n", " \\left|Y-\\hat{f}(x)\\right| & \\textrm{absolute error.}\\end{cases}\n", "\\end{align}\n", "定义 *test error*, or *generalization error* 为在一个独立测试样本上的误差\n", "\\begin{align}\n", "\\textrm{Err}_{T} = E\\left[L(Y,\\hat{f}(X)) \\big| T\\right]\n", "\\end{align}\n", "在所有测试样本上做平均,得到 *expected prediction error* or *expected test error*\n", "\\begin{align}\n", "\\textrm{Err} = E[L(Y,\\hat{f}(X))] = E[\\textrm{Err}_{T}]\n", "\\end{align}\n", "基于训练数据,模型也会有一定的误差,定义为 *training error*,\n", "\\begin{align}\n", "\\overline{\\textrm{err}} = \\frac{1}{N}\\sum_{i=1}^N[L(y_i,\\hat{f}(x_i))]\n", "\\end{align}\n", "\n", "如果数据足够,通常会将数据以2:1:1的比例分为训练、验证、和测试三部分,训练产生的各个模型,通过验证数据选择最佳的 (model selection),最后用测试数据评估所选择的模型 (model assessment)。常用的验证分析方法有 AIC、BIC、MDL、SRM。在数据量不足的情况下,引入 Cross-Validation 和 Bootstrap。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 内容概览\n", "\n", "- Bias-variance decomposition. 以 linear regression 为例,假设 $Y=f(X)+\\epsilon$ with $E[\\epsilon] = 0$ and $\\text{Var}(\\epsilon) = \\sigma_{\\epsilon}^2$. 在某个输入点 $x_0$ 的 suqared-error 为 \n", "\\begin{align}\n", "\\text{Err(x_0)} =& E[(Y-\\hat{f}(x_0))^2\\vert X=x_0] \\\\\n", " =& \\sigma_{\\epsilon}^2 + [E\\hat{f}(x_0)-f(x_0)]^2 + E[\\hat{f}(x_0)-E\\hat{f}(x_0)]^2 \\\\\n", " =& \\textrm{Irreducible Error} + \\textrm{Bias}^2 + \\textrm{Variance}\n", "\\end{align}\n", "\n", "- Optimism of the Training Error Rate. 同样是基于训练数据中的 $N$ 个输入 $x_i$,观察另外一组输出 $Y^0$,可以得到 *in-sample error*\n", "\\begin{align}\n", "{\\textrm{Err}_{\\text{in}}} = \\frac{1}{N}\\sum_{i=1}^NE_{Y^0}[L(Y_i^0,\\hat{f}(x_i))\\vert T]\n", "\\end{align}\n", "引入 *optimism* 及其 expection,\n", "\\begin{align}\n", "\\text{op} \\equiv & \\text{Err}_{\\text{in}} - \\overline{\\text{err}} \\\\\n", "\\omega \\equiv & E_{\\mathbf{y}}(\\text{op})\n", "\\end{align}\n", "For squared error, 0-1, and other loss function, \n", "\\begin{align}\n", "\\omega = \\frac{2}{N} \\sum_{i=1}^{N}\\text{Cov}(\\hat{y}_i,y_i)\n", "\\end{align}\n", "If $\\hat{y}$ is obtained by a linear fit with $d$ inputs or basis functions,\n", "\\begin{align}\n", "\\omega = 2\\frac{d}{N}\\sigma_{\\epsilon}^2\n", "\\end{align}\n", "\n", "- Akaike information criterion (AIC)\n", "\\begin{align}\n", "\\text{AIC} = -\\frac{2}{N}\\cdot loglik + 2\\frac{d}{N}\n", "\\end{align}\n", "\n", "- Bayesian information criterion (BIC)\n", "\\begin{align}\n", "\\text{BIC} = -2 \\cdot loglik + \\log N \\cdot d\n", "\\end{align}\n", "\n", "- Cross-Validation \n", "\\begin{align}\n", "\\text{CV}(\\hat{f}) = \\frac{1}{N}\\sum_{i=1}^{N}L(y_i,\\hat{f}^{-k(i)}(x_i))\n", "\\end{align}\n", "\n", "- Bootstrap 重抽样。重数据集中每次抽 N 个数,抽样 B 次。\n", "\\begin{align}\n", "\\widehat{\\text{Err}} =\\frac{1}{B}\\frac{1}{N}\\sum_{b=1}^B\\sum_{i=1}^{N}L(y_i,\\hat{f}^{*b}(x_i))\n", "\\end{align}\n", "Bootstrap 有个 overfit 的问题,因为训练的数据与测试的数据可能重合,当 $N \\rightarrow \\infty$ 的时候,重合概率为 $\\lim_{N\\rightarrow \\infty} 1+\\left(1-\\frac{1}{N}\\right)^N \\approx 1- e^{-1} = 0.632$。这个正是 switching network 中的 blocking probability。 " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 具体内容\n", "\n", "本章讨论的内容很基础,也很有意思,深挖的话是一个很长的历史故事,写起来应该很精彩。但是最近实在是太懒了,而且可以预见,之后 AI 用到的无非是 Cross-validation\n", "或者 
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Details\n",
    "\n",
    "The material in this chapter is basic but very interesting; dug into deeply, it becomes a long historical story that would be fascinating to write up. Lately, though, I have been far too lazy, and it is easy to predict that what AI work will actually rely on is just cross-validation or the bootstrap. Both are suited to small datasets and work very well, yet the reasoning behind them is hard to articulate. The deeper meaning of the bootstrap will have to wait for another time."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}