{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Model Assessment\n", "\n", "Chapter 7. Model Assessment and Selection\n", "\n", "前面章节介绍了各种模型,本章讲解如何选择模型,重点是如何评估一个模型的好坏。\n", "\n", "引入损失函数 $L(Y,\\hat{f}(x))$, 通常有\n", "\\begin{align}\n", "L(Y,\\hat{f}(x)) = \\begin{cases}(Y -\\hat{f}(x))^2 & \\textrm{squared error} \\\\ \n", " \\left|Y-\\hat{f}(x)\\right| & \\textrm{absolute error.}\\end{cases}\n", "\\end{align}\n", "定义 *test error*, or *generalization error* 为在一个独立测试样本上的误差\n", "\\begin{align}\n", "\\textrm{Err}_{T} = E\\left[L(Y,\\hat{f}(X)) \\big| T\\right]\n", "\\end{align}\n", "在所有测试样本上做平均,得到 *expected prediction error* or *expected test error*\n", "\\begin{align}\n", "\\textrm{Err} = E[L(Y,\\hat{f}(X))] = E[\\textrm{Err}_{T}]\n", "\\end{align}\n", "基于训练数据,模型也会有一定的误差,定义为 *training error*,\n", "\\begin{align}\n", "\\overline{\\textrm{err}} = \\frac{1}{N}\\sum_{i=1}^N[L(y_i,\\hat{f}(x_i))]\n", "\\end{align}\n", "\n", "如果数据足够,通常会将数据以2:1:1的比例分为训练、验证、和测试三部分,训练产生的各个模型,通过验证数据选择最佳的 (model selection),最后用测试数据评估所选择的模型 (model assessment)。常用的验证分析方法有 AIC、BIC、MDL、SRM。在数据量不足的情况下,引入 Cross-Validation 和 Bootstrap。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 内容概览\n", "\n", "- Bias-variance decomposition. 以 linear regression 为例,假设 $Y=f(X)+\\epsilon$ with $E[\\epsilon] = 0$ and $\\text{Var}(\\epsilon) = \\sigma_{\\epsilon}^2$. 在某个输入点 $x_0$ 的 suqared-error 为 \n", "\\begin{align}\n", "\\text{Err(x_0)} =& E[(Y-\\hat{f}(x_0))^2\\vert X=x_0] \\\\\n", " =& \\sigma_{\\epsilon}^2 + [E\\hat{f}(x_0)-f(x_0)]^2 + E[\\hat{f}(x_0)-E\\hat{f}(x_0)]^2 \\\\\n", " =& \\textrm{Irreducible Error} + \\textrm{Bias}^2 + \\textrm{Variance}\n", "\\end{align}\n", "\n", "- Optimism of the Training Error Rate. 同样是基于训练数据中的 $N$ 个输入 $x_i$,观察另外一组输出 $Y^0$,可以得到 *in-sample error*\n", "\\begin{align}\n", "{\\textrm{Err}_{\\text{in}}} = \\frac{1}{N}\\sum_{i=1}^NE_{Y^0}[L(Y_i^0,\\hat{f}(x_i))\\vert T]\n", "\\end{align}\n", "引入 *optimism* 及其 expection,\n", "\\begin{align}\n", "\\text{op} \\equiv & \\text{Err}_{\\text{in}} - \\overline{\\text{err}} \\\\\n", "\\omega \\equiv & E_{\\mathbf{y}}(\\text{op})\n", "\\end{align}\n", "For squared error, 0-1, and other loss function, \n", "\\begin{align}\n", "\\omega = \\frac{2}{N} \\sum_{i=1}^{N}\\text{Cov}(\\hat{y}_i,y_i)\n", "\\end{align}\n", "If $\\hat{y}$ is obtained by a linear fit with $d$ inputs or basis functions,\n", "\\begin{align}\n", "\\omega = 2\\frac{d}{N}\\sigma_{\\epsilon}^2\n", "\\end{align}\n", "\n", "- Akaike information criterion (AIC)\n", "\\begin{align}\n", "\\text{AIC} = -\\frac{2}{N}\\cdot loglik + 2\\frac{d}{N}\n", "\\end{align}\n", "\n", "- Bayesian information criterion (BIC)\n", "\\begin{align}\n", "\\text{BIC} = -2 \\cdot loglik + \\log N \\cdot d\n", "\\end{align}\n", "\n", "- Cross-Validation \n", "\\begin{align}\n", "\\text{CV}(\\hat{f}) = \\frac{1}{N}\\sum_{i=1}^{N}L(y_i,\\hat{f}^{-k(i)}(x_i))\n", "\\end{align}\n", "\n", "- Bootstrap 重抽样。重数据集中每次抽 N 个数,抽样 B 次。\n", "\\begin{align}\n", "\\widehat{\\text{Err}} =\\frac{1}{B}\\frac{1}{N}\\sum_{b=1}^B\\sum_{i=1}^{N}L(y_i,\\hat{f}^{*b}(x_i))\n", "\\end{align}\n", "Bootstrap 有个 overfit 的问题,因为训练的数据与测试的数据可能重合,当 $N \\rightarrow \\infty$ 的时候,重合概率为 $\\lim_{N\\rightarrow \\infty} 1+\\left(1-\\frac{1}{N}\\right)^N \\approx 1- e^{-1} = 0.632$。这个正是 switching network 中的 blocking probability。 " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 具体内容\n", "\n", "本章讨论的内容很基础,也很有意思,深挖的话是一个很长的历史故事,写起来应该很精彩。但是最近实在是太懒了,而且可以预见,之后 AI 用到的无非是 Cross-validation\n", "或者 
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Details\n",
    "\n",
    "The material in this chapter is basic but very interesting; dug into deeply, it becomes a long historical story that would be fascinating to write up. Lately, though, I have been far too lazy, and it is easy to predict that what AI work will actually rely on is just cross-validation or the bootstrap. Both are suited to small datasets and work very well, yet the reasoning behind them is hard to articulate. The deeper meaning of the bootstrap will have to wait for another time."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}