{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 1.3 模型选择" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "在之前多项式拟合的例子中,我们知道多项式的阶数 $M$ 的选择影响着我们拟合的效果,也决定了模型中参数的个数(模型复杂度)。我们还知道正则项参数 $\\lambda$ 也可以控制模型的复杂度。这些参数的选择是我们所关心的问题。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 验证集 " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "在最大似然的例子中我们看到,在训练集上的结果好不代表在测试的时候结果一定好,因为存在过拟合的现象。\n", "\n", "如果我们的数据足够多,我们可以从训练数据中分出一小部分数据作为验证集(`validation set`),在训练过程中对验证集进行测试,选择在验证集上效果最好的模型作为我们的结果。最后再用测试集(`test set`)对模型进行最终的评估。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 交叉验证" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "在很多情况下,训练和测试数据都是有限的,为了得到更好的模型,我们希望使用尽可能多的训练数据。但是如果验证集的数据很少,那么在测试集上的测评结果就有很大的不确定性。\n", "\n", "一个解决这种问题的方法叫做交叉验证(cross-validation),我们将数据分成 $S$ 等份,每次使用 $S-1$ 份数据进行训练,剩下的一份作为验证集,重复 $S$ 次。下图是一个 $S = 4$ 的交叉验证的展示,粉红色的部分表示验证集。\n", "\n", "当训练数据特别少时,一个合适的方法是使用留一法(leave-one-out),令 $S = N$ 即与训练数据的总数相等。" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAZoAAADtCAYAAACGVrOuAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAADIZJREFUeJzt3V+Ipfddx/H3J25iAmKaULQpoca0dakp6IV7EVjTCRFM\nbhraJSCWYFKCkYhpUDTijdoFFwJa1NroRQjYxjQXUROkiJoyARVhaViICwnbmommTYrBrFvSQTrZ\nrxfnGXacnD9zZs9358z4fsFyZp7z/J755TA5b54/c55UFZIkdblsrycgSTrYDI0kqZWhkSS1MjSS\npFaGRpLUytBIklodmvZkEq99lqRdqKrs9RyWxdTQANTqyUsxj/8XsnIE/25pcZL4ei6Ir+ViJTZm\nKw+dSZJaGRpJUitDI0lqZWgkSa0MjSSplaGRJLUyNJKkVoZGktTK0EiSWhkaSVIrQyNJamVoJEmt\nDI0kqZWhkSS1MjSSpFaGRpLUytBIkloZGknSTEkOJ/n9JKtJziU5n+QXdjLW0EiSduJm4CHgh4EX\nhmU7uv+3oZGkfSTJFUn24r37GeA9VfUR4Pg8Aw2NJC2pJPcMh6juSHIiyWvAOnB9kpXhuVvGjFtL\n8viW728Y1j2e5FiSF5OsJ3k5yV07mUtVvVVV39nc5Dz/HYfmWVmStCceAd4eHi8fvp6mGH9Y63bg\nXuBR4CxwP/BkklNVdWZx0/2/DI0kLb8N4GhVbWwuSObaqdh0GDhcVa8P23gaWAPuAx6++GmO56Ez\nSVp+j22NzEV4djMyAFX1BvAScOMCtj2RoZGk5ffKgrbz6phlZ4FrF7T9sQyNJC2/9THLpl1aPOm0\nyDsTlu/qONxOGRpJ2p/eGh6v2bowyZXAdZd+OpMZGknan9YYXSRw27blD7Jk7+1edSZJ+1BVnUvy\nBPBARpegnWb01/tHgTeZ73DYzHWT/CCjiAF8cHj8eJIPDF//UVWdGzfW0EjScpt2LuYhRu/jdzPa\ni3kOuBV4fsa47dvfybrXAp/dNu4TwCeHr/8cGBuaVE3efpKq1ZM7nKtmycoRpr3emk8SX88F8bVc\nrOH1bD3Bvp8s1XE8SdLBY2gkSa0MjSSplaGRJLUyNJKkVoZGktTK0EiSWhkaSVIrQyNJamVoJEmt\nDI0kqZWhkSS1MjSSpFaGRpLUytBIklrNvB/NJZyLJB0Y3o/mgqmhkSTpYnnoTJLUytBIkloZGklS\nK0MjSWplaCRJrQyNJKmVoZEktTI0kqRWh6Y96ScDSNLu+MkAF0wNDYCfHLA4SajVk3s9jQMjK0f8\n/VyQJL6WC5TYmK08dCZJamVoJEmtDI0kqZWhkSS1MjSSpFaGRpLUytBIkloZGklSK0MjSWplaCRJ\nrQyNJKmVoZEktTI0kqRWhkaS1MrQSJJaGRpJUitDI0lqZWgkSTMluTPJl5J8Pcl3k/xbkr9I8uFZ\nY2feylmSJODPgP8CngK+DvwI8MvAnUl+uqpemDTQ0EjSPpLkCmCjqs5f4h/9c1W1um0uXwZOAb8N\n3DlpoIfOJGlJJbknyfkkdyQ5keQ1YB24PsnK8NwtY8atJXl8y/c3DOseT3IsyYtJ1pO8nOSuncxl\ne2SGZS8Bp4GPTBvrHo0kLb9HgLeHx8uHr6ep4d92twP3Ao8CZ4H7gSeTnKqqM/NOKsllwPuAtWnr\nGRpJWn4bwNGq2thckGQ32zkMHK6q14dtPM0oEvcBD+9ie58GrgN+b9pKHjqTpOX32NbIXIRnNyMD\nUFVvAC8BN867oSQ/Afwh8C+M9pAmMjSStPxeWdB2Xh2z7Cxw7TwbSfIh4G+BbwGfmHVhgqGRpOW3\nPmbZuHMwmyadFnlnwvIdH4dL8gHgHxgdzvuZqvr2rDGeo5Gk/emt4fGarQuTXMnovMnCJbkOeA64\nCvhYVY3bQ3oX92gkaX9aY7RXcdu25Q/S8N6e5L3A3wPvBW4fLm3eEfdoJGkfqqpzSZ4AHsjoErTT\nwM3AUeBN5jgctsN1/w74ceBPgZuS3LRtPl+aNNDQSNJym3Yu5iFG7+N3M9qLeQ64FXh+xrjt29/J\nuj85rPdLw7/t25gYmlRN3n6Smva85pOEWj2519M4MLJyBH8/FyOJr+UCDa/nrv7Q5SDyHI0kqZWh\nkSS1MjSSpFaGRpLUytBIkloZGklSK0MjSWplaCRJrQyNJKmVoZEktTI0kqRWhkaS1MrQSJJaGRpJ\nUitDI0lqNfN+NJdwLpJ0YHg/mgumhkaSpIvloTNJUitDI0lqZWgkSa0MjSSplaGRJLUyNJKkVoZG\nktTK0EiSWh2a9qSfDCBJu+MnA1wwNTQAfnLA4iTx9VygJNTqyb2exoGQlSP+bi5QYmO28tCZJKmV\noZEktTI0kqRWhkaS1MrQSJJaGRpJUitDI0lqZWgkSa0MjSSplaGRJLUyNJKkVoZGktTK0EiSWhka\nSVIrQyNJamVoJEmtDI0kqZWhkSTNlOTjSb6S5N+TrCf5zyT/mORTs8bOvJWzJEnAR4HvAF8Avg1c\nDRwDvpjkpqr6rUkDM+0+4UnK+4gvThLvy75ASajVk3s9jQMhK0f83Vyg4f/1NG37CmCjqs53bH/O\nuVwGfA340ap6z6T1PHQmSUsqyT1Jzie5I8mJJK8B68D1SVaG524ZM24tyeNbvr9hWPd4kmNJXhwO\nf72c5K7dzm+I3beA701bz0NnkrT8HgHeHh4vH76epoZ/290O3As8CpwF7geeTHKqqs7sZCJJrh7m\ncC2jQ2c/C/z6tDGGRpKW3wZwtKo2Nhckuzoydxg4XFWvD9t4GlgD7gMe3uE2ngE296K+B3ymqv5k\n2gBDI0nL77GtkbkIz25GBqCq3kjyEnDjHNv4VeAa4P3Ap4A/TvL9VfUHkwYYGklafq8saDuvjll2\nltFhsB2pqhe2fPvFJH8FnEjyVFV9c9wYLwaQpOW3PmbZtMsEJ+1EvDNh+cVcIfdlRudsfmrSCoZG\nkvant4bHa7YuTHIlcN0lnMeVw+PE8BkaSdqf1hhdJHDbtuUP0vDenuSHxiz7PkYXEvwP8M+TxnqO\nRpL2oao6l+QJ4IGMLkE7DdwMHAXeZL7DYTtZ91+TrAKnGH0ywPuBnwd+DPjNqnpz0kBDI0nLbdq5\nmIcYvY/fzWgv5jngVuD5GeO2b38n636e0d/h3Mro42f+GzgJ/FpVfWXaQD+C5hLyI2gWy4+gWRw/\ngmaxOj+CZj/yHI0kqZWhkSS1MjSSpFaGRpLUytBIkloZGklSK0MjSWplaCRJrQyNJKmVoZEktTI0\nkqRWhkaS1MrQSJJaGRpJUitDI0lqZWgkSa1m3vjsEs5Fkg4Mb3x2wdTQSJJ0sTx0JklqZWgkSa0M\njSSplaGRJLUyNJKkVoZGktTK0EiSWh2a9qR/sClJu+MfbF4wNTQA/kHn4iTx9VwgX8/FSUKtntzr\naRwYWTmy11NYKh46kyS1MjSSpFaGRpLUytBIkloZGklSK0MjSWplaCRJrQyNJKmVoZEktTI0kqRW\nhkaS1MrQSJJaGRpJUitDI0lqZWgkSa0MjSSplaGRJLUyNJKkuSX5bJLzSf5j1rqGRpI0lyQfBn4D\neBuYeT/1Q+0zkiQtTJIrgI2qOr+H0/g88FXgKuBDs1Z2j0aSllSSe4bDU3ckOZHkNWAduD7JyvDc\nLWPGrSV5fMv3NwzrHk9yLMmLSdaTvJzkrjnndBfwMeAzQHCPRpIOhEcYHaZ6BLh8+HqaYnwAbgfu\nBR4FzgL3A08mOVVVZ2ZNIskPAJ8DPldVZ5LsaPKGRpKW3wZwtKo2Nhfs9E1+m8PA4ap6fdjG08Aa\ncB/w8A7G/w5wHjg+zw81NJK0/B7bGpmL8OxmZACq6o0kLwE3zhqY5KPAg8DdVfXdeX6o52gkafm9\nsqDtvDpm2Vng2mmDMtp9+gLwT1X11Lw/1D0aSVp+62OWTTsJP+m9/Z0Jy2cdh/skcBQ4lmTrVWZX\nAYeSfBB4u6remGcykqTl9tbweM3WhUmuBK5b8M+6fnh8esLzZ4C/ZhSkdzE0krQ/rTG6SOA24Jkt\nyx9k8adF/gbY/gkAAX4XeB/wi8A3Jw02NJK0D1XVuSRPAA8M51BOAzczOsT1JrMPh201dd2q+gbw\njXcNSn4FuLqq/nLaeEMjSctt2rmYhxi9j9/NaC/mOeBW4PkZ47Zvf6fr7mpsqiavk6SmPa/5JMHX\nc3F8PRcnCbV6cq+ncWBk5QhVtas/dDmIvLxZktTK0EiSWhkaSVIrQyNJamVoJEmtDI0kqZWhkSS1\nMjSSpFaGRpLUytBIkloZGklSK0MjSWplaCRJrQyNJKmVoZEktTI0kqRWM298dgnnIkkHhjc+u2Bq\naCRJulgeOpMktTI0kqRWhkaS1MrQSJJaGRpJUqv/BQ0g0S5YP/0nAAAAAElFTkSuQmCC\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import numpy as np\n", "import scipy as sp\n", "import matplotlib.pyplot as plt\n", "\n", "%matplotlib inline\n", "\n", "fig, axes = plt.subplots(4, 1)\n", "axes = axes.flatten()\n", "\n", "for i, ax in enumerate(axes):\n", " for j in xrange(4):\n", " ax.plot([j,j], [0,1], color='black')\n", " ax.set_xlim(0,4)\n", " ax.set_xticks([])\n", " ax.set_ylim(0,1)\n", " ax.set_yticks([])\n", " ax.fill_between(np.linspace(i,i+1), 1, color=\"pink\")\n", " ax.text(4.2, 0.35, \"run {}\".format(i+1), fontsize=\"xx-large\")\n", " \n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 信息量准则" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "交叉验证的一个主要缺点是训练次数会随着 $S$ 的增加而加大,对于一些计算很耗时的模型来说不是很适用。另一个深入的问题是对于每次的训练,我们可能会使用不同的超参来得到最好的效果。\n", "\n", "因此,我们需要一个更好的方法来决定模型的复杂度。\n", "\n", "历史上有一些信息量准则被人提出,用来衡量模型的复杂度,例如 `AIC (Akaike, 1974)`(`Akaike information criterion`)最大化:\n", "\n", "$$\n", "\\ln p(\\mathcal D|\\mathbf w_{ML}) - M\n", "$$\n", "\n", "其中 $M$ 是模型中可变参数的数目。类似的准则还有 `BIC(Bayesian information criterion)` 等。这些准则没有考虑到模型参数的不确定性,但是在一些简单的模型上效果很好。" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.6" } }, "nbformat": 4, "nbformat_minor": 0 }