{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "
Goal - show examples of cross-validation for model selection and bootstrapping for prediction error estimation.\n", "
" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from sklearn.metrics import confusion_matrix, roc_auc_score\n", "from sklearn import linear_model\n", "import course_utils as bd\n", "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import warnings\n", "warnings.filterwarnings('ignore')\n", "%matplotlib inline\n", "reload(bd)\n", "\n", "#Load data and downsample for a 50/50 split, then split into a train/test\n", "f = '/Users/briand/Desktop/ds course/datasets/ads_dataset_cut.txt'\n", "\n", "train_split = 0.8\n", "tdat = pd.read_csv(f, header = 0,sep = '\\t')\n", "moddat = bd.downSample(tdat, 'y_buy', 4)\n", "\n", "#We know the dataset is sorted so we can just split by index\n", "train = moddat[:int(np.floor(moddat.shape[0]*train_split))]\n", "test = moddat[int(np.floor(moddat.shape[0]*train_split)):]\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's say we want to build a classifier that does ranking based on the probability of the label being 1. We can use Logistic Regression with AUC as our validation metric. We need to choose a regularization weight, but because we have limited data, we can do this via cross-validation.
\n",
"\n",
"The following function performs the cross-validation for various levels of C\n",
"\n",
"
Now we can run our experiment with $k = 10$ folds over $C = 10^{-20}, ..., 10^{19}$.
" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [], "source": [ "xval_dict = {'e':[], 'mu':[], 'sig':[]}\n", "k = 10\n", "auc_cv = xValAUC(train, 'y_buy', k, [10**i for i in range(-20,20)])\n", "for i in range(-20,20):\n", " xval_dict['e'].append(i)\n", " xval_dict['mu'].append(np.array(auc_cv[10**i]).mean())\n", " xval_dict['sig'].append(np.sqrt(np.array(auc_cv[10**i]).var()))\n", " \n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now load the results from above into a dataframe and begin to analyze\n", "\n", "
" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import numpy as np\n", "res = pd.DataFrame(xval_dict)\n", "\n", "#Get the confidence intervals\n", "res['low'] = res['mu'] - 1.96*res['sig']/np.sqrt(10)\n", "res['up'] = res['mu'] + 1.96*res['sig']/np.sqrt(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's plot the results to get a sense of how AUC varies with regularization strength. We'll also plot the lower and upper 95% confidence bands to get a sense of the variance.\n", "
" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "We see that AUC is definitely affected by the choice of regularization strength (here we seem to do better with less regularization, probably due to very strong signal-to-noise in the data). But its hard to see what is best in the range $1$ to $10^{30}$. We can zoom in to see this better."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYoAAAEZCAYAAACJjGL9AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XmYHFW9//F3J5N9IYGQQAJkEEUCsiiQC7JkBOSGRUEE\nVERAvIpeQXC5YFAhFxUQEVyuXBUVAUFcrwICApEAbmySsC/BhEASCAkEErYkpH5/fE79uqamuqd7\n0j1d3f15PU893V1dXXVqpvt86yx1DpiZmZmZmZmZmZmZmZmZmZmZmZmZmZlZzvwM+Gp4vhfwSIXb\n1ttxwO39dKxKrQPe1OhEWPMY0OgEWEsbCcwHjkqsGwUsBA6r8bGisIAy5m0q3LY3s4GP9T1ZFRkJ\nrAKuy3gvK1OfCVyeeD0a+DbwJLASmAdcCGxU64Rm2BT4CbAYeAl4OKRveD8c2/qJA4XV0yrgBJSJ\njQvrzgPuBH5Xh+MV6rBtpQFlfbwfBc8uYEIF2yfTNBiYBUwB/h0F4t2BZcDUmqaypw2BvwNDgN1Q\nwHo3sAGwVZ2PbWYt5hLgSpQRLgPGl9juYeCgxOsO4Dlgp/D618ASYAVwK7Bt6hhxdVIX8FTivbcD\n/0RXvFcBv0hsOxa4FlgKPA9cA0wK730dWAu8iq7UvxvWbwPcBCxHVVxHJI61EXA18CJwRzhOb1VP\nfwY+F/b5+dR7vZUo/gN4huqu4NcBJwFPoL/veShwDkZ/g7clth0PvEx26eRrwNwqjmtmVtIYlME/\nBxxbZruvAD9PvD4IeDDx+jhgBDAIVa3cm3ivVKAYjKpkTgYGoqv31cBZ4f0NgfcBQ1EV0K+A/0vs\n9xbg+MTrEWHfx6IS+U7hvKaE968KyzBgO+Bp4LYy5zwZBaPNgI/TM+MtFSguSxzvkjL7z7IOlULG\nAJsDj1KsXvs+cG5i25OBP5TYzz+AM6s8tplZSTejqqjRZbbZCl31Dw2vrwC+XGLbMSjDGxVelwoU\newOLUp/9K8VAkbYTuqqO3UL3NooP0DPj/yFwBgpEq4GtE+99nfIlii8DfwvPNwLWUCxBQe+B4kbg\n7DL7z7IO2D/x+lPo/wPwbyiwxu4GDi+xn8eAT1R5bGtCbqOw/nA0unK+GfhGYv31qEpnJfAhVBXy\nMPBeVJXyHlRlBcqEz0UNtS+iRnIotn2UMpGegeJJim0Uw1FGvyDs91ZUx55sw0i2CUxGmekLieUo\n1LYwDlWXJau9FvaSvmNQlRqoKms23Utdb6ASVNIgFFDiz0zs5RhZ0mmM93EHqmrrQlVsW6GqtCx9\nPbY1GQcKq7fxwAWoLv2TwJHAnuG9A1CJYBRqNyA8fgg4BHgI+FdYfxQKIPuijHzLsL5Uhh5bQrHN\nITY5se3nUQlgatjvtLDPeL/pfS5EwWRsYhkFfBq1v6wFtkhsvwWlvRN4MypVLAnL7uFc49/mQorn\nGtuS4lX/zagRu9peRuk0JoPppSi4fwQFsdUl9nEzqrarphOBmVkPv0JX7LGPoVLD4BLbbwK8gjLj\nkxLrP4XaJEahdoKL6F4t8zPKt1F8Bl2JH0b3NopvoG6pQ1B7xf+F/cYZ9S9Q9VFsJCp9HB32NwjY\nlWJ33LixfBhqbC/XRvFD4AYUTOOlE1W/HRy2ORv4Cwp2A4D9wvtxQ/5g1IvseuCtYZuNgNNRIM6y\nDjWcx20UD6NAHtscVb8toBjUs4xFJbvLKAaeScC3gO3LfM7M7P87FGWU6XaJWZS/4e1mlJkne0eN\nAH6PMsn56Gr3DYqB4hKKmX8X3at8dqZnr6d4201RO8RK1IPpE2G/caDYDTX2Po+6+YJKIHFPqWUh\nvTuE98ahnlMvosbes8gOFEPDPg/KeO/7KMDG250XznkFajM4OLX9aNS4v5DifRTno4w8yzrgRFTV\ntwz4Jj1rF26mWJorJ76PYgnF+yi+ggKlWcWm
ox/g48BpGe+PRVdxc1H96HZh/eboB/wg8AC6IjSz\n/vETSjf4m9XUQHR104mK6HModiOMfRNdgYCKznHvi00o9v4Yia7q0p81s9rrRI30kxucDsuJejdm\nT0WBYgHqpXEVaqRMmoJKDqBg0AlsjG4imhPWr0JFWvewMKuvrwL3o+quJ3vZ1tpEvQPFJLp3w3ua\nnj1Q5lIc92cquorZLLVNJ7q79o7aJ9HMEr6COgyc0+iEWH7UO1BUMk7Ouaj3xb2oge1e1JgYGwn8\nBt0huqrWCTQzs/I66rz/RahROrY5KlUkraT7EAnzKfa2GAT8Fg3r8PuM/c/Dg4+ZmVXrCXQPTy50\noAR1ov7eWY3ZG1DsU/9x1B8edBPPZajbXyn9MbJns5jZ6ATkyMxGJyBHZjY6ATkys9EJyJGq8s56\nlyjWouqkP6EeUD9BjdInhPd/iG4c+hlK+AMUx9XZA93UdB/Fwd9moBuUzMysn9Q7UIDuGL0+tS55\np+7fUbfYtL/gIUbMzBrOGXHrmN3oBOTI7EYnIEdmNzoBOTK70QmwxnAbhZlZ9arKO12iMDOzshwo\nzMysLAcKMzMry4HCzMzKcqDIr2PpObOZmVm/c6DIpwLwFuAuNM7V7o1NjplZ82qF7rEDKD0t6Eg0\nHegT6MbE9BDtZmZ90Qp5Z8Wa/WRHosEOs2b+SxqIJrE/qZftzMwq0ex5Z1Wa+WQ3R4Mk/oTSJQoz\ns3rI1aCAlm0qmif8QuBbrH/A+zJwO2rTeGU992X11YFmcIzQoJnx8lp4bDUboouiTVPL5cCdGduP\nR9OwrumvBNZBITw284VsNw4U/W9vNMfGx4Cra7C/ASg4XICGcJ8H3BOWi4B1NThGrIAyuY3CMbOm\nytwMjfw7OLEMAhaSPadIfxgBTAjLJuFxPhrVOG1/4FBgcWp5EnixgmONArYI26bnXgFdGHwwPB+I\nfoMdwOfpPlhm7HzgoxSDycvob38O+h6lHQRsH7Z5OXxmHbqImJex/fbofxaF7daF54+WSH8XsAP6\nnw5OPF5NdsZ/BrAvsCSxzAOez9gWdPF0GGqXezCx3Exlf/+hwFjgObID7/7AS2iunGeobUDaCJiF\nOqIMBlag83wcODhj+2HAfigwPh+2fw14NSy54UDR/+4B3oWGVK+FdShIXAAMQT/8ndGXNStIDAzb\nVVry2AXNQrgFujJ8GViGMtmTM7afhH7oq9GPcHVYSv3IDwe+H/a5DFgeHm8FrsjYfqfwmQ6UScWP\ndwM/zdj+WOAHwLMoY3g2LMtKpOcplDFNBPYK5zMR+CWaTzrtUJSRb4Gm8R2KgsrZ6Ko57RSy/26l\nnIGCQkfY9whgONmZOOj/OyakezjKjArofLMCRRdwILrgKITHAej7lHWMiei7lfz/vk7p0tApZc4t\ny4dDmrcBtgvLsWge76zv0G/QVAVj0XkPQBnvVHRxknZU2H4SuuhZjs5zenieNhb9X98CbB2WycA+\n9Px9vYAuAB9HGf7YsAwrca6j0ZQLY1HJawP0P14Szjttc+BaFEReC8tdaPrauir0vkmuRTT/OfS3\nbVGm+gSar3w4yuSeBN6fsf0EYEf0o3sKBYpa6gDGZSxPAX/M2H4H1PtrDcqc4sdH0NVc2kCKV8n1\nsBO632Uh+hsur+OxrKe3o2D1Qlheo/K/fwf6fm+GLuDSwa6ASh7LUeb/WGL5G7UtrVdiCKo1GBqW\nYSh4/q0P+6oq72z2TNaBom+GAG9Dme5KlMktAJY2ME1m1n/aKu/M+5XbduiK3cwsT/Ked9ZUXk92\nF9TtdTmq5zYzy5O85p11kaeTHY4asu5CPWq+iOo/zczyJk95J9NRI+PjZN99PBbdTzAXuIPuLf29\nfRbydbI7oS6CB6IGVDOzvMpN3jkQdcfrRN0X56AW+6RvUuza9VbUV7rSz0KOTtbMrInkZirUqSiz\nX4C6MF5F
z0HtpgC3hOePosAwvsLPNkIn8HUU1MzM2kI9A8Uk1Bc+9nRYlzQX3ZwFCg6TUZ/mSj7b\nXwro7slrUV/rEdT+XgIzs9yq553ZlRRtzgW+A9yL7ry8F3ijws/2h+2Bi9Edk+cDR+KxlMyszdQz\nUCxCt5zHNqfnkAArgeMTr+ejO4aHVfDZ2MzE89lhqZXn0dg8v0MBzMysGXWFJXc6UKbfiQbIymqQ\n3oDiENsfB35WxWchPyUPM7Nmkqu88wDUSD0PmBHWnRAW0BSfj6JusL9BgaPcZ9NqcbJDQnqm1mBf\nZmbNIFeBot7W52SHo1E8nwauQ+MemZm1AweKXoxAN/A9g9oedq5piszM8s8z3PViIBpTfj9qNyeE\nmZnlVFsVn8zMaiQ3d2abmVkLcKAwM7OyHCjMzKysdgoUI2ijqf/MzEyqaZD5LvD5eiXEzKyJtFVH\noEpPtoCGBNmxjmkxM2sWDhQZtkHDlrvqyczM3WMzHYSG6WirKGpmZpVn/LOA99YzIWZmTcQlipQB\naC6JPzc6IWZm1v9clWRmVj2XKMzMrHYcKMzMrCwHCjMzK8uBwszMymr1QPEFYMNGJ8LMzEqbDjwC\nPI6mH00bB9wAzEGzzR2XeG8G8CBwP3AlMCTj8+Va7jcGVgCDq020mVmLy02P0YHAPKATGISCwZTU\nNjOBc8LzccByND1rJ/AvisHhl8CxGccod7LHAL+tOtVmZq0vN91jp6JAsQBYA1wFHJLaZgkwOjwf\njQLFWuCl8JnhKHAMBxZVefyDgD/2Id1mZtZPDgcuTrw+GvheapsBwGxgMbASOCDx3ifCuqXA5SWO\nUSoqDgKeBzapKsVmZu2hqhJFR71SQWUJOR1VSXUBWwE3ATsAE4BTUBXUi8CvgQ8DV2TsY2bi+eyw\n7I6qrp6pPtlmZi2nKyy5sxtqqI7NoGeD9nXAHonXs4BdgSOBHyfWfwT4fsYxSgWjccDO1STWzKyN\n5KYxuwNNFtSJeh5lNWZfAJwZnk8AnkbdWXdEvaCGoTkkLgU+nXGM3JysmVkTyVXeeQDwKGrUnhHW\nnRAW0JX/NcBc1A32qMRnT6XYPfZS1O6QlquTNTNrEm2Vd7bVyZqZ1UhuuseamVkLaLVA0UF9e3KZ\nmbWdVgsU+wNXNzoRZmatpNUCxUHArY1OhJmZ5UeyQaaAhgt5W2OSYmbWNNqqI1DyZLcDnkQBw8zM\nSmvbXk/xIIBtFSnNzOqtlQLFOOAPjU7E+osGaDEzs1posdJDtDVEj0D0NETfgmgXiFyVZma11mJ5\nZ3ktdLLRfhA9C9EnIJoC0VkQPR6Ws7TOzKwmWijv7F2LnGz0aYiegWhaan0Bol0hugCiRRDdC9Gp\nEG3RmHSaWYtokbyzMk1+stEgiC6C6EGI3tTLtgMh6oLoRxAtg+gvIcCM75ekmlkrafK8szpNfLLR\nhhDNguiPEI3ufftunx0M0cEQXQHRCohuCFVWu0A0rD7pNauVqADRcIgmQrSlvs/Wz6rKO5u9oTQC\nDgXmA/c1OC1ViKagoUb+AJwGhTfWY18jgPcAB6LZAd+K7ieZi/4m94XnT0GhzoE1GgSMCMvI1POh\nwGvAKyWWV6Gwtr7pA5XM2BjNf7JJeJwAjEfztK8CXk49llhXWB06GwwHNuhlGZ14PiIc67WwvF7h\n41pgHfreryuzJN8voCH6B6F5YbIe0+s6wmfXhGOurfD5YGAMMDY8lnu+DlgBrA7/h8VoVsonEkt4\nXXix1H+zMaIC+h+OL7MMIvt/2Nv/eQ36m6yp4vkbffhtR1SR/7dCoLgH+AKaArUJRNOBy1CAuKQO\n+x+MgsUOaAKo+HEoxcARB48H0d9wdJllg4x1o+geDOLHgSgjTWeqL6MfwhCUqZZa1pIZQHg1fP7V\nCpcC+rHGgSAZEDZE86k/G5ZnwuNzKINMn1f6HEcmltgaNGXvi8BLieellp
dRRjIk/F/ix6EZ65Lv\nDURd2geEcxxQZonfj9O3usRj1rq14bPxIJuDKny+GngBBYAVZZ6vgMJrxT9fNAjYAk2HnF7ehP73\nySDyr3Cs9N9saJl1Q0I611a5ROg7kw4EoO/N0sTyXOLx9RL/w1L/3/gxHbgreT4NCn+hOm0XKFZQ\nvBrMsagAnIymgz0cCn/t5+OPB7anGDx2QHezr0OZW9byYon1q8i+0n6976WWqIC++MnAMQLNcljN\nMjTs8Fl6BoNngGW1K7nEVSaF1bXZn/UUxUE/HTwGUrxCT1+xl1r/RvhcRxXLAHRhsbT7Uni5rqdd\nlahQ7xJFs4uAXzU6Eb2LBkN0MUT3QdTZ6NQU+R4NszZVs2ro8eiKM207VMebBxFwbKMTUV40DqJb\nIfoDRKManRozM2oYKH4JTMtYvzdwZa0Osp4iivWFORS9DaJ/QXSOh+UwsxypWaC4p8x7D1a4j+nA\nI8DjqG4+bRxwAzAHeAA4LvHeGOA3wMPAQ8BuGZ/PaffY6M0QfRei5RAd3ejUmJml1CzvfKyP78UG\nAvOATtQyPwdID0MxEzgnPB8HLKc4lemlwPHheQfqfZOWo0ARFSB6V6hieg6isyGa1OhUmZllqNkw\n4/PQ0N1pB6Iuar2ZGvaxAPVIugo4JLXNEtTdkvC4HHVJ2wDYC/hpeG8t6oGTQ9EQiI4D7gUuAq4D\nJkPhdCgsamjSzMxqoKPMe6cA1wJHoGqoArAz8E7g4Ar2PQl4KvH6aeDfUttcDPwZ3WwzCjgyrN8S\n9UW+BHXnvAd1LX2lguP2k2g88KmwzAW+CNwIhXUNTZaZWY2VCxSPob72R6HpRSM0H/UJqE9ybyop\n2pyOqqS6UP/om1Bg6ADeAZwI3AV8G2XEZ2TsY2bi+WzqfuNdtD0KoocBvwb2hUKlbTZmZo3QFZbc\n2Q01VMdm0LNB+zpgj8TrWcAu6E7a+Yn1e6LSTVo/tVFEA8LYSjdDtBiiL6nbq5lZU6pZ3rkKWJlY\nXkK3zv8Y2KiCz3egtoxOdMdtVmP2BcCZ4fkEVD21YXh9G7B1eD4T+EbGMfohUET7QPTPsBztAczM\nrAXUNe/cEPgcqnKpxAHAo6hRe0ZYd0JYQD2drkF1/Pejaq7YjqjaaS7wO/q911O0LUTXQvQEREf4\nLmYzayH9Uhtzb38cpAJ1ONloE4h+CNFSiD6rXk1mZi2l7oFiEPkZ0ruGJxuNgOgr4Sa58yEaW7t9\nm5nlSlV5Z7leT++n5wiDY4EPoDumW0Q0EI0XdRZwO7ALFOaX/4yZWfsoFyjeQ/eoE6Eb4r4N/LGe\nieo/0f7A+Wio8sOgcGeDE2Rm1jJ2bXQCgr7OfbADRH+C6DGIDnVDtZm1mbq1UWwHfA31YCo3YGB/\n6sPJRl+F6BmITgwza5mZtZuaBootUbfW+4C7gWXovoi8qPJkoyEQvQTRhPokx8ysKdQsUPwdlRy+\niKYehO53S+dBtYFiH4j+Xp+kmJk1jZqNHvssusltArmeHKgq+wM3NjoRZmatZAyaE+JGNHzHCnqO\nANtI1ZYo/gnRHr1vZ2bW0urWmD0BOAn4G92HD2+kKk42Gg/RCjdgm5lVFyj62i20E01I1GhRjqa4\nMzNriEL1eXn6ZuqWVk2J4mcQfapuKTEzax5tdY1d4clGhTCPxFb1TY6ZWVNwoMjYbHsNF25mZtRw\nUMDv9XKQz1RzoAZzt1gzsz4qFyjuoRh10o0ezVZs2R/4QaMTYWZm/a+CgBUNg2glRGPqnxwzs6ZQ\ns6qn2HjgVGBbYFjiIPtUl66G2RO4DworGp0QM7NmVG4Ij9gVwCNovKeZ6P6Ju+uXpJpz+4SZWZ39\nMzwmpz+tNFBMR0HmceC0jPfHATcAc4AHgONS7w9E83NfU2L/lVQ9zYVo94pSa2bWHmrezvyP8Hgj\ncDDwDqCSrqYD0dwVnWie7TnAlNQ2M4
FzwvNxaAa9ZHXY51CJ5uoSx+jlZKNNIXoBokqq2MzM2kXN\nRo+Nx0T6Ghoc8PPAF4AfA5+tYN9TUaBYAKwBrgIOSW2zBBgdno9GgWJteL0ZcGA4Xl9vNd8P+DMU\n1va6pZmZZSp3pb0IXcn/AngRjRzbVcW+J9F98MCn6Tny7MXAn4HFwCjgyMR7FwL/RTGQ9IXbJ8zM\n1lO5QLEtcDjwZeAy4DcoaPyjzGeSKinanI6qpLqArYCbgB2BacBS1D7R1cs+Ziaezw4LEA0A3g2c\nUVlyzcxaVhfVXej3yUTgFDTr3RPA2RV8ZjfUUB2bQc8G7euA5PwQs4Bdw/6fQjPqLQFeRsEqrUww\ninaE6PEK0mlm1m7qdtP0KOBYYC662u9NBwoqncBgshuzLwDODM8noOqpDVPbTKNPvZ6i/4Lo+xWk\n08ys3dQ0UAxD7Qa/Q1OjXoq6vFbai+gA4FHUqD0jrDshLKCeTteg4HM/cFTGPqbRp15P0U0QpRvP\nzcyshoHiSuA51Dbxfop3ZedJiZONhodhO9anIdzMrFXVLFAci6qb8qxUoPh3iG7r36SYmTWNmo31\ndOl6JqSR3C3WzMyA0iWK+yGa2r9JMTNrGs02VcR6yTjZaCJEyyEa2P/JMTNrCjUfZhx0r0NnYvuI\n7Psa8mB/YBYU3mh0QszMWkElgeLnaIjxOUAy881zoHD7hJlZP3qYvg/KV2+p4lM0AKLnIJrcmOSY\nmTWFmo0eG3sA2LRvael3OwHLofBkoxNiZtYqKql62hh4CLgTeD2si4D31itR68HVTmZmNVZJoJhZ\n70TU0P5o/CgzMzOgWz1bNAKiVRCNbFxy1ttGwLXAOxudEDNraTVvo9gduAtYhWaqWwe8VH266m4a\ncDcUVjU6IX20Hareewy4p8Q2e7N+EzmZmdXFPcBb0CRCA4GPAuc2NEVFyRLFtyE6vXFJWS/vQQMw\nfqTMNgPRxE6rUEA5D00V68BhZtWq+Z3Z8dXtfYl1c2p9kD5KBoqHINqlcUnpsxPJnia2lKGoZHEG\nmkb24Tqly8xaV1WBopL7I25DU4r+GM029wwaWXbHqpNWexFQgGhzVOKZ0IR3ZO+M/qaL+vj5Aag6\nMG1H1LD/EAomD4XlORozzss4IOv+luXAgoz1GwFboGrOpagkVet0F9AIyROA8cAQ4Ek04Vbatuhv\nOjCxDEBzqdyVsf2OwDvC84hi2u9D39W0MWgo/5fD0pfv8QB0Tlmf3QpNHDYqtdyGLjjSjgfeB7yI\n/gfxMgu4I2P74WEZgi5mhobnS8KStgeazbKA5rpZBCxGM1u+1tuJ9qMC2d+7LdB3enBqmYPOI20f\n9D+IvzcDwvMb0Jw9aQcAb0a/7TfC4zr0v/pXn8+mKOSdlamk19Mx6KROBD4LbIbmp8iTdwM3N2GQ\ngNLtEZXKChKgaWTPQxncTmhSqG3RD/2IjO3fhEoq8RcyCo8L0BS4aVOAQ4GxqeV2snvK7V5i/bUU\nZzlM2gM4C1WtTUBf6qXAJcB/Z2y/CcrslwKvhOeboECUVeo6Ef191qGMainKoC4nO1BMAQ5BP9rk\nsozsQDEZ/T0LFH+QBRQEsgLFMWhyr5Eow12DguO5wPkZ238WOBVlTkPC0oH+lmdlbL8nmoRsFbAy\nsazJ2Bb0f1wGbID+B/FSauqB04CT0N8wuVyI/qZpE4Atw/Pd0HTLE4FvAj/I2P6d4TNxQHk1rF9F\nsdt+0hCUEUeJZTCwNvHZpI8B+6HMf6PE8nHgFxnbfwbYF1idWs4hO1BsjS4K0xn/iIxtQd/drekZ\nWO6jNoGiKpVGlOHA5mRHvkaKSxRXATdC4aeNTlDOFdBV6ysZ7+0GfJLilzK+Ov0H8J2M7XdGGc8L\nqWUBUI+5ykegzH8tuupMOxxlkhPQ9/XZsFxGdsYT947LY+eHAroiH4kynxczthmDznM1yihfR5l+\nq4
4K+lF0YTIRmIQCQQFl2D/P2P4HqOajkFjWoIz/yozt90E3Fi9HAXJ5WFbSmn/TqkoUlXgvChAL\nwuu3U3pq0v4WaZTYaFmofsq7nRqdADMz6hD8/omuXpLF5QdqfZA+itSAHT3U6IT0ogO1FzxG6aKm\nmVl/qfl9FGuAFal1perFs0wHHkHVEadlvD8ONejMQQHouLB+c+AW4MGw/jMl9r8/8Kcq0tPfxgJ/\nBLZH1TsvNzY5Zma191Pgw8D96H6K75Fd55tlIDAPzWUxCAWDKaltZqIGIFDQWI6uwDehWFUzElV/\npT8bQTQbogMqTE9/2waVIi6k8rk/zMzqreYlipPQXcOvo9b/l4BTKtz/VBQoFqCSyVWo50jSEoo3\njY1GgWIt6jIa36+xCvVcmZhxjJ1RF7+8KaBGtnNRD5W1jU2OWV09T/ceRl7ysTxf7p+WF4cDFyde\nH41KJEkDgNmoS9lK1H84rRP1b0+P4xRBNKsWCa2ToY1OgFk/acWeQa2g1P+lqv9XueqQayjdhSqi\nsmHGK0nM6ajk0IVuSLkJ3ay0Mrw/EvgNcDKZXRnft45i//zZYelvo8ke/ypPNw6ZWfvqCkvNPYd6\nOp2KBtybljjYtAr3sRtqqI7NoGeD9nXo5qrYLCAeimMQaqguVdUVQfT2CtNSawVgL9RQ/dcGpcEs\nL1yiyKealCjK6UDVQJehgPE11FZR7T6eQFVH8e3t6QbpCyjemTsBjXu0IcqIL0MNwaVEEFXSzlJL\nA9Agfn9FPbk+jquYzBwo8qnugSJpCOq2ugwNfVCNA1CPpXmoRAFwQlhAPZ2uQWPm3I+GmgANObAO\nBZd7wzI9te9GfDmvCGk5EvXqMjMHirzql0AxFI3r9Gs0ns1X0O3zedGIL+fG1PjWd7MW4ECRT3UP\nFJeju7K/hm4WyyN/Oc3yoRl+i7NRd9HBdT7Oz9DtAJtkrP9qal0nqjlJVqEfBdyNOvQspmc7bjXq\nHijW0X2UyeSSlxnu6vXlHIIa8cfWaf9mrSbvgaITDYb5COq2Xy8jUB45F/hC6r1L6DmybyfdA8Xn\n0GCWh6IBPAcCBwHf6GN6+rWNIq/qcbLT0JfpD/S8IjCzbHnPeM5Ag5l+CbWJgi4IV9C9k87GKKCM\nC69PRVf1TwP/gTL1N5U5zjFoKPB4NIukSyhfotgABZlaTuPgQEFtT3YcGq7kKRTNzaxyec945qHM\n+y1oaPaNw/qfoOr12KdRVQ+o88wS1FNzGBpp4Q3KB4pZKBiNQvNevCPxXm+BYjqqsqplT04HCmp3\nsmPRF+LX6/3fAAASO0lEQVRCSk/MYmal9fJbjKLaLH2yJ8q049/2HIr3Zu2Lgkjsr2gECdCF49cT\n721F+RLFFiiQbB1e/x74duL93gLFh8meDXB9OFBQ25Pdoob7Mms3ec54LkZVybEvUZw2YSCqWpqK\nMu1VFKcCuB5N5hUbQvlA8SXUNhH7MJo5Me5GfzEa+y3pLRTHgctticIjmhYtbHQCzKzmhqF7ngZQ\nvFofgubY2QG1J/wK+BDK1K+hOBXAEjTdQay3ydGOCdvEx+lA06kehNpHFtLzpuUtKc7Y+Hc0+Or7\ngN9WcnJWmb5cxbjkYFZ7eS1RfAiNSL0Zmkp3PBoB4laKc5FPRZn7/WjUhdh0VNrYBk07eymlSxS7\no9LAdqnj/ByNVUd4byXwblTKmIhGvj47sZ/PoZGzDwnHHIRuWnavp/VQ7ckW0JAi7s1kVlt5zXiu\nB76Zsf4IFATiap7H0cgT6VqWL6Ig8jSqhlpH9k3H/4tuTE7bFbWPjAmvD0b3SKxA0y98A5Vwko5C\nNzivCse+Bo2b1xcOFFR/sm9Cf3jfWW1WW+2Q8UxB7Qn9Pb7c+qhJoGimE66FfYA/0x5fajNbf+9D\nV/xj0dX/1VQ3FXRLaNdAYWZWiU+gO6XnoTaITzU2OdYX1ZQMCqiR
aMs6pcWsnbmUnk9uo6C6k92Y\n7pMomVnttFXG00QcKGizkzXLMf8W88mN2WZmVn8OFGZmVpYDhZmZleVAYWZmZdU7UExHkwA9DpyW\n8f441BNpDvAAcFwVn63GhymOP29m7WUBGk68PwxB0xUsQtOufp/uw4LMRkN6xLOFPpx4b3PgH2hs\nqvPp7nq6z23RMgaim1Q60cBWc9At8EkzgXPC83HoD9RR4Wehspb7QcCLaBRHM6uPPPd6mo9utu0P\nZ6IBB8egPO3vKJ+L3QIcX+KzFwEnAKNR/rdzWP8B4H/6mJ7c93qaik52Abqj8So0ImLSEvRHITwu\nR2OpVPLZSu0C/Cvs28wsNgRNLLQoLBcCg8N7twKHhed7oGE7Dgyv96U4n0XawcD30KB/y4Dv0jMw\nlBprrhONHPESGhRwS5QvngacXtkp1Uc9A8UkiuOsg0ZfTI+6eDEaencxmvDj5Co+W6l3oShuZpb0\nJXRRumNYpgJfDu/NBrrC82noYnPvxOvZZfabDAQD0BDnyZkzzwGeA/4S9hV7ANgflUZ2Bh5CM+Jd\niIJHw9QzUFRStDkdVStNBHZC9XnVTkU6M7F0Zbzv8Z3MGm8myhPSy8wqti+1bV8dBZyFrvyXAf8N\nfCS8dxvFTHwvlLnHr6ehEkeWG9AF7zg0ncFnUNqHh/dPQyWFicCP0BDi8fwW54RjzUZ54RBge+Ba\n4MpwzE/37VTpontemRu70X3IjBn0bJS+DhXrYrNQVVEln4Xeg9FQ1GA0upftzGz9NGMbxSt0b/vc\nBs0wB8rYX0UTEC1BbZ1Po7bOV4ANSxxrKKp6ehpVn38xsc8s1wMnZqwfANwObAucF/YzCE2utE2Z\n/aXlvo3ibjQfbCeq9/sAGqI36RFgv/B8AvBWVMSr5LOV6AD+kwYX28wslxajPCa2RVgHCgb3AKeg\nzHkN8Dfg8ygAPF9in68BJ6HqpjeH7e7uQ9o+gRrCHwLeFvaxJqRl+z7sL9cOAB5Ff9gZYd0JYQEV\nz65B7RP3o6Jguc+m5fkqxqyd5Pm3OB91tx+aWDpQ/f9fUT40DrUZnJX43NdRj8kvhdfxRef3yhxr\nYlgKqGZkIcWL4Q2Af08c/8NoFrs3p/YxHs3lHVdXfR9NlzoSeIzqusl6UEDa7GTNcizPv8X5qNdS\ncjkLtQF8B5UiFqMeUIMTn9sfeAO1G4Cu7N9A06iWslc43svoHokPJd4bB9yJgs0LqISSdX/HpcD7\nE683Q/dXPE/P+yt640BBm52sWY75t5hPuW+jMDOzFuBAYWZmZbVyoPgd6jllZmZtrFQ921h0/8SQ\nfkyLWTtzG0U+uY2ijL1RH+RyN7qYmVkFWjVQeNgOMzMDShef7kcDfJlZ/3DVUz75PgqyT3ZDdGNK\nR8Z7ZlYfbZXxNBEHCkqf7Ih+TYWZtVXG00QcKGizkzXLsTz/FtdRHMo7NhO4vI/7mwT8AU2G9hTF\nseuSx1tFcbrTHyXe2xcN8bEEDXYaG4MGIaz1Ra4DBW12smY5luffYlagOJO+B4pbgAvQlM07oIDR\nlTreliU+ex8aOjz+XDzJ0f8Ch/cxPeW4e6yZWR8lZ6HrQvNHzEAzz82n+0jWSSPRxEVnowEC7wN+\nQ8/pTkvlrSPQ0OH3AavR/BZTgclhP7nkBl8zM82HsxEaInx3NKna3WhY76RC6hEUFN6W2u62sP5v\nwOeAJ8P6pag0UUCBZgUatfaYWpxEvbRaiWIbfDe2WR7NJH9ToaZ9BU0OdBvwR+DIjG1WojksvoLy\nmncAhwHDEtvsjUoI26Dhy69F1VQAn0RDm/8ATbv6n8BNaO6JP6H7v/bGaipZz1ZAk4S8tUFpMWtn\neW6jWEPPfOHrwE/D8y50pZ90HnBRif1tgSZcW4pGgPgOcHOJbQeg4LJdxnubAveiiYzuRBMdTaZY\n+qgFt1GkbIWidrqoaGbtbSE9
G5e3BBYkXo+lOKMcKMNeVGZ/70Ez0e0ObAzcUWLbrKqq2IVo9rzX\nKE53+iSaG3tcif1ZHySj4ifoey8GM1s/eS5RnI2mOZ2ELo73Q7PMbRve70Kljm+iTHov1L116xL7\n2wYYhWbDOxo1gG8U3tsW2AldtI5EpY2HKVY9xd6NRriOPYimSd2O7r2h1ldTdI+dDjwCPA6clvH+\nF1DR61407MZa1J8Y1APhwbD+SrLbHpIn+wvgozVJtZlVK1cZT8pQVJU0HzUe3w0cnHi/C90PcTrK\n9Beg+axLORlVO61C7RnJOazfhfK8VcCzKBhslfr8EJTnbZ5Yt09I3yKy20b6KveBYiAwD+hEUXoO\nMKXM9gdTrOfrBP5FMTj8Ejg24zPxyRbQP6VzPdJrZn2Xm4ynD7pQoGhFuW+jmIoCxQJUrLsKOKTM\n9kehUgGoWLgG1Rl2hMdS9YUAo9GdkgvWJ8FmZtZTPQPFJLpH6afDuizDUf3cb8Pr54FvoUajxai4\nWKpXAcCLqI3CzKwvmrlEVHf1DBTV/OHfgxqbVoTXWwGnoKqkiahRqFydoZlZX81GXV6thHremb2I\n7o01m6NSRZYPUqx2AtgF3dG4PLz+HfBO4IqMz85MPJ8dFjMzK+qi+3hUudEBPIFKBYMp3Zi9AQoI\nyTsbdwQeCOsKwKXApzM+6+KiWT74t5hPue/1BHAA8Chq1J4R1p1A92F5j0XdX9NOpdg99lLUcyot\nVydr1sb8W8ynpggU9RahezEmNzohZm2urTKeJuJAgU72JYp3RZpZY7RVxtNEHCjQyd7b6ESYWXtl\nPE0k9zfc9Zc/NzoBZpZbC9D0o81sJg0ex86BwsxaWTyXRa3MpPJM+0jUzf9lNH1qXzW8tNYKM9zd\n3ugEmJllWI7m1p6CBv1rWq1Qonip0Qkws1ybirraP48mK4oHGz0Y3d/1Apq1bvvEZ05DNwi/hEaD\n3QeNhj0D+ACajKi39tFZaB7sJRnvDQV+DiwLx78TzW8Bmivj1nDsG+k5N8VuqKTyQkj/tF7S0fYa\nXiQzM6CX32IEUS2WPqRrAXAfGmduLBoq6KvA29GI07uim3qPQcN8D0Kz4S0ENgn72AJ4U3h+JnBZ\nlWn4D3pWPZ0AXI0CRiGkZ1R47+/A+RTnxngpccxJKLhMD6/3C69LTXTkxmwzaw4FKNRi6cOhI+B/\n0JBCL6ApUD8EfBz4IXBX2OYy4HU0Y91aVOrYDmXWC9G0B+FUajKp0GrUrf8tFHtvrkRBaReK83ff\njqZdjR0NXAfcEF7fjObXOLAGaSrJgcLMWl1yFOuFaKDRycDnUfCIl83QPNZPoEFJZ6JSxy/C+lq6\nHPgTmn5hEfAN1GY8MaTl1cS2T1IMTpOBI1Lp3oNi6acuHCjMrNVtkXq+GAWMr6PqqHgZiSZJAwWH\nvVDGHKGMHPpW/ZX1mbXAWajU8k7UXnJMSFvW/N3xPhaiIJNM9yg0g1/dOFCYWSsroAFFJwEbAl9C\nV/E/Bj6JGroLwAjgIBQstkaN10NQddRrwBthf8+ggU4rqX4agNogBoXnQyiOWdeFGs8HoiqnNeEY\nC1FV0n+Hbfek+7StP0fTMuwfPjs07KvUXD+GG7PN8iKvv8X5qAfTg6ia5hKUuYImS7szrF+MShMj\nUQZ+B2pEXo4aneOqnQ1Ru8HzKEMv5zhgXWr5aXjvgxTn1n4G+DbFC/ct0VzcK1Gvp+/SvQF9KppO\nYTmau/sauk/pkOQhPGizkzXLMf8W88m9nszMrP4cKMzM+m4VqiJKL3s0MlHWnYu7Zvng32I+uerJ\nzMzqz4HCzMzKaoXRY82s8V7A1U959EKjE1CJ6aiv8OOoL3PaF9AYJ/cC96O7FceE98agkRcfBh5C\nIyam+YtpZla93OSdA4F56C7GQWg43Clltj8YDXAVuxQ4PjzvADbI+ExuTjYHuhqdgBzpanQCcq
Sr\n0QnIka5GJyBHctOYPRUFigXo9vSrgEPKbH8UGl8FFBT2ongX41rgxbqksnV0NToBOdLV6ATkSFej\nE5AjXY1OQLOqZ6CYRPdRG5+m9Hgkw9Ht9L8Nr7cEnkO32/8TuJjug2SZmVk/qWegqKZo8x40ociK\n8LoDeAdwUXh8GfhiTVNnZmYNtxvFyTVAUwhmNWgD/B8aJCu2CRrMK7YncG3G5+ZRnDzdixcvXrxU\ntswjJzrQBCCdwGBKN2ZvgEZBHJZafxsa7hc0gcg3MDOzlnMA8CiKXjPCuhPCEjsWuDLjszuiaQrn\nAr8ju9eTmZmZmZnZ+pmJelTFN+5Nb2hqGqO3GxvbyQLgPvRduLOxSel3P0VzPN+fWLchcBPwGJoE\nZ0zG51pR1t9iJu2ZV2wO3IImb3oA+ExY31bfjTOBzzU6EQ1U7Y2NrW4++gG0o72At9M9czwPODU8\nPw04t78T1SBZf4t2zSs2AXYKz0ei5oApVPHdaJVBASuZv7ZVVXtjYzto1+/D7fQc2+e9aJQDwuOh\n/Zqixsn6W0B7fjeeQReQoPkzHkb3tFX83WiVQHESavT+CS1efMpQzY2N7SBCQ8HcDXy8wWnJgwmo\nCobwOKGBacmDds4rQDUPb0dzglf83WiWQHETKkKml/cC/4vu5N4JWAJ8q0FpbJSo0QnImT3QD+EA\n4NOoCsIk7kPfrto9rxiJRr84Gc3Cl1T2u9Esw4y/u8LtfgxcU8+E5NAi1FgV2xyVKtrVkvD4HLqR\ncyqqhmhXz6I66meATYGljU1OQyXPvd3yikEoSFwO/D6sq/i70SwlinI2TTx/H90br9rB3cBbKN7Y\n+AHg6kYmqIGGA6PC8xHA/rTf9yHtanSvEuHx92W2bXXtmlcUUFXbQ8C3E+vb6rtxGeoOORedaDvW\nwWbd2NiOtkSNdnNQN8B2+1v8AlgMrEbtVh9FPcBupk26QCak/xbH0755xZ7AOvS7SHYNbtfvhpmZ\nmZmZmZmZmZmZmZmZmZmZmZmZmVm7WrUenz0R3beyjp6j1X4XDfk+Fw0nUo0j0I1RszLe2xq4DvV7\nvwf4JTC+yv2bmVkV0uPdVGMnYDI9hzU/EGXmAP8G/KPK/d4AvDNj/VAUIA5KrJsGbFfl/s3MrApx\noCgA30RDOtwHHBnWDwAuQsMy3wj8EXh/ah/pQPEDNIxK7BGy7wD+UDjW/RTnATgjpOkRNFdA0vHA\nz3o/JTMzq6U4ULwfBYICqsp5Eg2WdjgKDqDM/nngsNQ+0oHiGrqXCG4Gdk59ZmI4xkZo4qlZFOcQ\nuQV4R0Zav4WGyDbrd60wKKDZ+toTuBINs7wUuBXYFQ1Z/quwzbMoE69EenKc9PDNu4Z9LQfeAK4A\n9i7z+d7Wm9WVA4WZMvJaZc7pYd83C+vKHa9A92CSNS/Ag/QsmZj1CwcKM81X8QH0e9gYXd3fAfwV\nVUsVUNVTV4nPJzP9q4FjwvPdgBUUZxGL3YUaouOqpw+iUkw5V6IqrQMT6/bGjdlmZnX1UuL5eRQb\ns48I6wpoVrS4MfsmYN/w3mfQ8NWrUYnhR4l9/Q/qOjuX7PYGUHCIG7PPSawv1UYB8FbgetT76UEU\nPDYuc35mZtYPRoTHjVDm7/sWrO00y1SoZo1yLZrQZTBwFu09laiZmZmZmZmZmZmZmZmZmZmZmZmZ\nmZlZs/t//RFhp9En1ZEAAAAASUVORK5CYII=\n",
"text/plain": [
" So now we have 2 ways to select our regularization:\n",
"\n",
"
\n",
"This latter decision criteria is more conservative and accounts for the fact that we have variance in our cross-validated estimates of AUC. We argue that any $C$ where $AUC_{xval}^C>max(AUC_{xval})-stderr(max(AUC_{xval})$ is statistically equivalent to the max. Therefore we take the most conservative, least complex model option.
\n",
"\n",
"Now that we have selected a model, let's retrain on the full training set and evaluate on the test. We'll bootstrap the testing estimation so we can get a sense of the variance.\n",
"
Now let's look at the distribution of AUC across the bootstrapped samples. Even though we can't use the test data for model selection, we can at least look at the test results for models built with the 2 selection criteria discussed above.\n", "\n", "
\n" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "