{ "cells": [ { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "# Regression\n", "\n", "In this section of the tutorial, we will investigate the use of linear regression in `sklearn`. As for all models in the `sklearn` framework, linear regression models mainly rely on `fit(X, y)` and `predict(X)` methods. Once fitted, linear coefficients of the model can be accessed _via_ the `coef_` property. $R^2$ can be computed using the `.score(X, y)` method.\n", "\n", "More information about the use of linear regression in `sklearn` can be found at: .\n", "\n", "To begin with, let us import libraries we need and define a function to plot a linear regression in 1D." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%matplotlib inline\n", "\n", "from sklearn.linear_model import LinearRegression, Lasso\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "\n", "def plot_regression(regressor, data, y):\n", " plt.scatter(data[:, 0], y, s=40)\n", " x_left, x_right = data[:,0].min() - .5, data[:,0].max() + .5\n", " y_left, y_right = regressor.predict([[x_left], [x_right]])\n", " plt.plot([x_left, x_right], [y_left, y_right], color=\"r\")\n", " plt.xlim(data[:,0].min() - .5, data[:,0].max() + .5)\n", " plt.ylim(y.min() - .5, y.max() + .5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us generate some data for which `y` is the sum of a component that is linear in `X` and some Gaussian noise:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAW0AAAD7CAYAAAChScXIAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xl001X6x/H3TUpbCCIojqyCjgMuICBLca+jAlIrokIt\nLSCKoKD+XHAUZVUQ18FBi6CCiuyoIJkKFRk6boiAIIg7o4hUUIZhsaUF8r2/P9KWrizdkjSf1zmc\nkybfJPd70Ifbe5/7PMZai4iIhAZXoAcgIiLHTkFbRCSEKGiLiIQQBW0RkRCioC0iEkIUtEVEQkhE\nZX+BMUY5hSIiZWCtNUWfq5KZtrU2pP6MHj064GPQfeuedc8hft8//4y97DL/n59/Pu73l0bLIyIi\nFW3+fGjfHrp1g+XLoWnTCvvoSl8eEREJG/v2wd13w8cfQ2oqdOxY4V+hmXYJYmNjAz2EgAjH+9Y9\nh49Kv+9Vq6BdO3C74fPPKyVgA5gjrZ1UyBcYYyv7O0REAsbngyeegEmTYPJkuOGGCvlYYwy2hI3I\nci2PGGOigX8DUUAk8I61dnh5PlNEJGRs2QJ9+0JEBKxdC02aVPpXlmt5xFqbDVxurW0LnAdcboy5\nuEJGJiISzObO9S+BXHMNLFtWJQEbKmAj0lqblfswEnADu8r7mSIiQWvvXrjzTv8a9pIl/iyRKlTu\njUhjjMsYsx7YAayw1n5V/mGJiAShlSuhbVuoWdO/2VjFARsqZqbtAG2NMScCacaYWGttesFrxowZ\nk/84NjY2bHevRSREHToEjz8OKSkwZQr07FnhX5Genk56evpRr6vQ7BFjzEhgv7X2mQLPKXtERELX\nTz9BcjJER8OMGdCoUZV8bWnZI+VaHjHG1DfG1M19XBO4ClhXns8UEQkkx3Hwer0kJAzg+c6XktOm\nDc5118F771VZwD6S8i6PNAReN8a48P8D8Ia1dnn5hyUiUvUcx+HGG/vyadoGnsqqSXt+5vLoU2m4\ncj0LCI7TiOUK2tbajcD5FTQWEZEq4zgOqampzJz5NgDJydfjOA57l3zGx9kHWcrFtCed/dkuPGkx\npKamEh8fH+BR60SkiIShvBn1e+9tIjNzEAB1ak1hjDuDxH0HGMRMvFxb4B0pJCSsZe7c6VU2xko5\nESkiEopSU1NzA/anQDSn8x9mZb3OPrOfdtzA9kIBO7gEwxKNiEiVmjnz7dwZdhTJvMEqYphHIt3s\nE/zufh/ILnB1Nh7PSyQlVXyaX1lopi0iYelEsniRPpzHBq7kfTbQBkihQYO67N4dk79s4vG8RNeu\nrYmLiwvsgHMpaItI2Lm7XUueWvAwXnsLHVhDNjXJm1GnpEzA5XIxa9ZCAJKSxhEXF4fLFRwLE9qI\nFJHwcfAgPPoo9pVXmHB6Sx7f8L9iM+oFC2YERYAubSNSQVtEQlZJaXulzoo3b4akJKhXD159FedP\nfyI1NbXAjLpncM2oFbRFpDopKW3P45lafLZsLbz+OjzwAIwc6a/QFySB+UgUtEWkWvF6vSQmjsxP\n2/PLplatTtx3Xw++++4XPAdyeGznVhrt3o2ZPRtatw7kkI+L8rRFpFo5nLYXXeDZSLKyPDzxxAIu\nPHQ1b/A670S4+DDuSmade261yHGuDvcgIpIrlQj2MPbQtcxhHoOZxdBDv+B9/2tSU1MDPbgKoaAt\nIiEpOfl6PJ6pFDwIcybT+IRM2rCJtqxnKVcD0WRmDsrfcAx1CtoiEpLi4uLo2rU1Hk8M8AIDSOYT\nvLxGZ67hn/zOnwI9xEqhjUgRCVmO4/De3LnUe/BhGuzby+KbejN85koyM1dRcHPS44lhzpxxQVGl\n71gpe0REQsYx51+vWAH9+sGNN8KECTiRkfTq1Y+0tI1Be2jmWCloi0hIOKb86wMHYNQoeOMNmD4d\nunYt9P5gPjRzrBS0RSQklJZ/nb/E0aIF9OkDjRvDtGlwyimBHG6lUZ62iATUsS55lJx/HU1m5m38\n+ugT8NN38OijcPvtYIrFtGpPQVtEKl3xJQ+Hd965m/r1/8aFF8bQt+8NR1zCOIn/8jKvcP4PW+GT\nj+Dss6v2BoJIaC3yiEhIKtwp5nZgJTk5J7Jt210sWNCJxMQR3HhjX9555x0yMrbidk8iL//6Ct7n\nC85ja8Q2Nk17OawDNmhNW0SqQELCAObP7wgMAbzASKDwmrXb3YaIiAPk5NwPzCKS3xlHcxJZyx3R\n9YjsfmHIZYCUR2lr2uFx9yISJBzgBaD4mrXPdzc5OZcCd9KSaazEoQUfMbhTW9oOS8LtjiAx8Va8\nXi+O4wRk9MFAQVtEKo3jOHi9XjIyMnC7xwGXARuP8A4Xg5jKh1zGFB7iOp7hi23bmTjRy4IFnZg/\nvyOJiSPo1atf2AZuLY+ISKUoKd8aJgINgT1A4VOL9TmPV6hHUw7Sh9l8y1lACm734/h8mwn1E47H\nq1KWR4wxTY0xK4wxm4wxXxpj7i7P54lI9VF483FI7p+NwG6gPhADpAApdDV/Zj2b+YaL6MynuQE7\nm4iI5/H5rqV4+l/1KQB1vMqb8ncQuNdau94YUxtYa4xZZq39ugLGJiIhrLR8axgMrAHuIZI3mcD7\nJNX4LymdYvn7uuUczHwZ8B8/r1vXsG3buVU/+CBWrqBtrd0ObM99/Icx5mugEaCgLRLmjrwsajib\nPzOHdWxx/8G66dMYk5hIx0LHz8fhOA5JSaPIzBxI4eWRl0hKGlfZtxCUKmxN2xjTHPg3cK619o8C\nz2tNWyTMOI7DRRddzqefZuBfEjkccKEVd9CQsaznEeNhZ4+/8uZbM0tM5XMcp9oUgDpelXqMPXdp\n5E3g/woG7DxjxozJfxwbG0tsbGxFfK2IBKnU1FQ2bNgNdMC/du0PuKcwmWnspiG7uYgr2WxWM/zc\nM0r9HJfLxYIFM4oUgBoXkgWgjiY9PZ309PSjXlfumbYxpgbwT2CJtfa5El7XTFskzBw+THM7kAo8\nRFcM0/mV17mN0TzKQSLxZ4eMp0ePv1b7mfPxqqzsEQNMA74qKWCLSPWSl3edkDCAhIQBx3DQxUUU\nVzERy0tkkMSbPMwTuQHbz+e7irS0jdWmh2NlK9dM2xhzMfABsAHI+6Dh1tqlBa7RTFukGjimOte5\n/OVVR9A8czqzGcC3nMBgdvM/1lJ4fTsGGAf8TELCWubOnV7FdxW8KmWmba39yFrrsta2tda2y/2z\n9OjvFJFQU1LedWbmqhJnyXHdu/PcmZGkE8NEzqM3Cex178KY1uTlZvsDdmsgrqpvJaRpAUlEjknp\nda6LHHTZsQNXfDy3Rhq+mJLC/oRIEhLW89ZbUxg+vDdu93jgM/wz7BnAgdwUvp5VeTshS/W0RaRC\nOI7D6jFjaPH0M6w4/S9Ejh9D9x49uGLw4Pxr4uPj+eabLaSlrSczsxPwYn4KX1ycZtzHQrVHROSY\n5K1Tl9jpfPojuIePpNWPP5Jsh/Ihfyl1vbu69HCsbOoRKSLlUtpBl0GdmzDm+y9Ztm0PA31fs5uG\nue8Ij8JOlUVBW0TKreAs2VjLI3UiOHfRIiafcRZ3fnYTMLTIO1KUFVJGaoIgIuXmcrmIj49n7sTx\nzNnzK602bsR8+ikfND8TCL8mu4GgoC0ix8frhfPPh06d4MMP4c9/Jjn5ejyeqeT1dfTLVlZIJdDy\niIgcm6wsGDYMliyBN96Aiy/OfymcCztVFq1pi8gR5a1Xz5z5NgDJydcfzupYvx769IG2bWHyZKhb\nt9T3KyukYihoi0ipSjui3q1LK+ZfdD6uJ56AiRMhOTnAIw0flVqaVURCl+M4jB49msWL0/H5rgCa\nAnHUyezGHe+0YffXn3PSZ5/B6acHeqiCZtoiYS1vhr148Vp8vrwWr1O5lrpM5VtepD3f9zqF2fNf\nC+Qww5Jm2iJSTF4RKJ9vPRBNTbL4O+vowgyu5zFW4iHBtTbQw5QCtEMgEsYKFoFqx+espT21yKEd\n41nJJqXsBSEFbZEwZ3AYxtMspRuPMor+zGAvNXG7l6mQUxDSmrZIGFv22mvUuPUu3M559GUWW2gO\nZBMR0ZaHHurF2LFjlbIXIEr5E5HCFi7E3n47c09uwO1bLHuz/CVUdSgmOChoi1RzpR2OAQo93//G\n7lydloZZsQJmzcLp1EmHYoKQgrZINVba4ZguXVrhOA5Ll35KTk4T2rOP2WYTvzRtSuwXn+Mq4WSj\nBAcFbZFqzN+gYGRu/8bDDQqio9tz4EAGOKcxjNO5n+XcxUm85T7AW29NoUePHoEcthyBSrOKVGOl\n9W/Mzh5CQ6cG71OXOHbRgU3M51t8vno888zzgRqulIOCtkg1UNpvszewjrXsZhlduZwVbOU0/IF9\nKFu2bK/SMUrF0IlIkWqgZcumwETgFiAaD38wiaFcwhziac9qHi72nmbNmlT1MKUCaKYtUg18++1W\noD4QQ0ceYB1nYPHSjrasZjNFmxO43ZMYNuyOwAxWyqXcQdsYM90Ys8MYs7EiBiQipXMcB6/XS0LC\nABISBuD1enEcB2MMLhIZznl4eYGHOZeBvE4mfWnc+GSiozsAKUAK0dEduPbaDmq2G6IqYnnkVeB5\nYEYFfJaIlKKktL7U1BF07TqPwVfHcteb93DQaU8HvuMXmpLXDT0l5QkAnnnmRbZs2UqzZg3p3793\nAO9EyqNCUv6MMc0Br7W2dQmvKeVPpIwKHpjJyMhg9ervyMnZBNTKvSKbvlEteTl6DwuanM6QH33s\nK3Kycd681+jdu3+xHG6degxuKs0qEmJKmlnDC8AgYAa1yeR57uLCnP2MvfRSxi1dxImFTjaOIy4u\nLr/8asEc7szMW0hLiyE1NVXLJCFGQVskSJUUbP3ZITHE8CwzmcoKLud8HuKak77E5XIRHx9fLAiX\nlsOdmTmIWbMWKmiHmCoJ2mPGjMl/HBsbS2xsbFV8rUhIKynYuqjBwzTlTkZxB7NYSHc8nhiSksYF\nbqBSIdLT00lPTz/qdVrTFglSCQkDmD+/IzAEgGb8xBv05QC/0Y8TyaD/MVXk8x9xH0Fm5ioKHnH3\neGKYM2ecZtpBqtKOsRtj5gCfAC2MMVuNMQPK+5ki4q/S5/FMBbK5iTl8RicWczXxURGccXEdEhLW\nMmfOuKNuJsbFxdG1a2s8nhjy0v48nhg1OAhRKhglEgCllVEtGHwdx6HfdYnEvfse7XxR9OFmvvMs\nKVPWR973qfxq6FCVP5EgUVJWSK1aU2jT5iSaNGmOMcYfxOvVw/Trx5aWLRntqU9ORA0F2zCioC0S\nJBYuXEjv3rdz6NDJgAGuA34A1gH34MbH2BrjGeLaw4mzZ+G6/vqAjlcCQ3naIkHg0KFD9OkzmEOH\n6gJ35j47EdgD/ERzdjCTZLIOnkvHmr8ysUYNtE0oBel3LJEq4jgOSUlJZGefCGzAnxUyBNgI1CeJ\nEXxGJ97iBrqyjM37h+avQYvk0UxbpArkrWMvXPhvYBQFc6/rkMNkPLRjOleRzhe0Ddg4Jfhppi1S\ngUqrwpd3uhG6FLr+Ij5iPW3ZQ006EFcgYGfj8bxEUlLPKr8HCW7aiBSpIMWzQhyiop6lfv1ojDH8\n8sudQFNgBG4+ZhRPMYiXuI0U/smD+GteDwcgOvpFundvq4JOYUwbkSKVrHCtkEigLzk5J7Jt2yBg\nTu5VcZzBK8zkVPZyGu24m+08iP+X3quBF4F6dOjQQAFbSqSgLVJBCtcK8QKbgLxiT02BR+hLFM/y\nCeNJZBIHsEwCLgOmARcBE4Cfadx4rQK2lEhBW6SCFF4GfBt/CVX/huOJXMQU9tKaIVzB/WykMfAc\ncDpwMf6A3Rq4Ao/nAhWAklLpn3KRCnK4uW52oecv4QO+oB2/cw0d6Mvupovp3XsNDz/cm0aNduN2\nTwAuAGLweC5QTRA5Im1EilSQ3r1vZsGCb4Es4AIi+Dej6cEtzOA2XuZdrihWWU81QaQ02ogUqWTG\nGCAJaMafeY3Z/MROUmjHw/zGTyVW1iutcYFIaRS0RY6iaEW+Pn2uA2D27EXA4Qp9ycnXk/rPR+iV\ndSdP8QGP8iQv0AiX6y7qnxTNWWf9mX79egXsPqR60PKIyBGUlHvtdo8D6uLz3Q0crtDX4pQG3LDs\nXU7PzuYmez+baILbPQnYi8/3COBSQ105ZqryJ1IG/q4vIwv0afQCIzmcyucAfbiMj5hBFgtpw+io\nHdSub2ne/DTWrCnePV0dY+RYVFrnGpHqrHifxsKpfDV4m8dZymx2MZj23MN97MlZw+7dkRjjIyfn\nAQ4HbCjYUFekLLSmLVJGf+EbZtGXHUTQltH8zgnACGAemZkD2bJlSqCHKNWQZtoiR1CwT6PfdcBz\n3MIUPiaGV6lDPL/xOw/iL7O6Cn+p1U00a9a0yHtBhaCkvDTTFimF4zg4jkPdugfIzj4Dn+9aTiKL\nl9jCmdxLLGfzFQOBmgXeFQ0Mwu0ez7BhLzJjxgLS0mLy24rldU/X4RkpKwVtCVulNdcF/wbk0KEP\nsn07+VkiV7om8KrNYL69lD7cxQFGlvrZDRrUzc+/Lnx4ZpwOz0i5KHtEwlJJzXU9nql06dIKay1L\nlqwkJ6cWsJYauBjHCJKYyUCXm6XONfir8Xnxr2Gv4vBGZTbR0R2YO3c8PXr0CMStSTWhlD+RAoqn\n8kFewLU2k5ycS4DOtOSvzCKJX2jCQF5hJ/Nxu8fj8/0Hf/nVfvjXsAsvfygPW8pLKX8iBRRP5QOI\nJjv7DnJymgIubuMjPuQSXuY2rmMROzkF8C99eDwx+GfbMURF7aVx4xR6917DnDnjFLClUmlNW8KO\n4zhkZGSU+vrJHORlvqU567mUlXxTpAVYSsoEXC5XgXXqSVqnlipT7uURY0w3/IWB3cAr1toni7yu\n5REJGnlr2e+++wk5OTWBzym4PHKVOZPpdjtzGcoj/MYBviJv6UMtwKQqVcqatjHGDXwLXAlsA1YD\nidbarwtco6AtQcFxHEaPHs2ECdPx+f6K/z/ZH4EmROIwng3cRA63GA//cv0Jn+9OYBNu92IaNKhL\nSsoE4uPjFbClSlRWadZOwA/W2p9yv2Qu0AP4+khvEqlqjuNwww3JLF78AY5zBv5g/TVwMmdxObN5\nlZ8wtOFadtnXiIpozQUXpNK4cSOSkqZq+UOCRnn/K2wMbC3w8y+5z4kEFa/Xi9e7Asc5GWgF7ALq\ncTt38gEvMplRXM8OdvE9sIKcnGE0btyIuXOna3YtQaW8M+1jWvcYM2ZM/uPY2FhiY2PL+bUix+eZ\nZ57H56sHnA2spD51mEYWjXmdi/mI72iZe+UgYCHQPmBjlfCUnp5Oenr6Ua8r75p2Z2CMtbZb7s/D\nAafgZqTWtCUYNG16Lr/8cimwkqsYx6skMJOLGMk/OUhkgStTgM/weNarfKoEVGXlaa8B/mKMaW6M\niQQSgMXl/EyRI3IcB6/XS0LCABISBuD1enEc54jvad68CVFs4O+cxDTuoC/DeYgdHKTg+7KBFKKi\nPlB9EAla5VoesdYeMsbcCaThT/mbVjBzRKSilXT8PDV1BF27zjtiKt7Y3tdw8kf38QOtaMt6dlEP\n+AaIIS+lLyLieU491ZCS8pzWsSVo6Ri7BL2ChZ0yMjJYvbp4N5jo6A506NCARo2a5hd+crlcYC1M\nnowdM4YJderxyH8c4EsOd51ZiNt9FxdccB5/+9tQZYlI0FDtEQlJJc2s4QWgHdAbWJT7XBSwCUg8\n3Icx5RlcAwfCjh0waxaHzjiDSy7pypo12zh06C5AtUIkeCloS0gqubBTFnAGUB9/4wHwB3KLP3Af\n4Lqos5nj2Uv04MEwdizUqAEcnrUfPoLeU7NrCUoK2hKSEhIGMH9+Rw4HZ/CXRH0IWEvBI+jQgSjG\n8hT/5jpm8urlnRn9r3ereMQiFUNV/qQaeRsYStEKfa24ltUMpAHbacODvH/wwHFlmIiEAgVtCWrF\nezQC+IpcZbmLSfyL53mWViTwKvvcE1mzZjvz53dk/vyOJCaOoFevfgrcEvK0PCJBzXEcevXqx9Kl\nG8jKuhD/mvVX+GfZ3/Mn9vEaN3MSO0niv2ymC1FRaRw6FIXPt56CyyceT4wOzEjI0PKIhJy8TUOX\ny0109B5crhVAIvAYUJvuNGY9LViL5WL+IKNmJr1759CxY4vcvo6Fl08yMwflb0CKhCo1QZCgVDjV\n7wKgNnkbj9Hs52k2EM90EmjKh2wFIujcvjVz5kwjMfHWwA5epBIpaEtQKNoZvUWLJqSlfUlW1ihg\nFHASsIzzaMJsktlIK9rQhj3sw78pCWvWTKZXr37069eL1NRRZGbeQuHlkZdIShpX9TcnUoG0pi0B\nV9IBGpfrMRynObAfGITB4f94nIfZxX1MZSb1gOEUTfvzeGKYNetRZsxYQFraxgKd1nWIRkJLZTVB\nECmzvMJPw4eP4ptv9mPtBvICsOMsAH4D1tKA//EaN1OHpsRQhx85CVhASWl/mZmDmDPnHRYsmFHk\nEM04HaKRakFBWwIir5OM17sCnw9gJIUDcCQwlHje4yUGMZXBPMZIfEwFnsLt3ozP17nUz3e5XMTH\nxytTRKodTTskIFJTU1myZCU+36nAVcVer8kpTGYO/+D/uIG3GMNYfLlzjJo1f2D48FtLyN/OW7fu\nWSX3IBIImmlLQMyc+TY5OU3wp/A1BUYA/o3DNqxnNv9mHbtpyw/s5dTcd2UDz3H//QMZO3YsX331\nI2lpMcXWrVUHW6ozbURKQPhrivyAP2jfDvTDsIF7OZOHWMZ9xsO8iIMcPHgScE/uu56jUSMXW7Zs\nIiIiQsWfpFpTwSgJmKLpfMnJ1+M4DgkJ95CTUwdYRUN28jrXUIsM+hlLjRansmWLITu7Cf4TkH/Q\nsmVDNmz4nMjIyCN9nUi1oKAtAVFSOp/HM5UuXVphrcXrXcE1vgimsIfJXMqTru84v1MTNmz4L1lZ\nq9AxdAlXOsYuAeH1elmy5IvcethDgCFkZq7ivfe+5NabevDDFecxOWont/+pPisuPsT8t5+madPT\nyMoajI6hixSnjUipNI7jMGTIQ2Rn30XRANwi82ra3TaIxtf1gN92sKhOnfxXZ89eVOyzRMRPM22p\ncHmHZi67rAsZGbsKvWZweICnWEoKLzdsBjNmQIGADaWVY1U6nwhoTVvKqKTNxbxUu8Nr2CcArYBP\ngFU0Ziev059IsknmF/44ycfvv/9cLNsjrxyrjqFLONNGpFSY0jYXu3ZtTXLyDSQm3kdOziX464Jc\nDPxBTz7mRXbxPJcxgR9wsLjdloULnylxY1HpfBLuFLSlwpTcbDebWrU6ER29h127apNXec/DszwH\nxJJFEh34jJOAD4DngF9ISFjL3LnTA3EbIkFN2SNSYWbOfDt3hl14czEr60L+978o/DPsIXSgI5/j\nws3vtONqPqMbsB64CFDankhZlDloG2N6GWM2GWN8xpjzK3JQEtxK/81pE9beg4saPMQEUoljBOO5\nhQn8wSLgM2AcMAM4oI1FkTIoz0x7I9AT/++6EiYcx2Hr1h+BiRTN7nC7N9OE/7GcK+jGUjqwhgX0\nBiA6ugYez3rgZ+BFPJ4Y1QkRKYMy52lba78B/7qLhI/U1FQ2bNgNdABigEG5rzzHrSce4rFdo5nI\naJ7iYRzc5BV5uu++gXTu3Fn1rUXKSYdr5LjMnPl27mnF24FUYCG1Ocg/cHP5nt3EcTZreBN/ezCA\nl4D6/PBDBuPHq761SHkdMWgbY5YBDUp46WFrrbdyhiShwQXE05FTmUUSH3AKVzcwfLttMNAMyDtu\nPg74CWPWBWykItXJEYO2tbZ4dfoyGDNmTP7j2NhYYmNjK+JjJQCSk68nNXUE+zP78xDPcTeTGMpE\nlnqe5N4BNzJx4stkZq7icHaIv9CTGuqKHFl6ejrp6elHva7cedrGmBXAMGvt2lJeV552CCt68rFP\nn+tIffFV+i97nwNOI/rRj/95FtC1a2vmzXuNhISbdZJRpAJU+OEaY0xPYBJQH9gDrLPWXl3CdQra\nISYvUL/xxlt88skqdu7MJifnfsBFv6gnmWR3sDWhN0P/s40ff95Os2ZNGDZsSP56tU4yipSfTkRK\nvoKzZ2stLVs25dtvt2KMoU+f63jttXksW/ZV/mwZ/sEJnMrzuOnM5/R1RbL3Lw3ZutXkbkoePsau\nGbVIxVDQFqBo3ZCBwCzgv+S19IqOnszBg//F59sM1AIghnRmcQXLOYF7GU0WUfjztDvkvt+FmhSI\nVCwdYxfAv3ThD9if4s/y2A9sIK9BQXb2Gny+esBy3BxiJI/yDtcxjEYMZjtZ3Jt77Ub8bcBScz9Z\nTQpEqoLytMNM4bohb+M/HBMJeHN/BriUZrzOTJ4km2jOpwsZxFK01oj/vQtRHRGRqqOZdthzgL7A\nSKAj0JFEvHzGIhbRgy68R0ahYF0aNSkQqQoK2mGmcFeY64CngU3Ap9QhiRmsZBS16UZDnmUzlheB\nD4F/ULTWiL+8aiSQoloiIlVEG5FhJq8rzNKlG8jKqgV8B4zjQtrwBn1Joyv38yz7eRUYBZwDnIPL\n9Q7GnIjPdzfgz79u3bo+p512GsYYpfaJVDBlj0g+x3EYPXo0TzyxAHuoPSPYze2sZTBTWUyP3KtS\nuPhiL40bNwIgMdH//Jw57wDKvxapbAraYaa0Ho55QTYhYQCr5zdjJhP5g4P0ZxPbOT333UrfEwk0\nBe0wkdcJfejQB9m+nQLLGQUOvxjDC50vI+Gz1TxOPf7BpVi+pmCZ1c6dT+Pjj5dpJi0SIKUFbaX8\nVSN5B2feffcTcnJq4W/75c/8yMy8hbS0GNLmzeNqr5f+v/5ErKs2nzsjKFhm1e8KTjstRwFbJAjp\n/8pqJO/gjL8T+lCK5lW3zbyCDgMHQb16eL76ih0N6+e+5i+zCtNz/7RWcwuRIKWgXY0cPjjjLvR8\nBAd5lJEsYBqvtu8MKSm4atcmJeUJoqMnUzSVT/nWIsFLQbtauh7w52KfwWY+5BI6soqLajbm7Afu\nzr8qPj6e7t3b4vHEACko31ok+Gkjshrxer0kJo4gM3MlcBv9+IBn2M1jdOOVqK+4Oq5dsSp8eVkm\nKqUqElyaYQG0AAALsUlEQVSUPRIG8g7OrFq6jmezojiXn0h2udnZ8BRSUiYQHx+vYCwSIhS0w4Sz\nYgXZCQn8u259Zp/Xgd79e2nmLBKCFLSrqbzljTkz3uTGTevpvn0rkTNm4LrmmkAPTUTKQXna1UTR\nrjNbt/5M1he/8vL+/fxObc6p1YB2r85lQffuml2LVEMK2kGupCC9ceOu3NS+DQxgE0/iMJbRpDAU\nsnL4LS2G1NRUHUEXqYa0PBLECrcGG4S/W8xyYAP1yGIq59OSg/QhjU20KvDOFBIS1jJ37vTADFxE\nyk3txkJQ4dZgQ/AfgrmHy/iU9bRlG7XoxN+KBGwRqc4UtINY4dZgUAMfE1jELJIYxEvcy5PkMB2d\naBQJH1rTDhEt+JZZrORXdtCWTeykKf5WYbOB1uR1U4+IeJ6uXTvoRKNINaWZdhBLTr4eT60pDGQy\nH3Ex07iHa+nOTv6K/9j5i/hbhTUG1gBz6dy5SbFTjyJSfZR5I9IY8zRwDXAA2AwMsNbuKeE6bUSW\nkfP773zWtj21tv/OTc79fE1DYCJQE/gvcBVwIxAHHFDjApFqpMIP1xhjrgKWW2sdY8wTANbah0q4\nTkG7LJYvx958M5vPP59BO7P4YWsGzZo15NChg3zxxS72768N7MFfgtXfszG/yYFm2SIhr1JPRBpj\negI3WGuTS3hNQft4HDgAI0ZgZ83i0eYtefqLvJxsf/eZLl1a0b9/b2bPXkRGxlYggkaNGhZrJyYi\noa2yg7YXmGOtnV3Cawrax+qbb6BPH2jalLTevblh8NO56X55zQzUu1EkXJQpT9sYs8wYs7GEP/EF\nrnkEOFBSwJZjZC1MnQqXXAKDB8OiRUxf/H6hdD+/aDIzB+WXURWR8HPElD9r7VVHet0YczPQHbji\nSNeNGTMm/3FsbCyxsbHHOr7qb+dOGDgQfv4ZPvwQzjor0CMSkQBIT08nPT39qNeVZyOyG/AscJm1\nducRrtPySGmWLYObb4akJHjsMYiKyn/pcEODVWh5RCT8VEb2yPdAJLAr96mV1tohJVynoF1UTg48\n/DDMmwevvw5XFP9FJa+hQVraxgIbkcoQEQkXqqcdLL76yr/ZeMYZ8PLLcPLJpV6qVmAi4UtBu4IV\nLJkKHD3lzlp48UUYPRomTIBbbwVT7O9DRARQ0K5QhUumDgQ24Xa/Q4MGJ5KS8kTxXoy//+4P0hkZ\nMGsWtGwZsLGLSGhQadYKdLhk6ifASmAlPt8Itm27i5tueoRevfrhOI7/4rQ0aNsWzjkHPvlEAVtE\nykVV/srgcMnU5fgLNh0+AJOdfQtpaTEsWbiQuI8+grfegpkz4fLLAzhiEakuFLTL5W2g+AGY5pnX\n0OrW2+CqK2D9epy6dUn1eo99/VtEpBSKGkfgOA5er5eEhAEkJAzA6/XiOI6/ZKpnKuAr8g7LUF5g\nBZN4t8W5MH8+Tt263HhjXxITRzJ/fkfmz+9IYuKIwksoIiLHSBuRpSjen9FfsKlr19bMm/caCQk3\nk5r6MTk5NYHP+RN7mM4tnMIOboncQ71OzWjUqCktWjTh739fTFaWDsmIyLFT9shx8p9IHFlqwaa4\nuDi8Xi9Dhz5Eu1/3MdXZx6t05jHXjxwy+/D5HgFcRERM4tChesDHFP7FRs13RaR0yh45TkX7M/od\nLtjkcrno0aULW3teybyTD/Dy5Rex9GIDNQ7i820G7gSGcOjQevx1r1MDcRsiUs0oaJfVxo3QqRPm\nt9+o9e23jP7XuzRq1JCcnAeAWgUujMbfqODNAs+p+a6IlI2CdikObzYW6XReayojTnDDX/8Kw4bB\n3LlQr95RP8/tXoa/r2MKHk8MXbu2VvNdETluYb2mfaSj6CUVbDq9Zgpv1tlNu+anYWbOhDPPLPR5\nR6rMd++91/L999sA1RARkaPTRmQRJWWHRERM4tRTTf5RdCC/YNP5GVu5+8u1RA0dihk1CmrUKPEz\nVZlPRCqCgnYRpWWHwPlERe0nLu4if6DNzoYHHoDUVHjjDX93mSNQZT4RqQgK2kUkJAxg/vyOQNES\n4CnAZ3g860l9/FYumzLFXztk8mSoWzcAIxWRcFRa0NYx9hIYXNyWeSbt/vY3eOUVf2cZlVEVkSAQ\ntr+zl5Yd0oAUlvAFvVjHiCvjITlZAVtEgkbYBu24uDi6dm1NdHQH8lLx4jmLdfzISq7m6lq1uWpw\nv0APU0SkkLBd04bDBaHuv+NBhm3fShfrpi+38oXnX8r4EJGA0kZkadatwyYmsq1hQ0bWa8z+yEhl\nfIhIwCloF+U48Pe/w5NPwj/+4W+2KyISJMIme+SYGu5u2wb9+0N2NqxeDc2bB2awIiLHqVrNtI9U\nAzt/fXrhQrj9drjzThg+HCKq3b9bIlINhMVM+3DD3cOnHDMz/T0bl771Ft2XLYPly2HRIrjggsAO\nVkSkDKrVTltpNbBbZnaj7a0DIScH1q1TwBaRkFXmoG2MecwY84UxZr0xZrkxpmlFDqwiGBwe4CmW\nMJkFrdrB669DnTqBHpaISJmVZ6b9lLW2jbW2LbAIGF1BYyqzgqccG/ML73Ml17CYy2o24Yzh9wd6\neCIi5VbmoG2t3Vfgx9rAzvIPp3zyTjn2iTqLtZzD+9Tmmlp7Oefq9mo4ICLVQrk2Io0x44G+QBbQ\nuUJGVA6urCzePDGSrHo5PN7qMv5zcn1mJd2mgzIiUm0cMWgbY5YBDUp46WFrrdda+wjwiDHmIWAi\nMKCkzxkzZkz+49jYWGJjY8s63tKtXg19+mAuuQTPd98x/oQTKv47REQqSXp6Ounp6Ue9rkLytI0x\npwHvWmtblfBa5eZp+3zw1FMwcSKkpECvXpX3XSIiVaTC87SNMX+x1n6f+2MPYF1ZP6vMtm6Fvn39\nj9euhaZBl8AiIlKhyrPQO8EYs9EYsx6IBao2PWP+fGjfHrp18x+YUcAWkTAQesfY9+2Du++Gjz+G\nWbOgY8eK+2wRkSBR2vJIaKVUrFoF7dr564V8/rkCtoiEndCoPeLzwYQJ8Pzz/ga7N9wQ6BGJiARE\n8AftLVv8fRojI/2bjU2aBHpEIiIBE9zLI3Pn+pdArr0Wli1TwBaRsBecQXvvXujXD0aPhiVL4IEH\noApPNB5Lgnt1FI73rXsOH9XlvoMvaK9cCW3bQs2a/s3G9u2rfAjV5S/3eIXjfeuew0d1ue/gWdM+\ndAgef9x/qnHKFOjZM9AjEhEJOsERtH/6yb/ZGB3tb1LQqFGgRyQiEpSq5HBNpX6BiEg1VdLhmkoP\n2iIiUnGCbyNSRERKpaAtIhJCFLRLYYx52hjzdW7z4reNMScGekyVzRjTyxizyRjjM8acH+jxVDZj\nTDdjzDfGmO+NMQ8GejyVzRgz3RizwxizMdBjqUrGmKbGmBW5/21/aYy5O9BjKg8F7dK9B5xrrW0D\nfAcMD/B4qsJGoCfwQaAHUtmMMW7gBaAbcA6QaIw5O7CjqnSv4r/fcHMQuNdaey7+tohDQ/nvWkG7\nFNbaZdZaJ/fHVUC1P0Nvrf3GWvtdoMdRRToBP1hrf7LWHgTm4m/mUW1Zaz8E/hfocVQ1a+12a+36\n3Md/AF8DIZtXrKB9bG4B3g30IKRCNQa2Fvj5l9znpBozxjQH2uGfiIWk4DhcEyBHa1yce80jwAFr\n7ewqHVwlOZZ7DhPKdQ0zxpjawJvA/+XOuENSWAdta+1VR3rdGHMz0B24okoGVAWOds9hZBtQsEdd\nU/yzbamGjDE1gLeAmdbaRYEeT3loeaQUxphuwANAD2ttdqDHEwDFTmJVM2uAvxhjmhtjIoEEYHGA\nxySVwBhjgGnAV9ba5wI9nvJS0C7d80BtYJkxZp0xZnKgB1TZjDE9jTFb8e+wpxpjlgR6TJXFWnsI\nuBNIA74C5llrvw7sqCqXMWYO8AnQwhiz1RgzINBjqiIXAcnA5bn/L6/LnZSFJB1jFxEJIZppi4iE\nEAVtEZEQoqAtIhJCFLRFREKIgraISAhR0BYRCSEK2iIiIURBW0QkhPw/q6qYpZKJG3UAAAAASUVO\nRK5CYII=\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "X = np.random.randn(100, 1)\n", "y = 1.3 * X + .1 * np.random.randn(100, 1)\n", "regressor = LinearRegression()\n", "regressor.fit(X, y)\n", "plot_regression(regressor, X, y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As the noise is of limited magnitude, the model fits the data pretty well:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "R^2 coefficient of determination: 0.9947\n" ] } ], "source": [ "print(\"R^2 coefficient of determination: %.4f\" % regressor.score(X, y))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Sparse Linear Regression, Variable Selection\n", "\n", "It so happens that one would like to perform variable selection while fitting a linear model. This can be done using the Lasso (see class `Lasso` in `sklearn` ).\n", "Compared to standard linear models, Lasso ones have an additional property `sparse_coef_` that is a sparse representation of non-zero entries in the vector of regression coefficients.\n", "\n", "We now generate data in dimension 30 such that coefficients that linearly link `X` to `y` are greater for even dimensions (and close to 0 for uneven ones)." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1.3, 0.05, 1.3, 0.05, 1.3, 0.05, 1.3, 0.05, 1.3, 0.05, 1.3, 0.05, 1.3, 0.05, 1.3, 0.05, 1.3, 0.05, 1.3, 0.05, 1.3, 0.05, 1.3, 0.05, 1.3, 0.05, 1.3, 0.05, 1.3, 0.05]\n" ] } ], "source": [ "d = 30\n", "X = np.random.randn(100, d)\n", "beta = [1.3, 0.05] * (d // 2)\n", "y = np.dot(X, beta) + .1 * np.random.randn(100, )\n", "print(beta)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, if we fit a Lasso on this dataset, we get non-zero entries only for even dimensions." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " (0, 0)\t1.05617908426\n", " (0, 2)\t1.02629455661\n", " (0, 4)\t1.03793641374\n", " (0, 6)\t1.18403131698\n", " (0, 8)\t1.09909940784\n", " (0, 10)\t1.05089948592\n", " (0, 12)\t1.09547069121\n", " (0, 14)\t1.04909010234\n", " (0, 16)\t0.966487004581\n", " (0, 18)\t1.21713300881\n", " (0, 20)\t0.98686143766\n", " (0, 22)\t1.08398813087\n", " (0, 24)\t1.163494737\n", " (0, 26)\t1.11094326391\n", " (0, 28)\t1.04431845091\n" ] } ], "source": [ "regressor = Lasso(alpha=.2)\n", "regressor.fit(X, y)\n", "print(regressor.sparse_coef_)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0.97198672377337769" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "regressor.score(X, y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, using only selected covariates, we are able to fit a standard linear model using only those covariates and get a better $R^2$:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "reduced_X = X[:, regressor.sparse_coef_.nonzero()[1]]\n", "regressor = LinearRegression()\n", "regressor.fit(X, y)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0.9997622982324621" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "regressor.score(X, y)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.4.3" } }, "nbformat": 4, "nbformat_minor": 0 }