{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Exploring regularization for linear regression" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Goal\n", "\n", "The goal of this lab is to explore the effect of regularization on the coefficients and accuracy of linear regression models for a toy (Ames housing) dataset." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Set up" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "\n", "np.random.seed(999)\n", "\n", "from sklearn.linear_model import LinearRegression, LogisticRegression, Lasso, Ridge\n", "from sklearn.ensemble import RandomForestRegressor\n", "from sklearn.datasets import load_wine, load_boston\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.metrics import r2_score\n", "\n", "import matplotlib.pyplot as plt\n", "import matplotlib as mpl\n", "%config InlineBackend.figure_format = 'retina'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load data, drop columns with missing values, one-hot encode categoricals" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
IdMSSubClassMSZoningLotFrontageLotAreaStreetAlleyLotShapeLandContourUtilities...PoolAreaPoolQCFenceMiscFeatureMiscValMoSoldYrSoldSaleTypeSaleConditionSalePrice
0160RL65.08450PaveNaNRegLvlAllPub...0NaNNaNNaN022008WDNormal208500
1220RL80.09600PaveNaNRegLvlAllPub...0NaNNaNNaN052007WDNormal181500
\n", "

2 rows × 81 columns

\n", "
" ], "text/plain": [ " Id MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape \\\n", "0 1 60 RL 65.0 8450 Pave NaN Reg \n", "1 2 20 RL 80.0 9600 Pave NaN Reg \n", "\n", " LandContour Utilities ... PoolArea PoolQC Fence MiscFeature MiscVal MoSold \\\n", "0 Lvl AllPub ... 0 NaN NaN NaN 0 2 \n", "1 Lvl AllPub ... 0 NaN NaN NaN 0 5 \n", "\n", " YrSold SaleType SaleCondition SalePrice \n", "0 2008 WD Normal 208500 \n", "1 2007 WD Normal 181500 \n", "\n", "[2 rows x 81 columns]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_ames = pd.read_csv(\"ames.csv\") # in same directory as this notebook\n", "df_ames.head(2)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(1460, 215)" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# ignore columns with missing values for this exercise\n", "cols_with_missing = df_ames.columns[df_ames.isnull().any()]\n", "cols = set(df_ames.columns) - set(cols_with_missing)\n", "\n", "X = df_ames[cols]\n", "X = X.drop('LotArea', axis=1) # has some outliers, let's drop for stability in this lab\n", "X = X.drop('SalePrice', axis=1)\n", "y = df_ames['SalePrice']\n", "\n", "X = pd.get_dummies(X) # dummy encode categorical variables\n", "X.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Examine coefficients of nonregularized OLS model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**1. Split off validation/test set and train unregularized linear regression model**" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=True, random_state=999)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "LinearRegression()" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "lm = LinearRegression()\n", "lm.fit(X_train, y_train)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**2. Compare the R^2 on training and test set**" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.92 R^2 on training set\n", "-5800585.57 R^2 on test set\n" ] } ], "source": [ "print(f\"{lm.score(X_train, y_train):.2f} R^2 on training set\")\n", "print(f\"{lm.score(X_test, y_test):.2f} R^2 on test set\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q.** Why is the testing score much worse?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Solution\n", "Because the unconstrained coefficients of the OLS model do not generalize well; they are over fit to the training set, which makes them performed poorly on the unseen test set.\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**3. Predict $\\overline{y}$ computed from training set instead of using the model; what is the R^2? on test set?**\n", "\n", "R^2 should give 0 as the neutral value when the prediction is no better and no worse than simply predicting the average of the training set." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.0" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y_pred = np.full(shape=(len(y_test),1), fill_value=np.mean(y_train)) # get vector of y_train bar\n", "r2_score(y_pred, y_test) # how well do we predict y_test using y_train bar?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**4. Extract $\\beta_1, ..., \\beta_p$ and count how many are close to 0**\n", "\n", "Note: `sum(np.abs(x) < 1e-5)` is a decent way to check for values of `x` close to zero but not necessarily zero. There is also `numpy.isclose()` but that is too strict (requires numbers to be really close to zero) for this exercise." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "lm_beta = lm.coef_" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sum(np.abs(lm_beta) < 1e-5) # how many close to 0?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**5. Plot the coefficient index versus the value**\n", "\n", "R^2 should give 0 as the neutral value when the prediction is no better and no worse than simply predicting the average of the training set. The coefficients should look something like the following images, depending on which training samples are chosen:\n", "\n", "\n", "\n", "(There is also some randomness probably due to how the sklearn regression model deletes collinear columns to prevent the symbolic solution that finds coefficients from failing.)\n", "\n", "The following function is a handy way to plot the coefficients." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "def plotbeta(beta, which, yrange=(-20_000, 20_000),fontsize=10, xlabel=True, ylabel=True, ax=None):\n", " if ax is None:\n", " fig,ax = plt.subplots(1,1,figsize=(4,2.5))\n", " ax.bar(range(len(beta)),beta)\n", " if xlabel:\n", " ax.set_xlabel(\"Coefficient $\\\\beta_i$ for $i \\\\geq 1$\", fontsize=fontsize)\n", " if ylabel:\n", " ax.set_ylabel(\"Coefficient value\", fontsize=fontsize)\n", " if yrange is not None:\n", " ax.set_ylim(*yrange)\n", " ax.tick_params(axis='both', which='major', labelsize=fontsize)\n", " plt.gca().yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:.0f}'))\n", " ax.set_title(f\"{which} $\\\\overline{{\\\\beta}}$={np.mean(beta):.0f}, $\\\\sigma(\\\\beta)$={np.std(beta):.1f}\", fontsize=fontsize)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAArQAAAGOCAYAAABmN5CMAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8QVMy6AAAACXBIWXMAABYlAAAWJQFJUiTwAABGUklEQVR4nO3dd7gkVZn48e9LRjKYEJQBFMSw4IWVtCBBERQVA2ZFDL9VUcGwgqICrrouKoJhZdXVgdVdFBRUgoBEhQVkEAFF8pARyWkYHHl/f5xqaXq6+3ao2/f2zPfzPPXUdNU5dc6prp5+b/WpcyIzkSRJksbVEtNdAUmSJGkYBrSSJEkaawa0kiRJGmsGtJIkSRprBrSSJEkaawa0kiRJGmsGtJIkSRprBrSSJEkaawa0kiRJGmsGtJIkSRprBrSSJEkaawa0kiRJGmsGtJIkSRprBrSSJEkaawa0kiRJGmsGtJIkSRprS013BSSNp4jIQfNmZtRZF0nS4i0yB/5OkiRJkqadXQ4kSZI01gxoJUmqSUQcGRG3R8QKHfYvGxGfiIhLI2JetVwcEe9qSbdpRGTrdkntGdBKM1BEbBYR34+Ia6svvPuqL8AvRcRaHfJkr/1aI2LJiHhPRJwVEXdFxF+rL+FLIuK7EfHKAerc0xd1XSLi3yPitIi4sSrrroj4XUQcEBFrtEm/RkS8OyKOjYirqzz3RsRvIuJdEbHEMOlb8r48Ik6JiJuqfNdGxNERsWUdeQapW0TMbVwjbZbb2qSPiHhnRJwXEfdHxEPV+f1QRCzZqR0tx3hbUxnv7pBm7Yj4XkTcEhHzq3oeGhGrdUjfVztGKSI2A94KfDEzH2yzfxngFOALwALgcOAIYF3guxHx0kbazJwDHAd8LiJWnKL69vUZGiRPRLyjy/vVWP7Wkqfn93iQ41f5+r3uprzdA5bT9+d0kPe9Ke82EfGTiLi1Om+3Rvl/62WT5BvofPTDPrTSDBIRAXwR+DjlC+9U4FJgGWAr4IXAQ8AemXlMS96EyR+4qv6TOx7YGbgHOAG4CVgdWB/YErgoM/+pj3ovU9V1W+Bi4ExgeeBNwMrAzpl5cq/H67HMR4CLgD8CtwMrAFsAmwG3AFtk5o1N6d8LfAu4FTgDuAF4CvAaYBXgJ8DuWf2n2G/6pnL+nfL+3UkJSO4Angm8kvIg7tsz8wfD5BmkbhExF1gVOLTN6XwgM7/cUqcjgbdV5/YXwIPAi4HndGp7S/6nU67dJYEVgfdk5ndb0qwPnAs8GfgZ8CfKNb49cAWwdWbe2ZKnr3aMUkScQqn/mpk5r83+fwEOBv4TeF/TtfZm4IfA5zLz003pXwicD+yfmV+Ygvr29RkaJE9EbALs1qEK2wA7ACdk5q5NeebS43s84PEHue6mvN0DltP353SQ973K9yngXyn/Px1P+f/nicALgDMy8+Md2jvw+ehLZrq4uMyQBfgMkMB1wHPb7H8tMI8S7G7fsi/LR3rSMt5apb0YWKXN/ie0HruHY/5LdczDqf5Qrra/udr+r1NwrpbrsP3zVZn/0bJ9B+AVwBIt259KCQgTeO2g6Zv2/Q24DXhyy77tqzzX1pBnkLrNBeb2eG53a5QLPLFp+9LAsdW+d3TJH8CvgGuAL1Xp390m3cnVvg+2bD+kcT21ydNzO0a5ABsAjwLf7pLmGkrAsULL9tdX7d2nTZ7LgeuBJaegzn19hgbN06X8/6vyvHIq3uMuxx/kupvydvdbzqCf0wHf992rfacCK7XZv3Td71Pfxxn2gnFxcalnAWYBfwUeAZ7fJd17qw//n2gKaOg9oP2PTl+eQ9S97y/qKTyPGzf+4+0jzyerPF8fJj2webX9Zx3y3QfcP2yeAevWc5AAHFkdY682+55X7ZvTJf/elOBuW+BA2gS0wHo89sdba2C+EvBAh2uq53bUcC2tSPkj83fA/Y3PWJvlyZRfVhLYscOx1qn2/7zNvmOqff/QZt8B1b6XjqLNVZmDfIb6ytN0Hd1ES7Bex3vc6fiDXnejaHe/5Qz7Oe21LZTuqddW5+VJNV9rA52Pdot9aKWZY0/Kz8vHZualXdJ9l/Kz0IbAiwYop/FT2gYD5F1IRKxD+ZI4LRfuN/j6an16HWX16BXV+pI+8vy1Wi8YMv1VlD9IXhgRT2zeERHbUr4wf1VDnkHqBrBsRLw1Ij4ZEXtHxPYd+tk9tVpf22ZfY9tERKzaujMiNqIEd4dl5tld6rlDtT4lMx9t3pGZ9wPnUH4t2GKIdgwsIp4M/BY4iPKFfjjwdcqddCjn+Rrg/My8nfIz79+A8zoccrNqfX5TGRERe1N+eflVZra7Zs+p1i8ZvDV9G+Qz1G+ef67W/5WZ7fpODvsedzr+MNddO3W3u59yBv6c9lEGlO5u6wInAndH6e+/b/W+dHwuoEeDno+FOLGCNHM0+qx2DV4yc0FEnEn5OX9rSh/KfvwU2Bd4b0SsRPlpak5mXt/ncRraflEDH6LDF3VE7EPpI9erizPzuHY7IuJjlDtpq1R1+SfKf8hf7OXAEbEU8Pbq5S+HSZ+Zd0XEvpSfLv8YEcdR/oBYn9If9lQe+w984DxDtOWpwH+3bLsuIvbMzLOatt1Rrddtc4z1mv79bJoCuKr8/6Z0e/jkJNXdsFpf2WH/VcBOlD+8TmvZ12s7hvE/lPYdDOyX1e2kiPhSVbclKX0N74gyosEmwOVt/qhr2LRaz4mI7Smf320o5+H3lK5A7fy2Wm/buqOuz9Egn6FhPncRsTylvY9S/kBvZ+D3eJLjD3PdjaLd/ZQz0Od0gLb8Y7X+M6Xv7fNbjnM28LrM/Eu3NrUpv+fz0ZM6bx27uLgMvlA66CflAarJ0jZ+3mzuT9VTl4Mq7espHfqbfza9kxLcvqLPen+hUW9Kn8/vULpDJKWf7lPa5JnbUvZky+wu5d/WkvakdmV2yf/lKt8JdaWn9G27q6VeVwFvrjNPP3Wj/HS9A+XhsSdQfuo7nPJl8hCwcVPaRt/nq4HVm7YvRXnQpFG/XVrK+CzlLuWWTdsOpH2Xg2+32960v9Gf7xODtmOIz+JLqrJ/Q8vP0tX+U6v9L65eb1C9PqXLMRv9Np8IHNXyPh8FPK1L3nnAbVP1ORrkMzRInqa8e1R5ju+wf6j3uNvxB73uRtHufsthwM9pv20B/q3av4Dyf9KOlED4uZQ/nBM4c4DPWc/no6fj1XEQFxeX4RfKwx9JD33lgH+v0n6zaVvSY0BbpV+acifis5SnY+9u+o/tCJoe7prkOAN/Udd8/p4CvJrylPItwEQPeT5U1fPy5i+EYdLz2AgVh1DukjwBmGg6TwfXkWfYtjTlawTBxzZtW4Ly82JWX3rfpjxxfhkluLqy2rdTU54XVm04uOX4BzJYQNv4Q2m/QdsxxLV0RHWs13bYf0xz+ykjgyTwoy7H/AtVv1DK3d01KF2GGse6pEvem4EFM/QzNEiec6o29/vHc0/vcbfj13Xdjard3coZ5HM6SFsov1Ik5Y/VjVv2LQ/cWO3fstd2DXMddDxeHQdxcXEZfqH0M03K8EaTpf1hlfZTTdv6CmjbHHNJyp3bB6pj7dZjvoG/qKfoPK4DzAcumyTdXlX9/gA8tYfjTpoe2K5K89M2+55AefDhb8B6w+QZti0teZ9Z5b2zZftSwEcpd9nnUR5O+yXlp/PGU8mbNKW9gvIrw7ItxzmQ9gFtY/SDj3ao1zeq/e8bph0DXkPXVue87YNBlJ9vs/GeULobJJ0f7Fun2n9Mh/0XNx+vzf676OPBwBra39NnaJA8lOGkkhIE9fUQUC/v8WTHn4Lrbsrb3a2cfj6nQ5Txieo4V3XI991q/959tGeo89FusQ+tNHP8hvKT/YspP9u3VT0YsV318pxO6fqVpUP+jyPi+cCnKD/5HdctT/VA2BOBs5qOcWf1+qyIuBjYOCLWy8xrm/LtQ019aNu04/qI+COwSUQ8MTPvaE1Tlf9Vyp2MHbM81NNRH+kbYyie0aZeD0XEBZQ7IS/gsYc2BskzcFvaaKR/3MxWmbkA+Eq1NJe3PCWAm0cJoKH8/Nh4yPDh0oV6Id+JiO9QHhbbhxIAQ+eHE59VrTv1dWzVth39ijIpxTrA7dl+coSnUPoUXtd0TTfK7jQofaP/7IUd9t9dre/vUJ9VKU/lt+7bhyn4HPXyGRoizzAPAfXyHk92/FqvuxG1u2M5fX5OB21L45zd0yFr4/pdvsfmQI0PgzUY0Eozx2zKgzSvjojnZman/4TeCTyN8p9MXQ/ANGt8qXadoKEy6Bf1PpSgoVdHMElw3eJp1brdTDz7UvogXwy8ZLIv6z7TL1utn9Rhf2P7I0PmGaRunTSeUm73pHQ7bwOWA47IzMaICvOB/+qQfoISjP+Gcs3+X7W9EcDvFBFLZNMT59XDiltTvow7jRrQqt92dJLVeqXWelU+Tvmp9z+btt1K+aViQ9r7+wNhrTsiYnVKWy/N9g/VbEj5LF7cZt8+TN3nqONnaNA8EbEc5fp5lM7XSzdd3+Mej1/3dQdT3+6eymnR7nM6aBlnU7oTPSsilsnM1v+Lnlet5/ZSQI3n4/HquM3r4uJSz0Lpz5qU4YCe02b/bpSHIhYAO7Ts66nLAWX2rpfQ/mGXp1I6/SdlhpnJjtV4gOIlbfatTgnCau9yQHlqd6Gf1imBRqNO57TZ/+lq34X01me23/SNcXdvA9Zq2bcL5T/wecAaw+Tpt26UhzcWSkMJhhrv9ydb9q3cJv0/Uv38TZcuEC15DqRDn0X6HOB+kHZU+2dX+97RY50vqtK/pWX76yhf9JfTMjg9j3WxeWab4zUenPkuj594ZBng6GrfHh3qsme1/wPT/Rka9HPXlO5tVZpf1Hmt9nP8Aa+7KW/3EO9JX5/TQdsC/KDa97mW7S+h/B91D7Bqy771q/KWbtne0/nod/EOrTSzHEj5Oe0jwO8j4mTKz0VLU8YC3JwS3LwpM9uO7RoRs7sc//3VMfYGbouI3/DYT5nrAi+n/Gz0M8oX9GQad57eEBG/yup/qyhT4f5nVe+vdMo8hJ2BL1XDxVxD6ebwFErf3fUoweF7mjNExB489hT+r4EPtflpfG5mzh4kfeUYyrBrLwYuj4hjq7psROlaEJSHTe4cJs8Addsd2C8izqC83/dTvmxeTrmLcyLlgZtmp0bEPEpXhvspgcbLKHdjX5NNXUiG8H7KFKRfi4gdKYHi5pSuN1cC+7ekH6QdwN/HXO91nOHPUoa3+35E7Ezp5/ePlPfoKuBlmflwS56fUIapeynlqfNmjc/Juyg/555OGV94J8r1Ojszj+hQl50o7/PPeqx7r/r+DA2Yp9n/q9bf7pJm0Pe41+ND/9fdKNo9aDn9fk4HbctHKOdo/yjjY19A+SPj1ZTr8z2ZeU9LntOqNOvy+Lu3vZ6P/tQZHbu4uNSzUJ4YP4LyH/o8yoNal1H+I1+7Q57sYVkVeDrlIaJjKT8B30e5k3or5cvirbS5e9uhzL80HftCytOw36L8R5nA96fo/DwP+CblZ9g7KIHKvZQxOw+k/R2eA3s4P2cOmr4p39KUn4LPq87tAkrfv+Pp8LRxv3kGaMuLgP+lDKd2D2VSgL9Qhp96O21GtKBMZzynSj+/uhYPB2b1+V416trpqfKnA9+vrr9HKNO8HtbhPey7HVW+31XndbU+6v0qStDzIOVXkYspXYJW7JB+GUowcH7L9mdU7T8F+DGPXa93VvXu+EsIZXzQecBxM+Qz1Heeprwb0cNDQEO8xz0df8DrbsrbPcR70tfndMi2rE65i31ddc7upPyhtUWH9HOrts8a5Hz0u0RVgCT1JSKeQfkSOJXyn+kOlID5XspPtt/OzKOnq34SQDVL0p3AVzLz41Nc1icowz5NZObvqm2vptzt/XhmfqnP430Q+BqwbWb+uu76SosSuxxIGlTjZ9RT+/2ilkZoG8pdvkNGUNZXgfdSuiw0phFtfE4u6udA1VPqnwB+YjArTc6AVtKgBvqilkYpM39B6Xs5irIejoi3AdtHxApZhv1qfE5+1+fhZlH6GM6ur4bSossuB5IGEhEnUR4wWCMz75ru+kgzUUTcDjyUmbOmuy7SosyAVtJA/KKWJM0UBrSSJEkaa0tMnkSSJEmauQxoJUmSNNYMaCVJkjTWDGglSZI01gxoJUmSNNYMaCVJkjTWnClMmiEi4jpgZWDuNFdFkqRRmQXcl5nrDnMQA1pp5lh5+eWXX32jjTZafborIknSKFx++eXMmzdv6OMY0Eozx9yNNtpo9Tlz5kx3PSRJGolNN92Uiy66aO6wx7EPrSRJksaaAa0kSZLG2iIT0EbEGhHx7og4NiKujoh5EXFvRPwmIt4VEW3bGhFbRcSJEXFXRDwUEZdExD4RsWSXsvaIiAsi4oGqjDMjYtcu6ZePiIMi4oqIeDgibo+IH0fERl3yrB0R34uIWyJifkTMjYhDI2K1LnlsywxsiyRJmlqLTEAL7A58B9gcOB84FPgJ8Dzgu8CPIyKaM0TEq4CzgW2BY4FvAssAXwWOaldIRHwZmA2sWZX3A+D5wC8i4gNt0i8LnAp8BrgPOAz4FfBq4MKI2LxNnvWBOcCewAVVfa4F9gb+LyLWaJPHtszAtkiSpBHIzEViAXYAXgEs0bL9qcANQAKvbdq+MnA7MB/YrGn7csC5Vfo3thxrq2r71cBqTdtnAXcCDwOzWvJ8ospzdHPdgFdV2//Qps4nV/s+2LL9kGr74S3bbcsMbMsA1/CciYmJlCRpcTExMZHAnBw2Dhz2AOOwAJ+sAo6vN217Z7XtiDbpd6j2ndWy/chq+55t8ny22ndQ07YArq+2r9smz9nVvu2btq1XbbuuTUC1EvAA8CCwgm2Z2W0Z4Do1oJUkLVbqCmgXpS4H3fy1Wi9o2rZDtf5lm/RnAw8BW1U/TfeS56SWNADrA88ArszM63rM0/j3KZn5aHPizLwfOAd4ArBFj/WyLQvXa1RtaSsi5rRbgGdPlleSJC1skQ9oI2Ip4O3Vy+aAZ8NqfWVrnsxcQLkTtxTlzhwRsQKwFvBAZt7apqirqvUGvZQxqjy2ZVrbIkmSRmBxmFjhi5QHw07MzJObtq9Sre/tkK+xfdUB08/kPDO1XoPkman16igzN223vbpLOzFZfkmS9HiL9B3aiPgQ8FHgT8Db+s1erbPPfP2kH6SMUeUZRRmLe1skSVINFtmANiL2ogzF9EfKwz13tSRp3FFbhfZWbkk3Wfp2d/D6LWNUeWzL1NdrrM3a74S2a0mSZqJFMqCNiH2AbwCXUYLZ29oku6JaL9Tnsep3uy7lIbJrATLzQeBmYMWIWLPN8Z5VrZv7WHYsY1R5bMu0tkWSJI3AIhfQRsS+lAHvL6YEs7d3SHp6td65zb5tKU+sn5uZ83vMs0tLGoBrKGPgbhAR6/aY54xqvVO0zG4WESsBWwPzgPN6rJdtWbheo2qLJEkagUUqoI2IT1MeApsD7JiZd3RJfgxwB/DGiNis6RjLAZ+rXn6rJc/h1Xr/5qlOI2IWsBdlMoDvN7ZnZjblObg5EKpmw9qG0iXirKY81wCnUCYF2Kul/IOAFYAjqzuTtmVmt0WSJI3AIjPKQUTsQRlE/2/Ar4EPtcx0CzA3M2cDZOZ9EfEeSgB1ZkQcBdwFvJIyRNMxwI+aM2fmuRFxCPAR4JKIOIYyJesbgNUpM0jNbSnzEGBX4HXA+RFxGmUM1N0pY6q+s3VcU+D9lFmxvhYROwKXU6b03Z7yk/b+LfWyLTOwLZIkaUSGnZlhpizAgZQnzLstZ7bJtzVwInA35SfjS4EPA0t2KWsP4LeUmaHup9zJ27VL+uUpd/Guotwt/AtlytXndMnzdMpdxVuBRygzWx0GrN4lj22ZgW3p4xqeMTOFrbPv8W3XkiTVqa6ZwiLTUYakmSAi5kxMTEzMmTNnuqvCrP1OYO4XX77QWpKkOm266aZcdNFFF2WHMdp7tUj1oZUkSdLix4BWkiRJY82AVpIkSWPNgFaSJEljzYBWkiRJY82AVpIkSWPNgFaSJEljzYBWkiRJY82AVpIkSWPNgFaSJEljzYBWkiRJY82AVpIkSWPNgFaSJEljzYBWkiRJY82AVpIkSWPNgFaSJEljzYBWkiRJY82AVpIkSWPNgFaSJEljzYBWkiRJY82AVhKz9juBWfudUHvauoy6PEnSeDGglSRJ0lgzoJUkSdJYM6CVJEnSWDOglSRJ0lgzoJUkSdJYM6CVJEnSWDOglSRJ0lgzoJUkSdJYM6CVJEnSWDOglSRJ0lgzoJUkSdJYM6CVJEnSWDOglSRJ0lgzoJUkSdJYqy2gjYglIuKDEXFeRNwbEQua9r0gIv4jIjaoqzxJkiQJagpoI2IZ4FTgUGB94H4gmpJcB7wTeEsd5UmSJEkNdd2h/Rdge+Ag4CnAd5t3ZuY9wNnAS2sqT5IkSQLqC2jfApyTmZ/NzEeBbJPmOuAZNZUnSZIkAfUFtOsC502S5i5g9ZrKkyRJkoD6Atp5wKqTpHkGcE9N5UmSJElAfQHtxcBO1cNhC4mIVSj9Zy+oqTxJkiQJqC+g/Q7wdOCHEbFy846IWBWYDawGHF5TeZIkSRIAS9VxkMz834h4MbAn8ErgboCIuBB4LrAs8M3MPLGO8iRJkqSG2iZWyMx3Ucaa/SPwJMo4tBPA1cC7MvODdZUlSZIkNdRyh7YhM2cDsyNieUoXg3sz88E6y5AkSZKa1RrQNmTmPMrIB5IkSdKUqq3LgSRJkjQdagloI+LaHpdr6iivSz1eFxFfj4hfR8R9EZER8YNJ8mwVESdGxF0R8VBEXBIR+0TEkl3y7BERF0TEAxFxb0ScGRG7dkm/fEQcFBFXRMTDEXF7RPw4IjbqkmftiPheRNwSEfMjYm5EHBoRq9mW8WqLJEmaWnXdoV2C8hBY67IqMKtalqmxvE4+BXwA2AS4ebLEEfEq4GxgW+BY4JuUen4VOKpDni9ThiFbkzJc2Q+A5wO/iIgPtEm/LHAq8BngPuAw4FfAq4ELI2LzNnnWB+ZQRo24oKrPtcDewP9FxBq2ZTzaIkmSRiAzp3QBngmcCJwFLDfFZW0PPIsSTG8HJPCDDmlXBm4H5gObNW1fDji3yvvGljxbVduvBlZr2j4LuBN4GJjVkucTVZ6jgSWatr+q2v6H5u3VvpOrfR9s2X5Itf1w2zLz2zLA9TtnYmIip8M6+x6f6+x7/ONet1u3Szuq+kmSFj0TExMJzMkhY8Ap70ObmVcDrwHWAg6Y4rLOyMyrMkt0MInXUYYXOyozL2w6xsOUO70A72vJ895q/fnMvLspz1zKXcRlKXfvAIiIaMrz8cx8tCnPz4BfA88BXtSUZz1gJ6BxzGYHAA8Cb4uIFWzLjG+LJEkagZE8FFYFI6cCbxpFeT3aoVr/ss2+s4GHgK2qn6Z7yXNSSxqA9YFnAFdm5nU95mn8+5TmQAsgM+8HzgGeAGzRY71sy8L1GlVbJEnSCIxylIMFwFNHWN5kNqzWV7buyMwFwHWUYc3WA6juvK0FPJCZt7Y53lXVeoNeyhhVHtsyrW1pKyLmtFuAZ0+WV5IkLWwkAW1EPJHysM2NoyivR6tU63s77G9sX3XA9DM5z0yt1yB5Zmq9JEnSiNQysUJEfKbL8Z9OedBmFcqDOOMiqnUv/XGb9ZN+kDJGlWcUZSyWbcnMTdseoNylneijTEmSRH0zhR04yf77gM9l5sE1lVeHxh21VTrsX7kl3WTp293B67eMUeWxLVNfL0mSNCJ1BbTbd9j+KHA38Keq/+NMcgWwGaXP45zmHRGxFLAupd/vtQCZ+WBE3AysFRFrtumv+axq3dzH8opq3alfZZ15bMvMa4skSRqBWvrQZuZZHZZfZ+ZlMzCYBTi9Wu/cZt+2lCfWz83M+T3m2aUlDcA1wA3ABhGxbo95zqjWO0XE496fiFgJ2BqYB5zXY71sy8L1GlVbJEnSCIxylIOZ5hjgDuCNEbFZY2NELAd8rnr5rZY8h1fr/ZunOo2IWcBelMkAvt/YXo2H28hzcHMgVM2GtQ3wR8qkE4081wCnUCYF2Kul/IOAFYAjM/NB2zLj2yJJkkZgoC4HEfGMQQvMzBsGzTuZiNgN2K162RgibMuImF39+47M/FhVj/si4j2UAOrMiDgKuAt4JWWIpmOAH7XU/dyIOAT4CHBJRBxDmZL1DcDqlBmk5rZU6xBgV8qEAedHxGmUMVB3p4yp+s7WcU2B91NmxfpaROwIXA5sTunacSWwf0u9bMsMbIskSRqNQfvQzqX/p8yp8tTVb7edTYA9WratVy0A1wMf+3tlMo+LiBdRApHXUqZXvZoSGH2t3YxjmfnRiLgE+ADw/yj9hC8CvpSZx7dJPz8iXgzsB7wZ+DDlIbnjgAMy849t8lxT3Z38LOVn9JcBtwJfAw7KzLva5LEtM7AtkiRp6g0aXB7JYAHtlMrMA5l8xIXWPOdQApN+8hwBHNFH+nmU6VF7nvo3M2+kabrWHvPYlhnYFkmSNLUGCmgz8x0110OSJEkayOL8UJgkSZIWAQa0kiRJGmu1PqAVEf8IvBRYC1i2TZLMzHfVWaYkSZIWb7UEtBERwGzgrZQ57ZPH5ran6XUCBrSSJEmqTV1dDj4AvA34b8q0pQEcCmwFfBK4HziKx4bPkiRJkmpRV5eDPYArGqMflBu23JOZ5wHnRcTJlClBT6VpxiZJkiRpWHXdod2Qx897D03Bcmb+DjieMtOSJEmSVJu6AtoA7m16/SBlytFmVwHPrqk8SZIkCagvoL2ZMrJBw7XApi1pnkUJdCVJkqTa1BXQXsDjA9iTgBdGxKcj4rkRsRfwKko/WkmSJKk2dQW0PwGWjIh1q9cHA9cDBwGXAF8H7gH2q6k8SZIkCahplIPMPA44run1XRHxAuA9wPrAXODIzLy1jvIkSZKkhlpnCmuWmfcCX56q40uSJElQU5eDiNglIurqviBJkiT1rK4g9ATgxog4OCKeV9MxJUmSpEnVFdB+G1gO+Bjw+4j4bUR8ICLWqOn4kiRJUlu1BLSZ+V5gTeANlCG7NgYOA26OiJ9GxCsjYsr660qSJGnxVVu/18x8JDOPzsxdgbWBfwGuAHYDjgVuiYhD6ypPkiRJghoD2maZeXtmHpKZGwMvAL4GrAJ8cCrKkyRJ0uJrSkcmiIgNgNcDrwGWnsqyJEmStHiqvV9rRKwKvBHYA3ghEMB9wH8Bs+suT5IkSYu3WgLaagzaXShB7CuAZYAETqMEsT/NzIfrKEuSJElqVtcd2luAJ1Huxl4JHEGZ6vbmmo4vSZIktVVXQLsc8B1gdmaeV9MxJUmSpEnVFdA+JTPn13QsSZIkqWd1TaxgMCtJkqRpMaXDdkmSJElTzYBWkiRJY82AVpIkSWPNgFaSJEljzYBWkiRJY62WgDYi3h4R/zBJmudFxNvrKE+SJElqqOsO7Wxgt0nSvAr4fk3lSZIkScBouxwsCeQIy5MkSdJiYJQB7QbA3SMsT5IkSYuBgae+jYjvtWzaLSJmtUm6JPAMYBvghEHLkyRJktoZOKAF3tH07wQ2qZZ2Ejgf+PAQ5UmSJEkLGSagXbdaB3AtcChwWJt0fwPuzswHhyhLkiRJamvggDYzr2/8OyIOAs5o3iZJkiSNwjB3aP8uMw+q4ziSJElSv2oJaBsiYklgQ2A1ysNgC8nMs+ssU5IkSYu32gLaiPg05aGvVSZJ2jbQlSRJkgZRS0AbER8HDgLuBf4buBFYUMexJUmSpG7qukP7HuBmYCIz/1LTMSVJkqRJ1TVT2NOB4wxmJUmSNGp1BbR/puYHzCRJkqRe1BXQ/hh4SUQsW9PxJEmSpJ7UFdB+BrgVOCYi1p0ssTTOImLtiPheRNwSEfMjYm5EHBoRq0133SRJWhzV1U3gD8DSwNOAl0XEvcA9bdJlZq5fU5nSyEXE+sC5wJOBnwF/Al4I7A3sHBFbZ+ad01hFSZIWO3UFtEtQhum6oWlbtEnXbps0Tv6DEsx+KDO/3tgYEYdQxmH+PPDeaaqbJEmLpbqmvp1Vx3GkmSwi1gN2AuYC32zZfQDw/4C3RcRHM/PBEVdPkqTFVl19aKXFwQ7V+pTMfLR5R2beD5wDPAHYYtQVkyRpcTYlQ21VD8esmJk3TsXxpWmyYbW+ssP+qyh3cDcATut0kIiY02HXswevmiRJi6/aAtqIWJEy/e1bgCcB2Th+RGxO+Un2U5l5UV1lSiO2SrW+t8P+xvZVp74q9Zr7xZe3fd26bpd21n4ntN3eyaz9Tug5bacyJUlqVktAGxGrAL8BngtcDNwBbNSU5FJgG+BNgAGtFlWNhx6zW6LM3LRt5nLndqLuSkmStKirqw/t/pRg9h2ZOQEc3bwzMx8CzgJ2rKk8aTo07sCu0mH/yi3pJEnSCNQV0L4GODkzj+yS5npgrZrKk6bDFdV6gw77n1WtO/WxlSRJU6CugHZt4JJJ0jxA5ztb0jg4o1rvFBGP++xExErA1sA84LxRV0ySpMVZXQHt/ZTB5rtZl9K3VhpLmXkNcAowC9irZfdBwArAkY5BK0nSaNU1ysFvgV0jYqVqPM7HiYg1gZcBx9dUnjRd3k+Z+vZrEbEjcDmwObA9pavB/tNYN0mSFkt13aE9DFgDODEimkc3oHp9NLAc8LWaypOmRXWXdjNgNiWQ/SiwPuXa3jIz75y+2kmStHiqa+rbkyPiQOBA4DLgrwARcQewGmU4o30z89w6ypOmUzVhyJ7TXQ9JklTUNvVtZn6WMizXz4G7gb9RxuM8EXhxZn6prrIkSZKkhlqnvs3MM3jsSXBJkiRpytV2h1aSJEmaDga0kiRJGmsDdTmIiEeBR4HnZOaV1euu89dXMjNr7eYgSZKkxdugweXZlAD2oZbXkiRJ0kgNFNBm5nbdXkuSJEmjYh9aSZIkjbVaAtqIWD4inhERy3TYv2y1f7k6ypMkSZIa6rpD+xngCmDFDvtXAP4EfLKm8iRJkiSgvoB2F+BXmXlXu53V9l8Bu9ZUniRJkgTUF9DOAq6cJM2VVTpJkiSpNnUFtEtTxqXtJgH70EqSJKlWdQW01wIvmiTNdsD1NZUnSZIkAfUFtD8HNo2Ij7fbGRH7ARPAcTWVJ0mSJAGDzxTW6svAW4B/i4jXA6cANwNrAS8FNgFuAA6uqTxJkiQJqCmgzcy7I2I74IfAlpS7sQlEleRc4K2ZeXcd5UmSJEkNdd2hJTPnAltHxASwBbAqcA9wXmZeVFc5kiRJUrPaAtqGKng1gJUkSdJI1PVQmCRJkjQtBrpDGxGfofSR/WZm3lW97kVm5r8OUqYkSZLUzqBdDg6kBLQ/Au6qXvciAQNaSZIk1WbQgHb7an1Dy2tJkiRppAYNaO8GbsvMhwEy86z6qiRJkiT1btCHwn4HvLfxIiJOj4i311MlSZIkqXeDBrSPAks2vd4OmDVsZSRJkqR+DRrQ3kSZzlaSJEmaVoP2of0F8IGIuBy4tdr2jmr6224yM3ccsExJkiRpIYMGtPsDywAvB15EGY5rFpN3O8gBy5MkSZLaGqjLQWben5nvzcynZ+aSQAAHZuYSkyxLTnZsSZIkqR8DBbQRsXJELNO06Sxgbi01kiRJkvow6ENhdwP7Nb2eC9wzbGUkSZKkfg0a0Calm0HDHjjqgSRJkqbBoAHtrcAz66yIJEmSNIhBRzk4HXhLRDyRx4bt2i0iZk2SLzPzXQOWKUmSJC1k0ID248BTgJdQ7vImpcvBJpPkS8CAVpIkSbUZKKDNzD8DO0fE0sCalIfCDgUOq61mkiRJUg8GvUMLQGb+FbghIq4H5mbm9fVUS5IkSerNUAFtQ2auW8dxJEmSpH7VEtA2VF0QdgQ2AlbMzH+tti8HrAzckZmP1lmmJEmSFm+DDtu1kIjYmdKX9gTgK8CBTbs3oYyG8Ia6ypMkSZKgpoA2IjYDjqOMYvBh4H+a92fmecB1wKvrKE+SJElqqOsO7aeBh4DNMvNrwFVt0vwW2Lim8iRJkiSgvoB2a+C4zLytS5obKUN8SZIkSbWpK6BdEbhjkjRPqLE8SZIkCagvwLwZeO4kaTYBrq2pPEmSJAmoL6A9CXhpRPxTu50RsQuwFXB8TeVJkiRJQH0B7b8B9wCnRMS/A88BiIiXV6+PpgzbdUhN5UmSJElATQFtZt4M7ATcAvwLsDsQwM+r17cCO2fmZP1sBxYRL4yIf4uIkyLitojIiLiph3xrR8T3IuKWiJgfEXMj4tCIWK1Lnq0i4sSIuCsiHoqISyJin4hYskuePSLigoh4ICLujYgzI2LXLumXj4iDIuKKiHg4Im6PiB9HxEa2ZbzaIkmSplZtD2ll5kXAhsBuwL8D36Xckd0d2CgzL62rrA7eDOxHmansz71kiIj1gTnAnsAFwFcp/Xz3Bv4vItZok+dVwNnAtsCxwDeBZaq8R3Uo58vAbMooD98BfgA8H/hFRHygTfplgVOBzwD3AYcBv6KM43thRGxuW8ajLZIkaQQyc5FYKA+dvQBYpnqdwE2T5Dm5SvfBlu2HVNsPb9m+MnA7MJ8y5m5j+3LAuVWeN7bk2arafjWwWtP2WcCdwMPArJY8n6jyHA0s0bT9VdX2PzRvty0zty19XsNzJiYmctyss+/xuc6+x/eVXpKkzMyJiYkE5uSQceCUDKMVEStHxNMjYuWpOH47mXlxZv4uMx/pJX1ErEfpJjGXcjev2QHAg8DbImKFpu2vA54EHJWZFzaV/TDwqerl+1qO9d5q/fnMvLspT6PcZSl3Ihv1iqY8H8/MR5vy/Az4NaWP8otsy8xuiyRJGo3aAtqIWDIi9ouIq4G7KQHJ3RFxdbV9qbrKqskO1fqU5uAEIDPvB86hjJ27RZs8v2xzvLMps6VtVf003Uuek1rSAKwPPAO4MjOv6zGPbelcr+lsiyRJGoFaAtqIWIbSt/DzlJ9sb6T0fbyxev154FdVupliw2p9ZYf9jel7N+glT2YuAK4DlgLWA6juIq4FPJCZtw5bxqjy2Jba6tVWRMxptwDPniyvJElaWF13aD8CbAecQHkAbFZmbpmZsyiBwC+Abap0M8Uq1freDvsb21cdIs8oyhhVnplar0HyjKpekiRpBOrqBvBm4DJgtzY/E18TEa8BLgbeAnxxkAIi4sA2m2dXfR6nQlTrnOI8oyjDtvSfZ8rKyMxN2x6g3KWd6KNMSZJEfQHtM4GvtwazDZn5aEScBHxwiDIOaLPtTEpf3UE07qit0mH/yi3pBskzWfp2d/1GUa9B8tiW/uslSZJGoK4uB48AK06SZgXgr4MWkJnRZjlz0OMBV1TrTn0en1Wtm/tMdsxTPfS2LrCAMmYqmfkgcDOwYkSsOWwZo8pjW2qrlyRJGoG6AtpLgNdFxJPa7YyIJ1KGVvp9TeXV4YxqvVNEPO48RMRKwNbAPOC8pl2nV+ud2xxvW8rT9+dm5vwe8+zSkgbgGuAGYIOIWLfHPLalc72msy2SJGkE6gpov0EZB/SCiHhXRKxXTRG6bkTsCZxf7f9GTeUNLTOvAU6hjMKwV8vugyh3lI+s7uY1HAPcAbwxIjZrbIyI5YDPVS+/1XKsw6v1/s3TtkZEo9z5wPeb6pVNeQ5uDuqq2bC2Af4InGVbZnZbJEnSaNTShzYzfxwRm1Cmnv12myQBHJyZP66jvHYi4tlV+c1Wi4jZTa8/lpl3NL1+P2Umqa9FxI7A5cDmwPaUn473bz5YZt4XEe+hBFBnRsRRwF3AKymjORwD/Kglz7kRcQhlhIdLIuIYypSsbwBWp8yGNbel3ocAu1Luap8fEadRxkDdnTKm6jvb9Fe2LTOzLZIkaaoNO9VY80IZ7P47wIWUcTkvrF5vWWc5HcrejvKEebdlVpt8T6fcibuV0hf4euAwYPUuZW0NnEiZQGIecCnwYWDJLnn2AH5LmenqfsqdvF27pF+eckfyKsrdwr9Qplx9Tpc8tmUGtqWPa9ipbyVJi5W6pr6NzH5HMpI0FSJizsTExMScOXOmuyp9mbXfCQDM/eLLe07fa1pJ0qJt00035aKLLrooOwxp2avapr6VJEmSpsPAAW1ELBsRF0TEaRGxdJd0y1RpzuuWTpIkSRrEMHdo3wJsCnwlMzuOL5uZjwBfAl5Y5ZEkSZJqM0xA+xrg2sw8cbKEmflLykM0uw9RniRJkrSQYQLaF1Cmnu3V2cAmQ5QnSZIkLWSYgPaJwJ/7SP9nYI0hypMkSZIWMkxAOw9YsY/0KwIPD1GeJEmStJBhAtobgX/sI/1mwA1DlCdJkiQtZJiA9kxgi4jYbLKEEbEpsBVwxhDlSZIkSQsZJqD9BmU62aMjYqNOiSLi2ZSpQf8G/McQ5UmSJEkLWWrQjJl5RUR8FjgQ+F1EHAOcDtxECXTXBnYEXgssC3wmM68YusaSJElSk4EDWoDM/GxELAAOAN4MvKklSQB/BfbPzH8bpixJkiSpnaECWoDM/EJE/BB4J7A1sCYlkL0F+A3w/cy8fthyJEmSpHaGDmgBqoD1gDqOJUmSJPVjmIfCJEmSpGlnQCtJkqSxZkArSZKksWZAK0mSpLFmQCtJkqSxZkArSZKksWZAK0mSpLFmQCtJkqSxZkArSZKksWZAK0mSpLFmQCtJkqSxZkAraaTmfvHl010FSdIixoBWkiRJY82AVpIkSWPNgFaSJEljzYBWkiRJY82AVpIkSWPNgFaSJEljzYBWkiRJY82AVpIkSWPNgFaSJEljzYBWkiRJY82AVpIkSWPNgFaSJEljzYBWkiRJY82AVpIkSWPNgFaSJEljzYBWkiRJY82AVpIkSWPNgFaSJEljzYBWkiRJY82AVpIkSWPNgFaSJEljzYBWkiRJY22RCGgjYumIeHVE/FdEXBYR90XEQxFxaUR8NiJW6pJ37Yj4XkTcEhHzI2JuRBwaEat1ybNVRJwYEXdV5VwSEftExJJd8uwRERdExAMRcW9EnBkRu3ZJv3xEHBQRV0TEwxFxe0T8OCI2si3j1RZJkjS1FomAFlgf+CnwBuA64FvA94HlgU8DF0bEE1szRcT6wBxgT+AC4KvAtcDewP9FxBpt8rwKOBvYFjgW+CawTJX3qHaVi4gvA7OBNYHvAD8Ang/8IiI+0Cb9ssCpwGeA+4DDgF8Br67asrltGY+2SJKkEcjMsV+AtYD3Ayu0bF8GOB5I4Ott8p1c7ftgy/ZDqu2Ht2xfGbgdmA9s1rR9OeDcKs8bW/JsVW2/Glitafss4E7gYWBWS55PVHmOBpZo2v6qavsfmrfblpnblj6v4zkTExM5btbZ9/hcZ9/jp7sakqQxNDExkcCcHDYWHPYAM31pClwubdm+XrX9ujYB1UrAA8CDNAXJwDurPEe0KWeHat9ZLduPrLbv2SbPZ6t9BzVtC+D6avu6bfKcXe3b3rbM7LYMcK2OZUArSdKg6gpoF5UuB938tVovaNm+Q7U+JTMfbd6RmfcD5wBPALZok+eXbco5G3gI2Kr6abqXPCe1pIHSfeIZwJWZeV2PeWxL53pNZ1skSdIILA4B7TurdWvgsmG1vrJDvquq9Qa95MnMBZS7iktR7jISEStQukM8kJm3DlvGqPLYltrq1VZEzGm3AM+eLK8kSVrYIh3QRsQrgX8GbgIObtm9SrW+t0P2xvZVh8gzijJGlWem1muQPKOqlyRJGoGlprsCvYqIA9tsnp2Zczuk3wr4H0p/y9dm5t39Flmtc4rzjKIM29J/nikrIzM3bXuAcpd2oo8yJUkSYxTQAge02XYmMLd1Y0RsSenT+CiwS2Ze0CZv447aKm32QXlyvjndIHkmS9/urt8o6jVIHtvSf70kSdIIjE2Xg8yMNsuZrekiYhseG/Zpp8w8p8Mhr6jWnfo8PqtaN/eZ7JgnIpYC1qU8fHZtVecHgZuBFSNizWHLGFUe21JbvSRJ0giMTUDbi4jYgXJndgHwksw8r0vyM6r1ThHxuPMQZWaxrYF5QPMxTq/WO7c53raUp+/Pzcz5PebZpSUNwDXADcAGEbFuj3lsS+d6TWdbJEnSCCwyAW1E7ESZROFhYMfM/G239Jl5DXAKZSD9vVp2HwSsABxZ3c1rOAa4A3hjRGzWVPZywOeql99qOdbh1Xr/5mlbI6JR7nzKrGaNemVTnoObg7pqNqxtgD8CZ9mWmd0WSZI0GlG+p8dbRGwIXEyZGeonwGXt0mXmgS351qfMJPVk4GfA5cDmwPaUn463ysw7W/LsRgmgHqZMqXoX8ErKsE7HAK/PlpMaEV8BPkIZbeEYygxmbwDWoMyG9Y2W9MtS7vRtBVwInEYZA3V34BFgh8w837bM/Lb0IyLmTExMTMyZM2fQQ0iSNFY23XRTLrrooos6PTDds2FnZpgJC7Adpc9s16VD3qdT7sTdSglKrgcOA1bvUt7WwInA3ZSfvy8FPgws2SXPHsBvKaMu3E+5k7drl/TLU+5IXkW5W/gXypSrz+mSx7bMwLb0cR07U5gkabFS10xhi8QdWmlR4B1aSdLipq47tAa00gwREXcuv/zyq2+00UbTXRVJkkbi8ssvZ968eXdl5hrDHMeAVpohIuI6yni2c2s8bGM63T/VeEz1xnM/PTzv08dzPz3G/bzPAu7LzHYjCPXMgFZahFWzjzHsTznqn+d+enjep4/nfnp43otFZtguSZIkLZ4MaCVJkjTWDGglSZI01gxoJUmSNNYMaCVJkjTWHOVAkiRJY807tJIkSRprBrSSJEkaawa0kiRJGmsGtJIkSRprBrSSJEkaawa0kiRJGmsGtJIkSRprBrTSIigi1o6I70XELRExPyLmRsShEbHadNdtUVCdz+yw3NYhz1YRcWJE3BURD0XEJRGxT0QsOer6z2QR8bqI+HpE/Doi7qvO6Q8mydP3uY2IPSLigoh4ICLujYgzI2LX+ls0Pvo59xExq8tnICPiqC7leO6bRMQaEfHuiDg2Iq6OiHnVeflNRLwrItrGal73j7fUdFdAUr0iYn3gXODJwM+APwEvBPYGdo6IrTPzzmms4qLiXuDQNtsfaN0QEa8CfgI8DPwIuAt4BfBVYGtg9ymr5fj5FLAx5TzeBDy7W+JBzm1EfBn4aHX87wDLAG8EfhERH8zMb9TVmDHT17mv/B44rs32y9ol9ty3tTvwLeBW4AzgBuApwGuA7wK7RMTu2TQTltd9G5np4uKyCC3AyUACH2zZfki1/fDpruO4L8BcYG6PaVcGbgfmA5s1bV+O8odHAm+c7jbNlAXYHngWEMB21fn5QV3nFtiq2n41sFrT9lnAnZQAYdZ0n4cxOPezqv2z+zi+5779edmBEowu0bL9qZTgNoHXNm33um+z2OVAWoRExHrATpSA65stuw8AHgTeFhErjLhqi7PXAU8CjsrMCxsbM/Nhyh0xgPdNR8Vmosw8IzOvyurbdhKDnNv3VuvPZ+bdTXnmUj4zywJ7Dlj9sdbnuR+E576NzDw9M3+RmY+2bL8NOLx6uV3TLq/7NgxopUXLDtX6lDb/Od4PnAM8Adhi1BVbBC0bEW+NiE9GxN4RsX2HvmuN9+SXbfadDTwEbBURy05ZTRddg5zbbnlOakmjyT0tIv65+hz8c0T8Q5e0nvv+/bVaL2ja5nXfhn1opUXLhtX6yg77r6Lcwd0AOG0kNVp0PRX475Zt10XEnpl5VtO2ju9JZi6IiOuA5wLrAZdPSU0XXX2d2+qXibWABzLz1jbHu6pabzAVlV1EvaRa/i4izgT2yMwbmrZ57vsUEUsBb69eNgeiXvdteIdWWrSsUq3v7bC/sX3Vqa/KIu37wI6UoHYF4PnAf1L6o50UERs3pfU9mTr9nlvfi/o8BPwrsCmwWrW8iPJQ03bAaS1dmzz3/fsi8DzgxMw8uWm7130bBrTS4iWq9VT1kVssZOZBVb+3P2fmQ5l5WWa+l/Lg3fLAgX0czvdk6gx6bn0vJpGZt2fmZzLzosy8p1rOpvwCdD7wTODdgxy61oqOqYj4EGVEgj8Bb+s3e7VerK57A1pp0dL4S3uVDvtXbkmnejUe4Ni2aZvvydTp99xOln6yO1maRGYuoAw1Bf19Djz3lYjYCzgM+COwfWbe1ZLE674NA1pp0XJFte7UF+pZ1bpTH1sN5/Zq3fxTa8f3pOojty7lgY9rp7Zqi6S+zm1mPgjcDKwYEWu2OZ6fj3r8pVr//XPgue9NROwDfIMyju/21UgHrbzu2zCglRYtZ1TrnVpnl4mIlSgDbs8Dzht1xRYTW1br5uD09Gq9c5v021JGnTg3M+dPZcUWUYOc2255dmlJo8E0RlFp/SPNc99FROxLmRjhYkowe3uHpF73bRjQSouQzLwGOIXycNJeLbsPotwxObL6i10DiIjnRsTqbbavQ7mzAtA8XegxwB3AGyNis6b0ywGfq15+a4qqu6gb5Nw2uoXsH01TQUfELMpnZj7loT91ERGbR8QybbbvAHy4etk6ba7nvoOI+DTlIbA5wI6ZeUeX5F73bcTUjZ8saTq0mfr2cmBzyixAVwJbpVPfDiwiDgT2o9wNvw64H1gfeDllpp4TgVdn5iNNeXajfAk9DBxFmabylZThd44BXj+Fg9mPlepc7Va9fCrwUsqdvl9X2+7IzI+1pO/r3EbEV4CPUKYAPYYyBegbgDUoM+yN9xSgA+rn3FdDcz0XOJNyHgH+gcfGMv10ZjaCq+YyPPctImIPYDbwN+DrtO/LOjczZzfl2Q2v+8eb7qnKXFxc6l+Ap1P+2r4VeAS4nvKQwerTXbdxXyhDE/0v5enjeygDn/8FOJUyZmR0yLc1Jdi9m9Lt41LKnawlp7tNM2mhjBCRXZa5dZxbYA/gt5TZ8+4HzgJ2ne72j8u5B94FHE+ZlfAByh2+G4AfAdtMUo7nvr/znsCZbfJ53Tct3qGVJEnSWLMPrSRJksaaAa0kSZLGmgGtJEmSxpoBrSRJksaaAa0kSZLGmgGtJEmSxpoBrSRJksaaAa0kSZLGmgGtJEmSxpoBrSRJksaaAa0kSZLGmgGtJEmSxpoBrSRJksaaAa0kLeYi4kMR8ceImBcRGRH7dNsXEbOqf88eosyhjzFdImLZiPhERFxanZd5EXFxRLyrhmN3fC9GKSI+UpX/pukoX+rXUtNdAUlanETEs4G9gO2BpwPLA3cAvwN+CvwwMx8eYX3eCBxWlX8oMB84b7J94yoiZgHXAUdk5jsGyL8McAqwLXAxcDjlPXwT8N2IuCkzTx6wbjPpfG9WreeMqsCIeB3wImATYGNgJcrn4a2jqoPGlwGtJI1IRHwGOIDy69h5wBHAA8BTgO2A7wLv47FgYhR2bawz85Ze9kXE0sBGwL1DlHtzDceYDntTgtn/BN6XmQkQEWcDPwT+CRgooKX7ezFq+wGfBa4aYZmfogSyDwA3Ac8eYdkacwa0kjQCEfFJ4CDgRmD3zDy/TZpdgY+OuGpPA+gQQLXdl5l/Bf40TKF1HGOavBd4CPhoI5itLKjWdw5x7G7vxUhl5g3d9kfEfwG/B07KzLqC3g9TAtmrKXdqz6jpuFoM2IdWkqZY9TP3gcBfgZe1C2YBMvN4YOeWvK+PiLMj4t6qX+WlVf/NZbuUt3lEHBMRt0XEIxFxY0T8Z0Q8rSnNgRGRlK4PVP0lG0vHfY32dOv/GhEvjIgfRcTNETE/Im6NiFMi4vXN56TTMXqpf7vjVP8+KiLuiIiHI+LC6o+Ev7eZ0t0AYI+WNr+j0/lsyr8OsB5wWmY+2LK70bbTJztOm+N2Pd9N6Xq6FlrOyQbVe3F7RDwaEdv1UJ/tq/wHd0m2DaV7xJURcXVEfD0iXhYRy/fb/obMPCMzr2r5Q0HqiXdoJWnq7QksDRyVmZd1S5iZ8xv/jogvAJ+g9LH9H8pPsbsAXwBeGhEvqe500pRnT+A7lP6XP6fcEX4W8G7gFRGxRXX37cwqyzuAdSh3jxu67esqIt4DfAv4W1X+VcCTKd0o3g/8eJL8vda/1TrABcC1wH8DqwNvAH4WES/OzDOqdq1K6Tbwe+C4pvwX99C8RleQv/9BEhEBfAh4LfCrzLykh+O0OrNav4MO53uQawFYv6rrlZTuEMsD9/VQn4lq/bsuaTamBOAvq+rxgWp5OCLOBE6i3ru3UneZ6eLi4uIyhQtwGpDAu/vIs2WV5wbgqU3blwJ+Ue37ZEueDYBHKD/ZrtWybwdKkHlsy/Yzy1dB2zq03QfMqsqf3bL9OZS70HcBz22Tb+1uxxiw/o3jJHBAy76XVttPnKzuPb4nX6jy7kwJ5r5D6TaRlID4KUNeJ53Od1/XQss5+cIA9fhhlXfDPvJsSOkycCrlj5FG+VcDXwc277MO21X5fzDMOXVZfBa7HEjS1FuzWt/UR553VuvPZeZtjY2ZuYDSz/ZRyl3LZu+j3AneOzNvbt6RmadT7ni+IiJW6qMe/XgfJcj618z8Q+vOzJys/cPU/3rgcy15TqYEgS/suQXdbVqtLwT+mXL+N6y2/QlYsjVDRHwzIn46ZLmDXAsAf6aPu+tNJih3gHu+u5qZV2TmVzPzJZS743tT/rBZn3Ln9oMD1EPqmV0OJGnqRbXup29g42ffhfpkZuaVEXETsG5ErJqZ91S7tqzWL4qIf2xzzCdTgq4NmJrhmLao1icNmH+Y+l+cmX9rk+fGpuMOawK4PjPviIi3UIZfex4lWHsD5Q71P7Tk+RTlrvWw5UJ/1wLA77OpC0svImIFyvk9NzMf7SPfMpR+tbtUy3OqXX8FzuXx3Tuk2hnQStLUu4UyBNHafeRZpVrf2mH/rcAzqnT3VNvWqNb/MsmxV+yjHv1YtVrf3C1RF8PU/54OaRdQwwPQ1QNhTwTOAqiC5zur12dFxMXAxhGxXmZe28iXmXcPWzaDXQsAt7VN3d3GlPN10WQJq4cdd6YEsDsCK1S7bgG+B5wInJqZvfTblYZiQCtJU+83lD6gOwL/1WOexvisTwWuabN/zZZ0zf9eZZqCiHuq9VoMNiTXdNe/m+buBu00Atf7GxsiYm3KHeJnZ+YVQ5Q9yLUA/f0i0NDLA2FUD369qHq5gDKu8omUB8EuHqBcaSj2oZWkqfd9yk+vr42I53RL2DQEUyOg2K5NmmdS7vZe1/ITc2NWqW2GqewQGuXvMmT+qax/o1vCQv1dJ9EIaBfqqhERqwNbA5dm5l+adm1CGbN22Cf9B7kWBtUIaCe7Q7ssMJvS1eJJmblNZv6bwaymiwGtJE2xzJxLGYd2GeCEiGg7E1hE7Mxj/U+/V60/FRFPakqzJPBlyv/frXd7v0EJnL8aERu0Of4yETGVweK3KHfrPt0ucK/uWHYzivrfTblz+Yw+8zUC2jdUQ3X9vU6UWcOWBr7Skmdj4JJ++qJ2MMi1MKgJ4GHgj90SZeaWmblnZv64pkBaGopdDiRpBDLzCxGxFGXq299GxLmUn68bU99uSxlv9cIq/bnVwPYfBy6LiGOAByl3P59H6cbwpZYy/hQR76QEQH+IiF9SxiBdmhLAbQP8hSmaUjQz/xgR7wcOB34XET+j3J1cgzKG6/1Ukwd0yD/l9c/MByLifGCbiPhhdfy/AT/P7mPINgLadwGbRMTpwErATpTJFmZn5hEteTaht/FtJ6tz39fCIKpfB55DecBuwSRpT6N0LenVzzPz45Mcczdgt+rlU6v1lk2Tb9yRmR/ro0wtRgxoJWlEMvOzEXE0ZYKB7SkTLixHebjoYuDfgR80pd83In5HGfbo7ZTA7hrKk/NfycxH2pTxg4j4PWU4p+0pAdeDlAd1jgF+NFXtq8r/TkRcBnyM8hP5bpTJAC4BvttD/lHU/23AVykPNL2JMgrFTVUdFxIRz6A8EHYqpZ/wDsBHKH1WLwL2y8yj22TdmIXv2g5kkGthAM+rjjvpA2GU4bjW6ePYC83y1sYmwB4t29arFihDsxnQqq3IdIY5SZI6iYhXAz8FPp6ZPd0JrYa/ug/YKjtMdSypPvahlSSpu0Z3g17uXDY0xqO9tOa6SGrDgFaSpO4aAW3XoaxabAxcmZkPTUF9JLWwy4EkSV1ExO3AQ5k5a7rrIqk9A1pJkiSNNbscSJIkaawZ0EqSJGmsGdBKkiRprBnQSpIkaawZ0EqSJGmsGdBKkiRprBnQSpIkaawZ0EqSJGmsGdBKkiRprBnQSpIkaawZ0EqSJGmsGdBKkiRprBnQSpIkaawZ0EqSJGmsGdBKkiRprP1/2FVCNYlbjsIAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "image/png": { "height": 199, "width": 346 }, "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plotbeta(lm_beta, \"OLS\", yrange=None)\n", "#plt.tight_layout(); plt.savefig(\"ames-ols-2.png\",dpi=150,bbox_inches=0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Regularization\n", "\n", "Both L1 and L2 regularization require normalizing variables; sklearn has an option to do this as part of the training process, but let's do it explicitly as a way to get started." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "from pandas.api.types import is_numeric_dtype\n", "def normalize(X): \n", " for colname in X.columns:\n", " if is_numeric_dtype(X[colname]):\n", " u = np.mean(X[colname])\n", " s = np.std(X[colname])\n", " X[colname] = (X[colname] - u) / s" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "cols_with_missing = df_ames.columns[df_ames.isnull().any()]\n", "cols = set(df_ames.columns) - set(cols_with_missing)\n", "\n", "X = df_ames[cols].drop('SalePrice', axis=1)\n", "y = df_ames['SalePrice']\n", "\n", "normalize(X) # do this before getting dummy variables\n", "X = pd.get_dummies(X) # make sure to normalize before you get the dummy variables" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=True, random_state=999)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
OverallQual3SsnPorchMoSoldWoodDeckSFFullBathLowQualFinSFMSSubClassBsmtFinSF1BsmtUnfSFLotArea...LandContour_BnkLandContour_HLSLandContour_LowLandContour_LvlRoofStyle_FlatRoofStyle_GableRoofStyle_GambrelRoofStyle_HipRoofStyle_MansardRoofStyle_Shed
1441-0.071836-0.116339-0.4891100.437009-1.026041-0.1202421.4922820.555685-0.942327-0.610435...0001010000
533-3.688413-0.116339-1.969111-0.752176-1.026041-0.120242-0.872563-0.973018-1.284176-0.552908...0010010000
1078-0.071836-0.116339-0.4891100.365179-1.026041-0.1202421.4922820.478921-0.863090-0.609533...0001010000
\n", "

3 rows × 216 columns

\n", "
" ], "text/plain": [ " OverallQual 3SsnPorch MoSold WoodDeckSF FullBath LowQualFinSF \\\n", "1441 -0.071836 -0.116339 -0.489110 0.437009 -1.026041 -0.120242 \n", "533 -3.688413 -0.116339 -1.969111 -0.752176 -1.026041 -0.120242 \n", "1078 -0.071836 -0.116339 -0.489110 0.365179 -1.026041 -0.120242 \n", "\n", " MSSubClass BsmtFinSF1 BsmtUnfSF LotArea ... LandContour_Bnk \\\n", "1441 1.492282 0.555685 -0.942327 -0.610435 ... 0 \n", "533 -0.872563 -0.973018 -1.284176 -0.552908 ... 0 \n", "1078 1.492282 0.478921 -0.863090 -0.609533 ... 0 \n", "\n", " LandContour_HLS LandContour_Low LandContour_Lvl RoofStyle_Flat \\\n", "1441 0 0 1 0 \n", "533 0 1 0 0 \n", "1078 0 0 1 0 \n", "\n", " RoofStyle_Gable RoofStyle_Gambrel RoofStyle_Hip RoofStyle_Mansard \\\n", "1441 1 0 0 0 \n", "533 1 0 0 0 \n", "1078 1 0 0 0 \n", "\n", " RoofStyle_Shed \n", "1441 0 \n", "533 0 \n", "1078 0 \n", "\n", "[3 rows x 216 columns]" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X_train.head(3) # These values should be normalized" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "# Note that the sklearn model constructors call the parameter `alpha` not `lmbda`.\n", "lmbda = 5" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## L1 (Lasso) regularization" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**1. Train an L1 (lasso) linear regression model using lmbda**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "lasso = ...\n", "# fit model to X,y" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Solution\n", "
\n",
    "lasso = Lasso(alpha=lmbda, tol=.1)\n",
    "lasso.fit(X_train, y_train)\n",
    "
\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**2. Examine the training and testing scores**\n", "\n", "Note that the sklearn model constructors call the parameter `alpha` not `lmbda`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(f\"{lasso.score(X_train, y_train):.2f} R^2 on training set\")\n", "print(f\"{lasso.score(X_test, y_test):.2f} R^2 on test set\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q.** Why is the testing score much worse than the training score?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Solution\n", "Because the lambda hyperparameter is small, 5, the coefficients are similar to those of the OLS model which do not generalize well; the coefficients are overfit to the training set, which makes them performed poorly on the unseen test set.\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q.** Why is the training score about the same for OLS and regularized regression?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Solution\n", "Regularization is mostly about generalization and training scores tell us nothing about generalization.\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**3. Count how many $\\beta_{1..p}$ coefficients are (close to) zero**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sum(...) # how many close to 0?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Solution\n", "
\n",
    "sum(np.abs(lasso.coef_) < 1e-5) # how many close to 0?\n",
    "
\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**4. Plot the $\\beta_{1..p}$ coefficients using `plotbeta`**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plotbeta(lasso.coef_, \"Lasso $\\lambda=\"+str(lmbda)+\"$: \")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## L2 (Ridge) regularization" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**1. Train an L2 (Ridge) linear regression model using lmbda**\n", "\n", "Note that the sklearn model constructors call the parameter `alpha` not `lmbda`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ridge = ...\n", "# fit to X,y" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Solution\n", "
\n",
    "ridge = Ridge(alpha=lmbda)\n",
    "ridge.fit(X_train, y_train)\n",
    "
\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**2. Examine the training and testing scores**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(f\"{ridge.score(X_train, y_train):.2f} R^2 on training set\")\n", "print(f\"{ridge.score(X_test, y_test):.2f} R^2 on test set\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q.** Why are the R^2 scores for L2 similar to L1?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Solution\n", "L1 and L2 Regularization is solving the same problem: constraining the size of the coefficients. The only difference is the metric used. There both effective and so we would expect there scores to be similar.\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**3. Count how many $\\beta_{1..p}$ coefficients are (close to) zero**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sum(np.abs(ridge.coef_) < 1e-5) # how many close to 0?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**4. Plot the $\\beta_{1..p}$ coefficients using `plotbeta`**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plotbeta(ridge.coef_, \"Ridge $\\lambda=\"+str(lmbda)+\"$: \")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q.** Compare the standard deviation of the coefficients for L1 and L2. What do you notice visually?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Solution\n", "The L2 coefficients are smaller and have smaller standard deviation; they are more tightly grouped. Also note that there are fewer L2 coefficients at 0.\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Effect of $\\lambda$ on regularization and R^2 scores\n", "\n", "The goal of the following code snippets is to help you visualize how the $\\lambda$ regularization parameter affects model parameters and associated training and testing scores. There are a number of important questions to answer following the code snippets." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig,axes = plt.subplots(1,7,figsize=(13.5,1.8), sharey=True)\n", "for i,lmbda in enumerate([1,10,100,200,500,1000,2000]):\n", " lasso = Lasso(alpha=lmbda, tol=.1)\n", " lasso.fit(X_train, y_train)\n", " r2 = lasso.score(X_train, y_train)\n", " r2t = lasso.score(X_test, y_test)\n", " print(f\"lambda={lmbda:4d}: Zeros {sum(np.abs(lasso.coef_) < 1e-5):3d}: train R^2 {r2:.2f} test R^2 {r2t:.2f}\")\n", " plotbeta(lasso.coef_, f\"$\\\\lambda={lmbda}$\\n\", ax=axes[i], fontsize=9, xlabel=False, ylabel=i==0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q.** Why does the training R^2 score go down as we increase lambda?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Solution\n", "We are trading a bit of bias for improve generality. As we increase $\\lambda$, we are restricting how close to the true minimum loss function location we can get.\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q.** Why does the L1 testing R^2 score go up we increase lambda?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Solution\n", "As we constrain the coefficients, we reduce the overfitting to the training data and hence generality improves.\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q.** Describe what is happening to the L1 coefficients as we increase lambda?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Solution\n", "The average magnitude of the coefficients is not changing much, but they are becoming a tighter group; the standard deviation is shrinking significantly. Many more coefficients are going to zero. Note that about 80% of the coefficients have gone to zero when we get the best test score.\n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig,axes = plt.subplots(1,7,figsize=(13.5,1.8), sharey=True)\n", "for i,lmbda in enumerate([1,10,100,200,500,1000,2000]):\n", " ridge = Ridge(alpha=lmbda, tol=.1)\n", " ridge.fit(X_train, y_train)\n", " r2 = ridge.score(X_train, y_train)\n", " r2t = ridge.score(X_test, y_test)\n", " print(f\"lambda={lmbda:4d}: Zeros {sum(np.abs(ridge.coef_) < 1e-5)}: train R^2 {r2:.2f} test R^2 {r2t:.2f}\")\n", " plotbeta(ridge.coef_, f\"$\\\\lambda={lmbda}$\\n\", ax=axes[i], fontsize=9, xlabel=False, ylabel=i==0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q.** Why does the training R^2 score go down as we increase lambda?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Solution\n", "We are introducing bias as we did for L1, in exchange for increased generality.\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q.** Describe what is happening to the L2 coefficients as we increase lambda?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Solution\n", " The average magnitude of the coefficients goes down and they become a tighter group, but not as quickly as L1. We don't get any more coefficients going to zero.\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q.** Characterize how the magnitudes of L1 and L2 coefficients differ." ] }, { "cell_type": "markdown", "metadata": {}, "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Solution\n", " The L2 coefficients are in general much tighter group than the L1, even as we increase $\\lambda$. The L1 regularization drops many of the coefficients to zero for the same value of $\\lambda$.\n", "
" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.8" } }, "nbformat": 4, "nbformat_minor": 4 }