{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from sklearn import tree, model_selection, datasets, metrics, ensemble, linear_model, neighbors\n", "import graphviz as gv\n", "import seaborn as sns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "7. In the lab, we applied random forests to the Boston data using mtry=6 and using ntree=25 and ntree=500. Create a plot displaying the test error resulting from random forests on this data set for a more comprehensive range of values for mtry and ntree. You can model your plot after Figure 8.10. Describe the results obtained." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>CRIM</th>\n", " <th>ZN</th>\n", " <th>INDUS</th>\n", " <th>CHAS</th>\n", " <th>NOX</th>\n", " <th>RM</th>\n", " <th>AGE</th>\n", " <th>DIS</th>\n", " <th>RAD</th>\n", " <th>TAX</th>\n", " <th>PTRATIO</th>\n", " <th>B</th>\n", " <th>LSTAT</th>\n", " <th>Price</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>0.00632</td>\n", " <td>18.0</td>\n", " <td>2.31</td>\n", " <td>0.0</td>\n", " <td>0.538</td>\n", " <td>6.575</td>\n", " <td>65.2</td>\n", " <td>4.0900</td>\n", " <td>1.0</td>\n", " <td>296.0</td>\n", " <td>15.3</td>\n", " <td>396.90</td>\n", " <td>4.98</td>\n", " <td>24.0</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>0.02731</td>\n", " <td>0.0</td>\n", " <td>7.07</td>\n", " <td>0.0</td>\n", " <td>0.469</td>\n", " 
<td>6.421</td>\n", " <td>78.9</td>\n", " <td>4.9671</td>\n", " <td>2.0</td>\n", " <td>242.0</td>\n", " <td>17.8</td>\n", " <td>396.90</td>\n", " <td>9.14</td>\n", " <td>21.6</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>0.02729</td>\n", " <td>0.0</td>\n", " <td>7.07</td>\n", " <td>0.0</td>\n", " <td>0.469</td>\n", " <td>7.185</td>\n", " <td>61.1</td>\n", " <td>4.9671</td>\n", " <td>2.0</td>\n", " <td>242.0</td>\n", " <td>17.8</td>\n", " <td>392.83</td>\n", " <td>4.03</td>\n", " <td>34.7</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>0.03237</td>\n", " <td>0.0</td>\n", " <td>2.18</td>\n", " <td>0.0</td>\n", " <td>0.458</td>\n", " <td>6.998</td>\n", " <td>45.8</td>\n", " <td>6.0622</td>\n", " <td>3.0</td>\n", " <td>222.0</td>\n", " <td>18.7</td>\n", " <td>394.63</td>\n", " <td>2.94</td>\n", " <td>33.4</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>0.06905</td>\n", " <td>0.0</td>\n", " <td>2.18</td>\n", " <td>0.0</td>\n", " <td>0.458</td>\n", " <td>7.147</td>\n", " <td>54.2</td>\n", " <td>6.0622</td>\n", " <td>3.0</td>\n", " <td>222.0</td>\n", " <td>18.7</td>\n", " <td>396.90</td>\n", " <td>5.33</td>\n", " <td>36.2</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX \\\n", "0 0.00632 18.0 2.31 0.0 0.538 6.575 65.2 4.0900 1.0 296.0 \n", "1 0.02731 0.0 7.07 0.0 0.469 6.421 78.9 4.9671 2.0 242.0 \n", "2 0.02729 0.0 7.07 0.0 0.469 7.185 61.1 4.9671 2.0 242.0 \n", "3 0.03237 0.0 2.18 0.0 0.458 6.998 45.8 6.0622 3.0 222.0 \n", "4 0.06905 0.0 2.18 0.0 0.458 7.147 54.2 6.0622 3.0 222.0 \n", "\n", " PTRATIO B LSTAT Price \n", "0 15.3 396.90 4.98 24.0 \n", "1 17.8 396.90 9.14 21.6 \n", "2 17.8 392.83 4.03 34.7 \n", "3 18.7 394.63 2.94 33.4 \n", "4 18.7 396.90 5.33 36.2 " ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "boston_df = datasets.load_boston()\n", "boston_df = pd.DataFrame(data=np.c_[boston_df['data'], boston_df['target']], 
columns= [c for c in boston_df['feature_names']] + ['Price'])\n", "boston_df.head()" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "def run_cv(df, n_splits, extract_x, extract_y, fit_model, loss_fn):\n", "    # Collect one row per fold; DataFrame.append was removed in pandas 2.0.\n", "    fold_errors = []\n", "    for train_idx, test_idx in model_selection.KFold(n_splits=n_splits).split(df):\n", "        train, test = df.iloc[train_idx], df.iloc[test_idx]\n", "        # train\n", "        train_X = extract_x(train)\n", "        train_y = extract_y(train)\n", "        model = fit_model(train_X, train_y)\n", "        # test\n", "        test_X = extract_x(test)\n", "        test_y = extract_y(test)\n", "        preds = model.predict(test_X)\n", "        fold_errors.append([loss_fn(preds, test_y)])\n", "    # Single column (0) with one row per fold, so .mean()[0] is the CV average.\n", "    return pd.DataFrame(fold_errors)" ] }, { "cell_type": "code", "execution_count": 78, "metadata": {}, "outputs": [], "source": [ "rnge = range(1, 50)\n", "\n", "sqrt_rmses = [run_cv(boston_df,\n", "                     5,\n", "                     lambda df: df.drop('Price', axis=1),\n", "                     lambda df: df.Price,\n", "                     lambda x, y: ensemble.RandomForestRegressor(n_estimators=i, max_features='sqrt').fit(x,y),\n", "                     lambda preds, true: np.sqrt(metrics.mean_squared_error(true, preds))).mean()[0] for i in rnge]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "log2_rmses = [run_cv(boston_df,\n", "                     5,\n", "                     lambda df: df.drop('Price', axis=1),\n", "                     lambda df: df.Price,\n", "                     lambda x, y: ensemble.RandomForestRegressor(n_estimators=i, max_features='log2').fit(x,y),\n", "                     lambda preds, true: np.sqrt(metrics.mean_squared_error(true, preds))).mean()[0] for i in rnge]" ] }, { "cell_type": "code", "execution_count": 76, "metadata": {}, "outputs": [], "source": [ "all_fx_rmses = [run_cv(boston_df,\n", "                       5,\n", "                       lambda df: df.drop('Price', axis=1),\n", "                       lambda df: df.Price,\n", "                       lambda x, y: ensemble.RandomForestRegressor(n_estimators=i, max_features=None).fit(x,y),\n", "                       lambda preds, true: 
np.sqrt(metrics.mean_squared_error(true, preds))).mean()[0] for i in rnge]" ] }, { "cell_type": "code", "execution_count": 77, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "<matplotlib.axes._subplots.AxesSubplot at 0x1128a1278>" ] }, "execution_count": 77, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXwAAAEKCAYAAAARnO4WAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAIABJREFUeJzs3Xdc1WX7wPHPfdiHPVVAQXHvvUeWuyyztLK9tK1PZetp711aWdrza5dllrPc5R6FCm5FRGRP2euM+/fHURSRmYjB9X69eKnnu64DeHFzf6/7+iqtNUIIIRo+Q30HIIQQ4uKQhC+EEI2EJHwhhGgkJOELIUQjIQlfCCEaCUn4QgjRSEjCF0KIRkISvhBCNBKS8IUQopGwr+8Azubn56dDQ0PrOwwhhPhX2blzZ7rW2r+q/S6phB8aGkp4eHh9hyGEEP8qSqnY6uwnUzpCCNFISMIXQohGQhK+EEI0EpLwhRCikZCEL4QQjYQkfCGEaCQk4QshRCPRIBJ+gamATyM+JSI1or5DEUKIS9YltfCqtgzKwJzIOdgb7Oke0L2+wxFCiEtSgxjhO9s74+7oTlphWn2HIoQQl6wGkfAB/F38SSuQhC+EEBVpOAnf6C8jfCGEqETDSfgu/qQXptd3GEIIcclqUAk/tSAVrXV9hyKEEJekhpPwjf6YrCZySnLqOxQhhLgkNZyE72Lr/S83boUQ4vwaTML3c/EDILUwtZ4jEUKIS1ODSfj+RtsIX27cCiHE+TWchC9TOkIIUakGk/CNDkZcHVxlhC+EEBVoMAkfzpRmCiGEKK9hJXyjLL4SQoiKNKiE7+fiJ+0VhBCiAg0q4Z9uoCarbYUQorwGlfADjAEUWYrIM+XVdyhCCHHJaVAJ//TiK5nWEUKI8hpUwj9di59eIDduhRDiXA0q4fsZpb2CEEJUpEEl/ACXAEBG+EIIcT4NKuG7OrjiYu8ic/hCCHEeDSrhK6VstfjST0cIIcppUAkfTtXiywhfCCHKaXgJX9orCCHEeTW8hC8N1IQQ4rzqPOErpf6jlNqvlNqnlJqvlHKuy+v5G/0pMBdQYCqoy8sIIcS/Tp0mfKVUEPAI0Ftr3RmwA26sy2uWPghF5vGFEKKMizGlYw+4KKXsASOQWJcXK22vIJU6QghRRp0mfK11AvAucAJIArK11qvr8poBRtviKxnhCyFEWXU9peMNXAO0BAIBV6XULefsM1UpFa6UCk9L++dJWkb4QghxfnU9pTMCiNFap2mtTcCvwMCzd9Baz9Na99Za9/b39//HF/Rw9MDR4CilmUIIcY66TvgngP5KKaNSSgFXAAfr8oJKKfyN/tJATQghzlHXc/g7gIXALmDvqevNq8trgq1SRxqoCSFEWfZ1fQGt9QvAC3V9nbP5G/2Jzoq+mJcUQohLXoNbaQtIAzUhhDiPBpnwA4wB5JpyKTIX1XcoQghxyWiQCV+ebSuEEOU1yIRf+mxbKc0UQohSDTPhG20JX7pmCiHEGQ0z4csIXwghymmQCd/LyQt7g71U6gghxFkaZMJXSsmjDoUQ4hwNMuHDqWfbyghfCCFKNdiE7+fiJyN8IYQ4S4NN+P5GmdIRQoizNdyE7+JPdnE2JZaS+g5FCCEuCQ034R
ulNFMIIc7WYBO+tFcQQoiyGmzCL322rVTqCCEE0IATvozwhRCirAab8H2cfbBTdjLCF0KIUxpswjcoA74uvjLCF0KIUxpswgekvYIQQpylwSd8eZi5EELYNOiE72eU9gpCCHFag074AS4BZBZlYrKa6jsUIYSodw064fsZbaWZGYUZ9RyJEELUvwad8OXJV0IIcUbDTvjybFshhCjVsBO+jPCFEKJUg074Ps4+KJRU6gghBA084dsb7G2rbaW9ghBCNOyED7LaVgghTmv4Cd/oLzdthRCCRpDwA10DSchLqO8whBCi3jX4hB/kFkRuSS45JTn1HYoQQtSrBp/wA90CAUjKS6rnSIQQon41+IQf5B4EQHxefD1HIoQQ9avhJ3xXW8JPzEus50iEEKJ+1SrhK6W8lVJdL3QwdcHTyROjvVESvhCi0at2wldKrVdKeSilfIBdwOdKqferOKadUirirI8cpdSMfxp0TSilCHIPkikdIUSjZ1+DfT211jlKqXuAb7TWLyil9lR2gNb6MNAdQCllByQAi2odbS0FuQbJCF8I0ejVZErHXinVDJgMLK/Fta4AorXWsbU49h8JdAskMS8RrfXFvrQQQlwyapLwXwZWYUvafyulWgFRNTj+RmB+TYK7UILcgsgz5UktvhCiUat2wtda/6y17qq1vv/Uv49pra+rzrFKKUfgauDn82ybqpQKV0qFp6XVTc+bIDdbpY6suBVCNGY1uWnbVim1Tim179S/uyqlnq3m4WOBXVrrlHM3aK3naa17a617+/v7VzecGjm9+Erm8YUQjVlNpnQ+B54GTABa6z3Ypmmq4ybqaToHziy+khG+EKIxq0nCN2qt/zrnNXNVBymlXIGRwK81CexC8nD0wN3BXRK+EKJRq0lZZrpSKgzQAEqp64EqG9RorfMB39qFd+GcrtQRQojGqiYJ/0FgHtBeKZUAxAC31ElUdSDQLZC43Lj6DkMIIepNtRO+1voYMOLUFI1Ba51bd2FdeEFuQWxP2o7WGqVUfYcjhBAXXU2qdKYrpTyAAuADpdQupdSougvtwgpyC6LQXEhWcVZ9hyKEEPWiJjdt79Ja5wCjsM3J3wq8WSdR1QEpzRRCNHY1Sfin50HGYeuls/+s1y55pxdfSRM1IURjVZOEv1MptRpbwl+llHIHrHUT1oUnI3whRGNXkyqdu7F1vjymtS5QSvkCd9ZNWBeeu6M7Ho4eUosvhGi0alKlY1VKmYGhSqmzj6u0RfKlJMgtSBK+EKLRqnbCV0p9AXQF9nNmKkdTjytoayrILYhj2cfqOwwhhKgXNZnS6a+17lhnkVwEgW6BbE7YLLX4QohGqSY3bbcppf7VCT/ILYgiSxEZRRn1HYoQQlx0NRnhf4Mt6ScDxdhKMrXW+l/xMHM4U5qZmJeIn4tfPUcjhBAXV00S/v9hW2y1l39ROebZzi7N7Or/r/k5JYQQF0RNEn6a1nppnUXyD5gtVqLT8vE2OhDg4VzhfrL4SgjRmNVkDn+3UuoHpdRNSqmJpz/qLLIayC0yM/rDjSyJqHxRldHBiLeTtyy+EkI0SjUZ4btgm7s/u2HaJVGW6e3qiK+rI0dT86rcV/riCyEaq2olfKWUHbBHa/1BHcdTa2H+bkSnVZ3wg9yCOHLyyEWISAghLi3VmtLRWluwPZf2khUW4MbRtDy01pXuF+QWRGJeIlb9r7zvLIQQtVaTOfwtSqmPlVJDlFI9T3/UWWQ1FObvSlaBicz8kkr3C3QLpMRaQkah1OILIRqXmszhdz/158tnvaaByy9cOLXXOsANgKOpefi6OVW43+nSzIS8BPyN/hclNiGEuBTUpHna8Mq2K6Vu11p//c9Dqp0wf1vCj07Lp1+rip+ZHuwWDNgSfveA7hXuJ4QQDU1NpnSqMv0CnqvGgrxccHYwVFmp08ytGSB98YUQjc+FTPj12o3MYFC08qu6UsfF3gUfZx9pkyyEaHQuZMKvvDzmImgd4FatWv
xgt2BJ+EKIRqfBjPDBNo+fkFVIYYml0v1k8ZUQojG6kAl/ywU8V62crtSpalon0C2QxPxELNbKfzAIIURDUu2Er5T6Vinleda/Q5RS607/W2v90IUOrqaqm/CD3IIwW82kFaZdjLCEEOKSUJMR/mZgh1JqnFLqXmAN8GHdhFU7oX5GDAqiq5jHP7svvhBCNBY1qcOfq5TaD/wJpAM9tNbJdRZZLTjZ29HCx0h0Wn6l+529+Kpnk0tmsbAQQtSpmkzp3Ap8AdwGfAX8rpTqVkdx1VqYf9WVOmcnfCGEaCxq0lrhOmCw1joVmK+UWoQt8feoi8Bqq3WAG5ui0rFYNXaG8xcOOdk54e/iL1M6QohGpcoRvlLqrVN//eFUsgdAa/0X0K+uAqutMH83SixW4jILKt0v0C1QRvhCiEalOlM645RSCnjq3A1a68pbU9aDsBpU6kjCF0I0JtVJ+CuBk0BXpVTOWR+5SqmcOo6vxlr7n+maWZkgtyBS8lMwW80XIywhhKh3VSZ8rfVMrbUX8JvW2uOsD3ettcdFiLFGPI0O+Lk5VWvxlVmbSS1IrXQ/IYRoKKpdpaO1vqay7Uqpbf88nAsjzN+1yhF+sPuZNslCCNEYXMjWCs4X8Fz/SOsAN6LT8it93OHpvvjxufEXKywhhKhXdd4tUynlpZRaqJQ6pJQ6qJQacAGveebiWqPNtvn4MH83sgtNpOdVfE+5qWtT7JQdcblxdRGOEEJcci5kwq/ILGCl1ro90A04eKEvYM7IIGrAQLIWLgSq11PH3mBPU9emxOfJCF8I0TjUaXvkU83WhgL/B7YyTq111gW8JgB2Pj5oi4WiQ4eAM6WZ1ZnHT8iVOXwhRONwIRP+red5rSWQBnyplNqtlPqfUsr17B2UUlOVUuFKqfC0tNp1r1RK4dSuLcWHjwDQzMMZo6NdlZU6wW7BMsIXQjQa1Vlpm3tO/f156/C11vvOc7g90BP4VGvdA8jnnAVcWut5WuveWuve/v7+tX4jzu3aU3z4MNpqtT3usJqVOplFmRSYKl+VK4QQDUF16vDdz6m/r0kdfjwQr7XecerfC7H9ALjgnNq1xVpQgCnBNkXT2t+NY1V0zTxdmimjfCFEY1CdEb5PZR+VHXuqfXKcUqrdqZeuAA5cgLjLcW7fHuDMPP6pxx3mF1e8klZKM4UQjUl1umXuxFZyefZN2dP/1kCrKo5/GPheKeUIHAPurEWcVXJq3RqUss3jjxxZWqkTk55P5yDP8x4jCV8I0ZhUmfC11i1P//3UiL4NNVhkpbWOAHrXKroaMBiNOIaEUHzYNsJvfValTkUJ39PJEzcHN5nSEUI0CtXuh6+UugeYDgQDEUB/YCu2aZpLglP79hQdsM0Yhfi6YmdQlVbqKKVspZnSXkEI0QjUpCxzOtAHiNVaD8f24JPsOomqlpzbtcV04gSWvHwc7Q2E+BirrtRxC5YpHSFEo1CThF+ktS4CUEo5aa0PAe2qOOaicmpnu3FbHGWrx2/l71Z1Lf6pEb5VW+s8PiGEqE81SfjxSikvYDGwRim1BIitm7Bqx7ldWwCKDx8GbPP4Men5mC0VJ/Ngt2CKLcWkF6YDYLVqXl52gL3xl9QvL0II8Y/VpD3ytVrrLK31i8Bz2NolTKirwGrDPjAQg7v7WaWZrpgsmriThRUeE+QeBJyp1NlwJI0vtsTw9bbjdR2uEEJcVLVqraC13qC1XnqpPeLw3BYLravRU6e0NPNUpc43244DsOVoeqXtlYUQ4t/mYnTLvKjObrFQnefbBroFolAk5CYQm5HP+iNptPAxkpRdREx65St1hRDi36TBJfyzWyx4ODsQ4O5U6Qjf0c6RJq5NiM+L57vtsdgpxTvXdwVso/yz7UzZyfq49XUZvhBC1JkGl/DPbbHQpokb4cczMVVx4zY2J46f/o5jdOem9G3pQ5CXC1uOZpTZ7/3w93l528t1F7wQQtShBpfwy7RYAG4bEMrxjAK+3nq8wm
OC3YM5dvIEOUVmbh8QilKKQa192RqdjsVqm8cvthRzIPMAaYVp8uBzIcS/UoNL+Oe2WBjVsQmXtfPnw7VRpOQUnfeYILcgcs0ZtGvqRJ9QbwAGtfYjp8jMvgRbeebBjIOYrbZGbAcy6qT/mxBC1KkGl/ABnNq1o+jUCF8pxUtXd6LEYuW1387/dEVTsS3Jj+vpjFK2HnEDw/wA2BJtm8ePSI2wnQ/F/oz91Y5lf2I2hSWW2r0RIYS4gBpkwndu3660xQLY+urcNyyMpZGJbI1OL7d/eJQtyXdsbip9zd/difZN3Utv3EamRRLkFkRr79bsT69ewt8WncFVH23miV/2/NO3JIQQ/1iDTPjntlgAeOCyMJr7uPD8kv2UmM/cwE3NLWLrYds8fVpRUpnzDAzz4+/jJyksMROZFkk3/2508u3E/oz9VdboZxeaeGxBBAalWBaZSGTcBX+UrxBC1EiDTPjntlgAcHaw48XxnTiamseXW2JKX//xrzhMJa442TmXa5M8uI0vJWYra44cIq0wrTThZxZlklKQUmkMLy7dT0puMd/c1RdfV0de//2gLOQSQtSrBpnwS1ssnJXwAa7o0IQRHQKYtS6KpOxCTBYrP+w4wZA2/jR3L981s29LX+wNihVHbU9o7BZgS/hApdM6y/cksmh3Ag9f3ppBrf2YPqINO2Iy+fOwVPcIIepPg0z4pS0WDh0ut+2F8Z2wWDWvLj/ImgMpJOcUcfuAUILdg8uN8N2c7One3Is9aZE42znT1rstbX3aYq/sK7xxm5xdxH8X7aNbcy8eHN4agJv6tqClnytv/H6o0kZuQghRlxpkwoeyLRbO1tzHyIPDW/Pb3iReXX6AIC8XhrcPKO2Lf+60y6DWfpy0RNHeuyMOBgec7Jxo493mvAnfatXMXBhJidnKhzd0x8HO9ul1sDPwxOh2RKXm8csu6b0vhKgfDTbhn91i4VxTh7Yi1NdIYnYRt/QPwc5ge/JVobmQk8Uny+zbt5U7BudEvO3blr7W0bfjeW/cfr3tOJui0nn2qg609HMts21M56b0bOHF+2uOUFBS8YPVhRCirjTYhH9ui4Uy2xzseGNiV7o39+LGPs2Bih9obu+SgFJWCnKCSl/r5NeJ7OLsMlNAUSm5vLniEJe3D2BK3xblrqmU4plxHUjJKeaLzTHltgshRF1rsAn/3BYL5xoQ5sviBwfh7eoI2NorQPmEfyBzLwBH43xLXyu9cXtqWqfEbGXGTxG4Otnz5nVdShdvnat3qA+jOzXhsw3HSM8r/gfvTgghaq7BJvxzWyxUJdAtEKDcjduI1Ag87ZtyPNVAYpbtQSptvNrgYHDgQPoBLKfm7fcn5vDGxC4EuDtXep0nxrSn0GTho3VRtXhXQghRew024UPZFgtVcbF3wd/Fv8wIX2tNZFokXf27AWfaJTvYOdDOux370vfx1C97WBKRyJNj2jO6U9MqrxPm78ZNfZvz/Y4T0m9fCHFRNeiEf26LhaoEuQWRkHfmJm9CXgIZRRkMad4LPzdHtkafaZfc0bcju1P28fPOE0y/og33XxZW7bimX9EWR3sD764uXzYqhBB1pUEnfKd27YCyLRYqE3zO4qvItEgAugd0Z0CYH5tPPfZQa82xBB/MFDJlkCszRrSpUVz+7k5M6hXMuoMpFJulsZoQ4uJo0Anf+XTCP1y9kXSwezDJBcmYLLYmapFpkbjYu9DGuw2DW/uSllvM0dQ83l19mA17nQAY2LGgwpu0lRnY2o8ik5U98dk1PlYIIWqjQSf8ilosVCTYLRirtpKUb2uiFpkWSWe/ztgb7BnU2tYu+T8LIvjkz2gmd+uNk50TBzJr1xu/X0sflILt0RlV7yyEEBdAg074Simc27WjYPsOtMlU5f5nl2YWmgs5knmE7v7dbdu8jYT4GtmXkMN1PYN5fUI32vu0r3ar5HN5GR1p39SD7TGS8IUQF0eDTvgA3rffRklMDBlffFnlvqWLr/Li2Z
++H7M20+1UhQ7AQ8NbM21YK96+visGg6KTbycOZh7EYq3dPHz/Vj7sjD0p8/hCiIuiwSd8j5EjcR8zhvSPP6Y4OrrSff2N/jgaHInPiy+9YdvVv2vp9km9m/P02A7YGWxz9p38OlFoLuR4zvFaxda/la/M4wshLpoGn/ABmj77XwxGI0n/fRZtqXg0bVAGAt0Cic+1JfwQjxC8nb0r3P/cFbc1JfP4QoiLqVEkfHs/P5r89xkKIyI4+f33le57ujTz9BOuKhPqEYqLvYvM4wsh/hUaRcIH8Bg/HtdhQ0n94ENK4uIq3C/YLZgjJ4+QWZRZZcK3M9jRwadDrUf4IPP4QoiLp9EkfKUUzV58EWUwkPTc8xU+bjDYPRiLtiXfqhI+2ObxD2UewmytXctjmccXQlwsjSbhAzg0a0bAzJkUbN9O1s8/n3ef06WZRnsjrb1aV3nOTr6dKLYUE51V+Q3hisg8vhDiYmlUCR/Aa/IkjH37kvr2O5iSk8ttP12a2cWvC3YGuyrP909v3Mo8/j9jtWr+Pp6JxSoPiBeiKnWe8JVSx5VSe5VSEUqp8Lq+XpXxGAw0e/UVtNlM8osvlZvaCXYPxl7Z07NJz2qdr4VHC9wc3Gp94xZkHv+f+G5HLJM+28bnm47943PlleTx7OZnSS2Qh82LhulijfCHa627a617X6TrVcqxRQv8Z0wnb/16MubOLbPN1cGV78Z9xx2d7qjWuQzKUPrIw9qSefzayS408cGaIygFH62LIjWn6B+db3PCZpZEL+HHQz9eoAiFuLQ0uimd03xuvx2P8eNJ+3AW2UuWlNnWya8TRgdjtc/VybcTh08epsRSUqtYZB6/dj758yhZhSbmTOmJyaJ5c2X1HnZTkd2puwFYfmw5Vm29ECEKcUm5GAlfA6uVUjuVUlMvwvWqRSlF4GuvYuzXj8T/Pkv+tm21Ple3gG6YrWZ2JO2o1fH1MY+/PWk7cbkVl6de6mIz8vlqy3Gu7xnM2C7NuHdoS37dlcDO2JNVH1yBiLQInOycSMpPYmfKzgsYrRCXhouR8AdrrXsCY4EHlVJDz96olJqqlApXSoWnpaVdhHDOurajI8EfzcapZSjxDz9S7adjnWto0FD8XPyYf2h+rWO5mPP4yfnJTFszjRuW3cDWxK11fr268NbKQ9gZFI+PtrXAfuCy1jTxcOLFpfux1uIGboGpgMOZh5ncbjKuDq4sjV56oUMWot7VecLXWiec+jMVWAT0PWf7PK11b611b39//7oOpxw7Dw+az5uHwWgkburU81buVMXBzoFJbSexOWEzJ3JO1CqOizmPv/joYqzaiq+LLw+sfeBfN2f99/FMft+bzH3DwmjiYXuGsKuTPc+M68DehGx+3lnz31z2pe/Doi30b9afUSGjWH18NYXmwgsduhD1qk4TvlLKVSnlfvrvwChgX11eszYcmjWj+efzsOblETd1Gpbc3BqfY1LbSdgpu1qP8i/WPL5VW1l8dDH9mvbjx6t+ZFDQIF7b8Rqv73i91ovHLiSTxVrhojiwlWG+uvwATT2cuXdoyzLbru4WSO8Qb95eeZjswqrbYZ/t9Px9N/9ujA8bT4G5gHUn1tX8DQhxCavrEX4TYLNSKhL4C/hNa72yjq9ZK87t2hE0exbFx46RMH06uqRmN2D9jf6MDB3J4qOLKTAV1Pj6F2se/6/kv0jIS2Bim4m4Orgye/hsbut4G/MPzeehdQ+RW1LzH3YVic3I50BiTrX3LzFbGfH+Bq76aDN74rPOu8/SyEQi47OZObodRkf7MtuUUrx4dScyC0qYtTaqRrFGpEUQ5hmGp5MnvZr0IsgtiGXRy2p0DiEudXWa8LXWx7TW3U59dNJav1aX1/un3AYNotkrr5C/dRvJr9Y81Cntp5BnymP5seW1uv7FmMf/9civeDh60MatP8VmC3YGO2b2mckLA15gR9IObvn9ln98M/doah7/+SmC4e+u57pPt5KZX70fnr/tTS
Q2o4ATmQVM+GQLLy3bT17xmd86CkssvLXyEJ2DPLi2R9B5z9E5yJOb+rbg623HiUqp3g8vq7YSmRZJ9wDbw24MysBVra5ie9J2UvJTqnUOIf4NGm1ZZkW8rp2A77RpZC1YwMmfFtTo2G7+3ejg04H5h+ZXOi1Rkbqcx0/LLeaHvw+y8vgaCjK7MfL9bTz4/e7SOK9vez1zR84lvTCdW36/hZjsmBpf40hKLo/M383IDzawcl8yk3s3p9Bk4dttsVUeq7Xm/zbH0DrAjc1PXs6Ufi34autxRr2/gTUHbEn3/zYfIym7iGev7IhSEJMdw/xD83l1+6tkF5/5nD0+qh2ujna8uGx/tb4Ox7KOkVuSW5rwAcaHjceqrfwe83uNPw9CXKok4Z+H/yMP4zp0CMmvvkrBrt3VPk4pxZQOUziadZS/kv+q8XX7tfRBGUr4Zf/6Wv3AOFdOkYk3VhxkzIcb6fPaWl7441s0Zrp6juLGPs1ZezCFL7ccL92/b7O+fDv2WwDuWX1PtUf6R1JyefD7XYz+cCNrD6YwbWgYm54czpvXdeWK9gF8ve04hSWV/9by9/GT7EvI4c5BoXi6OPDqhC4svG8g7s4O3PtNOFO/CWfOpt307BjNssT3GbFwBFcvvprXd7zOT4d/YmXMmZlCH1dHHhvVji1HM1i1v+oRekRaBAA9AnqUvhbiEUI3/24sjV5a5dfiQnythLgYJOGfh7KzI+idd3Bo1oz46Y9gSqn+UvuxLcfi7eTNDwd/qNE1tdb8nbYBzzYf8Fv6C/zfvv+radhlWKyah37Yzf82xeBtdOTxUW1p03o/HX078fUtE3hjYhdGdGjCGysOlpkvb+XVis9HfU6xpZh7Vt1DUl5SpddZfziV8Z9/w4boKB68rDWbn7ycp8a2x8/NCYBpw8LIzC+psnLmi80xeLo4MLFHcOlrvUK8Wf7IYB4d1YrN2Z9gF/oKUfpzNsRvoLt/d57r/xy/Xfsbga6BbEncUuZ8N/drQSt/V+ZurLqp3e7U3fg4+9DCvUWZ168Ou5qjWUc5lFnxgq6P/4hi1AcbycgrrvI6QtQ3SfgVsPP0JPjjj7DmF5AwfTrWat7EdbJz4rq217E+fj2JeYnVOiY2J5b7197Po+sfxc3BHWt+O2bvms3mhM21jv+tlYfYeCSNV67pzPyp/bmsazFxece4rs1EwPbbyDvXd8XPzYmH5+8mt+hMVUtb77bMHTmX3JJc7ll9T4W9ZRZERvDAugdxbP4ZLq3exTtwM+7Oqsw+fUK96dnCi883HcNsOf/q1bjMAlYfSGZKvxa4OJZtWFdgzmV3ydvYeYQztvmNLLhqARtv2Mh7l73H5HaTaeHRgoFBA/kr+S9M1jPvwd7OwJS+Ldh9IqvKufzTD7tRqmzso0NH42BwqLAmf8exDN5bc4So1Dwe+zmyWvX/WmvmbYzm6V/38sOOE+yJz2rjN31CAAAgAElEQVR0PZQa8yrm/GJzvf5GKAm/Es5t2xL4+msURkSQ8trr1T5uctvJAPx0+KdK9ys0FzJ712yuXXItkWmRPNX3KV7s9T/y424mwDmUJzY+QVxOzW+g/rornnkbj3Fr/xCm9LONWn+N+hVnO2fGthxbup+3qyOzb+pB/MlCnv51b5lvxE6+nZgzYg5phWncu/peMgrPVA8VmYt4fO3bvLzrTuyNMUzr8hCDggbx4a4PuX7Z9fyd/Hfpvkoppg0LIy6zkBX7zr/G4eutxzEoxW0DQsq8Hp8bz60rbiUyLZI3h7zJ25f/lw6+HTCost+2gwIHkW/KJzI1sszrE3oEYW9Q/PR3xZ/DjMIMYnNiy8zfn+bp5MllzS/j95jfy/wwAVsfn0cXRBLiY+Tpse1ZfzitWg3cPv7jKK//foglEQk8s2gvV3+8hc4vrOLK2Zt4cuEeVuyt/Deq6krMKmRJRMIFOdeF9NOhn7h8weUcy/7nze5MFhMHMw5egKgujsz8Evq/sY4PalhBdi
FJwq+Cx5gx+N57L1k//cTJBdW7idvMrRmXN7+cX6J+ochcvqGXxWphRcwKJiyewOd7P2d06GiWTljKzR1u5vJ2TekS6E9e7M0oFI/8+UiNyjwj47J46te99Gvpw/PjOwK2VaQrYlYwKnQU7o7uZfbvE+rDoyPbsnxPEj+ekxi7B3Tnkys+ITEvkalrppJdnM36uPWM/nk8qxK+xc3Sg1/HL+GhntP4cPiHfHz5xxRbirlr1V08velp0gvTARjZoQmt/F35bEN0udFNXrGZn/6OY1yXZjTzdCl9fU/aHm7+/WYyCjOYN3IeV7a6ssL33K9ZP+yUXblVw35uTozo0IRFuxMoMZ9/VHn6YfXd/csnfIDxrcaTWZTJtsSyrTdeWLKP5JwiPrihO1OHtmJs56a8s+owu06cae1w7nv9ZWc87605wsQeQex7cTQbZw5nzs09uWdIK3xcHVl9IJn7v9/1j5O+1prHFkQy/ceIf9Rq4kJLyU/h/Z3vk1GUwcwNMym2/LNpsBe3vcjk5ZNZFLXoAkVYt+b/dYLcIjOfrj/KsbS8eolBEn41+M+YjuvgwSS/8iq569djycqq8teyKR2mkF2czYqYFaWvmawmFh9dzIQlE3hi4xMYHYx8OfpL3hjyBv5G2ypjg0Hx9Lj2JGe6cZn3oxzLPsZzW56r1q+BqTlFTP02HH83J+bc3BMHO9uXd03sGvJMeVzb+trSfYsOHcKSZZu7v39YGEPa+PHi0v0cTi47/dGnaR9mDZ9FTHYMVy66kof/eJj0XCvNCv7Dqilzae17pjxyWPNhLLpmEVO7TmXl8ZVcvehqFhxegFIwbWgr9ifmsOVo2XUGP4fHkVts5q7BZxZRrYtdx12r7sJob+S7cd/Ru2nlTVbdHd3p6t/1vG0ibujTnIz8Ev44dP6btxFpEdgb7Onk1+m82wcHDcbbyZslR8802FsSkcDiiEQeubwNPVp4o5TipQnt8PNLZtriWcxc/zTXLL6GAfMHlNbybz2azpO/7GFgmC9vXtcVg0HRwtfIuC7NeHJMe769ux87nhlBt+ZePPHLHuIyq/4hr7WmwFRQ7ntj1f5kth3LsHUR/aP+RpPnei/8PcxWM0/3fZojJ4/w7t/v1vpc606sY2n0UnycfXh528u17mN1sZgsVr7dFkv35l44O9jxwtLqVZBdaPZV7yKUnR1B775DzKTJxN93v+01JyfsAwJOffhj7+uHcnBA2duBwY4QOwP3RnlxInIWeTO68VvB33yx7wsS8xNp592Od4e9y4gWI877kJWBYX5c3j6Apdszuf/qh/l07yy+2PcFd3e5u8IYi80Wpn23k5xCM7/cPxDfUzdNwTadE+IRQq8mvQDIXbuW+EemY+/vT/CsD3Hp3p33J3dn7KxNPPjDLpY+NKjMoqaBQQP54LIPeGHL65hSB9PWOI5v7xqAp4tDuThc7F14uMfDXNXqKl7b8RqvbH+F3am7earPf3l3tRNzN0YzuI0fYLux/NXW4/Rs4UX35l5YtZWv9n/Fhzs/pItfF2ZfPhtfF99qfY0GBg5kTsQcThadxNvZu/T1IW38aOLhxE9/xzGmc7Nyx0WkRtDRtyNOdk7ltoGtbcbYlmNZeGQh2cXZ5BU68uyS3XQMzSIgaCcvbv2eAxkHiDoZhdnXtmZgXawHA4K7Y7Q38vzW5ykp9uDFBUW08nfl01t64Wh//nGWo72Bj2/qwbjZm3h4/m5+vm8ADnYGCs2FfLDzA2JzYsktySWnJMf2Z3EOZm2mo29H3h76NiEeIRSZLLz2+0HaNnHjqq6BvL/mCJFxWXRr7lWtz2Nd+SvpL1YcX8ED3R5gSocpJOQl8M2Bb+jXrB8jQkbU6FwZhRm8vO1lOvh0YO7Iudy58k7+s/4/fDfuO1p5tqqjd1BWgamAVcdX4efix5DgIVXuv2JfMsk5Rbx2bWfiMgt4cdkBVu5LZmyX8t+TdUldSiVlvXv31uHh9f6MlAqZMzMp2L
EDc2oqptRUzCmpmFNPfWRmos1mMJvRViuYzywYyjEqXp9swL1rd6Z2ncqQoCHlbhCeZkpKInfNGtLbdWPM4nhuGxBCgec3rDy+kk9HfMqgoEHljtFa88TCPfy8M545N/dk3FnfRDHZMVy9+Gpm9JzB3V3upiA8nBN334NT69ZYcnIwJSfT9Jmn8brxRrYczeDWL3bQJcgTb6MjBSVm8oot5BeZaBOzh27HdhJ52UQ+ePQqPJzLJ/tzWbWVeXvmMSdiDu192tPLZQafrj3J8ocH0znIkzUHUrj3m3A+ntKD9s0LeHHbi0SmRTIyZCSvD34dZ3vnan9tTk8BvTXkLca1Gldm2zurDvHp+mi2PnUFTT3PnLPEUsKAHwZwU/ubeLzP4xWee3/Gfm5cfiO9AnqxLzmZIpWEUrYpIk8nTzr4dKCzX2c6+3YmItqVj1an8/I1nZnQy4eblt/CiewUnFKns2TatQR5uVR4ndN+35vEA9/vYtrQVjw9rgP/2/s/Zu2aRSffTng6eeLh6IGHowfuju442jny/cHvMVvNvDDgBY6faMvbKw/z3d396Nbck8Fv/UmfUB/+d3v9PYrCZDVx/dLrKbGUsOiaRTjbO2OymLh1xa2cyD3BwvELCXQLrNa5tNZM/3M6WxK28NNVP9HauzUJeQlM+W0KRnsj31/5PT7OPnX2Xk7knODHwz+yOGoxuaZcvJy8+GPyHzgYKv//cO2cLZzML+GPxy7DqjXjP95CVkEJ6x4bVm7FeG0opXZW63kjWutL5qNXr166obBarTqvMEff8ekIvW1AN72/Wzeds2FDpcfk/PmnPty3nz7Qrr0+0K69/mPMRH3Hra/qg3GJeuKSiXrgDwP1gfQD5Y6b8+dRHfLkcv3eqkPltr0X/p7u9nU3nVaQpgsPHdaH+vTVR8eM1abMTG3OytKxU6fqA+3a64Qnn9KWwkL9xeZj+or31uvxH23SN87dpp9/5Tu9bsTVpTEdGjBQF+zZW6PPxYa4DXrA9wP0oB8G605vzNYP/bBLa631DXO36v6vr9Czds7W3b/prgfPH6yXHF2irVZrjc6vtdZmi1kPmj9IP7PpmXLbYtLydMiTy/XHf0SVeX13ym7d+avOes3xNZWe22q16tt+v033/Wawbv/xdfq+5a/qtcfX6oTchHKxWixWfccXO3SbZ37XO45l6JGzf9Wd/q+fvuKn0TqzMLPa7+eZX/fokCeX6+X7juoBPwzQD619qMJ9E3MT9S2/3aI7f9VZd5p9l77jq82l2z5Yc1iHPLlc70/Irva1L7Qv936pO3/VWa8/sb7M6yeyT+h+3/fTN/92sy6xlFTrXIujFuvOX3XWX+37qszrEakRuuc3PfWtv9+qi8xFNYovr8ikX1y6Tw94fa1eFplQbrvFatEb4jbo+9bcpzt/1Vl3/7q7nrl+pp4TMUd3/qqz3hK/pdLz7z5xUoc8uVx/sflY6Wt/x2TokCeX6zdXHKxRrBUBwnU1cqyM8C8CU2oqcVOnUXz0KM1efQWvCRPKbNcmE2mzZpHxv//DqX17mr34AgXh4aT/8CPWxAQKXD3wnDSOmZ6rOWrMYVDgIG7rdBv9m/bng7VRfPTHUa7s2oyPbuyBwXDmNweT1cTIn0fS1b8r77V7kuM3TQEgdP4POATZ5t611Ur6nE9J/+QTnNq3J/ij2TgGB1N04ACpH35I/sZN2Pv74/fgA7j07En8/Q9gPnmS4A8/wG1omU7XlYrNiWXGnzOIzoqmKHUss8bM4KFffyGw9XKyzAlc1eoqZvaZ+Y9GZ49veJxdKbtYN2ldud+gbpi7jeScIv587LLSz9HX+7/m3fB3+XPyn/i5+FV67v2J2Uz4ZAtXtG/Cp7f0rPA3NLBVY4ydtZHU3GIMSvHfiUbmHHqcjr4d+XzU5xVOH52tyGRhwidbSDIswuq5loXjF9LOp12F+5usJq6b/wIx5mW0cGvF7CveJ8wrjOwCE4Pf+oMhbf2Yc3
OvCo/PLcll1q5ZJOcn08a7DW2929LWuy0hHiHYG2o/Ak3JT2H84vH0a9qPj674qNz2FTEreGLjE9zd+W6uazmVQC8X7Azn/9wm5SUxcelE2nq35YvRX5SbDl15fCUzN8xkXMtxvDnkzUq/RqdtjkrnqV/3kJCbhG/QJvLMOTTzMtDc154SaxGF5kKyirPILMrEz8WPyW0nc33b6/E3+lNsKeayny5jZMhIXh70coXXmP7jbtYdTGXb05fjftZvxo//HMmSiARWTB9K6wC3KmOtTHVH+JLwLxJLXh7xDz9Mwbbt+D/6KL733oNSClNSEgmPPkbh7t143XgDTZ5+GoOTLSFoq5VvP1qAedFC+qceRGlNYRNPYlwLSPAwke3lTbS1A+07jObRu8Zg52YkKiuK8ORwwlPC2ZWyi5PFJ/mk15sEzfwEc3o6Id99h3O7tuXiy9uwgYSZT4BSGPv0Jm/tOgyenvjdew/eN9+MwcU2FWFOS+PEtGkUHz5Cs5dfwuu66877frXFQv727ZTEHMehaRPsmzXD7O/FkxFvsynpD6xFgRicE2lmDOSFgc+fd6qqphZFLeL5rc+fNzn+uiueRxdE8uPU/vRvZbsvMOPPGRzOPMyK61aUO1ex2cLe+Gz+Op5J+PGT/BWTidHRjlUzhuLt6li6nyUnh8KICKzFxeiSEnSJCV1SQkxiJj+Gx9P39uu5flR3Vh1fxeMbHmdsy7G8NeStaiWj8LgT3LF2Al66Bxvu+F+FiRBgT3wWV3+8hfH9c9hT/BkFpgKe6PsEE1tP5IM1R/lk/VFWzxhKmybu5Y6NSI3gyY1PkpSfjKd9IHmWZMzaNiXpaHAkzCuMILcgii3FFJgLKDAVUGgupMBUgMlqYnToaO7rdl+5+y3mkyeZ9cvjJO8PZ5rnlTieSKH42DGwWPB95VWiWnQiPPYkC2PfJ53NFMTdRVuPXjx3ZQcGti77A9iqrUxdPZW96XtZePVCmrs3P+/n4fM9nzN792zu73Y/D3R/oMLPV3ahidd+O8CC8HhC/Yz4tf6CmNwDuBj8ycwDR+VMx6Z+NPXwwGhvZFDQIEa0GIGDXdmpm2c2PcP6+PVsmLyh3DaAlJwiBr35B7cOCOGF8WULA9Lzihn+7nq6BXvx7d19q/U9UZHqJny5aXuR2Lm50WLuXBKffoa099/HnJKC66CBJD39DNpsJuj99/AYV3buWRkMTLzvei7L9WWbYzGvu8VRcjQKz9gThByOxrUgE9gCG7YQNe95jgbbE97SSkQrhSksiCHBQxjq15fmz39DcUICLb74v/MmewC3YcNoufBn4h+ZTv7Wbfjefx++d92FnXvZBGHv70/IN9+SMH06Sf99FlNyMn4PPFD6zVpy/DhZixeTvXgJ5vM8W+ARZ2dudDVyzDeedeNHMnfK6zV6nGRlBgQOAGBr4tZyCX9s52a8sGQ/C/6Oo38rX7TWRKRGMDBwYOk+ecVm5m08xvboDCLis0pLOcP8XRnfrRm39g8tTfbWoiJOfv896fM+x5pdvveRC3AnoA6uIO3o3Yy84w6m95zOrF2zaOHegod6PFTl+1mb9AMGg4WEo0P56I8oZow4/9dOa83Lyw7g5+bI62Oup8g6gqc3Pc3L217mm/3fcHO7u3FxcODjP4/ydm93Mv73f/hOnYpdaHM+3/M5c/fMxc7qTV7sNHIKQ/jf7d1p3iSPIyePEHUyiiNZR4jJjsHZ3hmjg5EAYwAu9i4YHYwUmgr5+cjPLI1eyh0db+dGSy9K/thI7po1mOLiOP0dbTauQoeGkhjaEWvUYYrvu48Fna7kl9bDaN3kWtz8j+ESupCMdDumfJnEiLat+O+VHWjp5wrA/EPz2ZG8gxcGvFBhsge4p8s9HM85zqeRn9LKqxVjQseU22fV/mSeW7yPjPwS7r8sjNYtD/HKlt287nErQ5sN5mh6AR9vOEbS3yaG9wzmxoFhuAW3R50noY9pOYZlx5axLWkbQ4PL/8
b73fZYLFpzx8BQAA5nHibYPRhXB1f83JyYObodzy/Zz+97k7mya93fwJUR/kWmrVZS336HzK++AsCpQweCP3gfx9DQCo/5bnsszy7ex7xbezGkjT/3fbeTDUfSeOay5nQzHmTD3z/jFpVE92NWvE/YSi3t/P1wGzQYU3IyBX/9RfDsWbiPqLoaQpvNWIuKsXNzrXw/k4mkZ58je8kSvCZNwqVbV7IWLaZw504wGHAdPAiva6/FpWcvzGlpmJISMSclYUpMIut4HCWbNuA8YhRtZr1X7c9ddVy75Fr8XPz4fNTn5bY9s2gvv+6KZ+MIN3L+XMFzagnXTnqWyR1vIim7kLu+Cudwcg5dgr3oG+pN71Afeod4l6l40mYz2YsXk/bxJ5iTk3EdOgTfO+/EzssL5eh46sMJ5eiAJTOTtFmzyV29GvuAAPymP8KsJhH8Er2Ylwe+zLVtri0X42mJeYlctegqrg67mpy4CSyKSODW/iFM6tWczkEeZUaDSyMTeWT+bt6c2IUb+9oW2lm1lbWxa/lsz2dEnYzCXTVh0Gp/7tpzAEpKsOvbg1dvtGN3WgQ+uj+xR8bw8vhefLstltwiM6v+M/S8VVjn0hYL0RuWs3vBHJrtPIFvLljt7XDt34+FHkdI8DcwfuD7LIyzsuZgGiaLpncTZ+7b/gNBkVtxHjOOkDdfI7owjttX3E6uyVYWrEv8sBY1p3fT7tzQoysvbH+Kvk378kT3t4mMzyYyLps98Vmk5hYzpI0fYzo3pX8rXxzsDJgsJu5cdSdHs46y4KoFtPBoQUJWIWsPpLBiXxLbj2XSoZkHb1/XlRb+cPuX43jolyKC4iouhTWFhhH6zpt4dOlc9nWLiWELhjG8+XBeG1y2w26RycKgN/+gRwsv/nd7H45lHWPCkgmlU3vuju5YrJqrP95MRp7tBq6rU+3G4DKlc4k7+eOPmBIS8XvowdIpnIqYLVZGfbgRtG117O4TJ3n92jP/uc9mSk0lf8tW8jdtJG/LVqzZ2TR9+SW8J0++4O9Ba2279/DZXAAcW7bEc+K1eF59DQ5NAio9NvWDD8mYO5eQ+T9g7NGj0n1r4p2/3+HHQz+y+abNuNiXrYiJjMvi+Re/4vUdX2Awn1o56+WBdeBw3isOYqdPGLNv7cOwtuWfvKa1Jm/dOlI/+JCS6Gicu3Yl4LHHcO3Xt9y+5yrYtYuUt96iKHIPju3a8uMIJ37yOMTzA57n+rbXn/eY57c8z/Jjy/l94u+42/vx3OJ9LN+bRInZSvum7lzfK5gJPYJwdbTnivfW42V0ZNnDg8tN+1i1lY1//UzBC2/T8ngBu9o44dmlM2G/7uSDm1xJbnEbew6Hlf6w2BOfxbVztnJN90Den3z+xWinJf21i/iHHsYtJxOzvQOJ7duyoW0O60ISwdWDAksuLhn3kpoaho+rIxN7BHFDn+a0aeKO1pqMufNImzUL5w4dCP74I0z+XuzP2M+etD38nRRBeHIExfrUAEa7ohMeJzvX9jV1djDQOdATL6MjW6PTKSix4OFsz4gOTRjduSltAk1M+f0GnJU/xvTpHEi0Pb2slZ8rk3o3554hLXGwMzB33n30+HQD7gYjzZ59DscWzdEWC1itYLUSfiydxWsjuS5iOV7FeUQMGo/f/fcztEtQaXXNc1ueY23sWtbfsL7M/ZkF4XE8sXAP39/Tj0Gt/Xh287OsPL4Si7bQ2bczc0fOxehgZGfsSa77dCvThrXi6bEdqvx+Oh9J+A3M6v3JTP12J452Bj68sXuZ0suKaIsFc1oaDk2b1mlseZs2YefujnO38v1oKmLNzyd67DjsAwIIXfATynBh1gBuTdjKtLXTmHPFnHL10QW7dnPktjs56eHLzmf6ELNjDXck9cNu+1ZczMXg7oHHsKEoR0esBQVY8/NL/7RkZWFOTsaxZUv8/zMD95EjazTnqrUmd+VKUt97H1N8PNFd/XjjspM8MPxpbu5wc5l9Y7JjmLBkAl
PaT+HJvk9SHBND5tdfY/ELINw9hG+zXNmZlI+9QRHm78bhlFx+mtqffq18y10za8HPpLz1FspgYMXoYcwL3oejUzwff2GPVXtyx6DHeP367mUGD++tPsxHfxzl89t6M7Jjk/O+n4QNW0h56CFyHVxY1v86tvi1I9mk0Fpj53YQJ/81WIub0NftYW7s05wRHZqcd/1B7vr1JD4+0/Z86dmzMPY+k7O01myOieLdDWvIz/emT2BnugZ70S3Yi7ZN3LA/tbCwyGRhU1Q6K/cls/ZgCtmFJgwKDK77cWn+LV6my7m5zSOM7NiEMH/bzVFtsbDvnecxfPUrec196PH59xX+ll1itrItMobs996lza71nHAP4JM+N9JsQB8mdA/CzTuaB9bdz+zhsxneYnhp7ONmb8Zq1aycMYSUghRu+GYMz29pgsugAdxv/JXeTfvwyRWf4GzvzBMLIzmeUcD8e/tXeq+mIpLwGxitNXM3HqNHc69y/7H/rbKXLSNx5hM0e+3VCm/+npb+2Vwyv/8OTKfWOVgstpGYxYJydMTnjjvwnTaVEoOVwT8OZlLbSTzZ98nS44sOHCD29jvId3Hn7p73EDjkJ7C4c2D3jfRs6sqssCIMm/4kb+tWlMEOg9GIwdXV9nHq7679++E5YQLKvva3vqwlJWR+/TXpH31MgSN8NNrM4BsfLbOobuaGmWyI38DvVy+DH5eS/tHHAKVPYVOOjuj2HTkU0JplOoBW7Vvy2LAWWAuL0EWFtj+Li8j+7TfyN27COKA/ga+9RoarD0Pf/oPLu9jTPDKKiYs+Ju6Ohxn1VNmbmyVmK9d8soW03GLW/KfsTWqAuBVryHz8MVKN3nh88hn9+tpGpRarJqfQRGZBCSfzSwj0ciGwGusOio8dI/7BhyiJi8NzwjV4X399jQYPZzNZrPwVk8nGqDTC/NzYX/wNv0b/yIfDP+SKFlcAtpvJ8Y8/TuGWrWzvYWTS3NW4eVTv/1T2ho3EP/scKj2NVR0u47NWI2nW1IOCps8zrPkg3hn2NgDbj2Vw47ztvDGxCzf1bcGsFc/T+ZWFBGba8m3usO481GsfPVsNYvbw2VgsdjjZG8pU2dWEJHxxydNaEzvlZkpOnCBs5YpyN4hPy/zue1JefRXXgQNsozCDHcrOYPvT3o6S47HkrlmDY8uWNH3pRR7Ltq1oXjrB1uWy+OhRYm+9DeXijNe8Lxn4dTjOYS9RnH4FI5vdynuTu+HsUH7Fc10qjooi4YknKD54iD+7KuxmTGXqgOkcPnmYScsm8Zj7tQz9bh/FBw7iPnIkTZ9/DuztKdy1i4K/wynYuZOiAwfAUnGnTeXsTMDjj+M95abS36CeW7yPb7fHgtb8fOgbPNMTCVu1Eju3smWBBxJzuOaTzYzp3IyPbjoz5Rb7yxJyn32G416BNPn0M/p0D7sgnw9Lbi6p77xL9rJl6MJCHFuH4XXd9XheczX2PtUv1dUWC4W7dpG3YQPWkhK0gz2LTvxGpiWHm7vdiaeLNxlffklJeirzRmiunjGLK2q40teSl0fqe++RNf9HTE0C+XzgzaxusgVHz0juC/2G2wa04fGfI9kRk8m2p64g7/gBDt16Ax4ldrT5/EsKwneSNns2xc28eWZMFm17j+CdYe9UuXirMpLwxb9C4b79HJ80CZ877qDJk0+U256zchUJ//kPbsOHEzx7VoWj67xNm0l+6SVM8fGkXd6VJ7vs59db1+CbaSL25lvQaEK//RbH0FBu+u5r9lneZYTXc7w3flKtR1X/lC4pIfWTT0ifN490D4h5+Cp2+uYQ8stfXLnNhJ23N02few6P0aPOe7wlL5/CyAisuXkYXJxRzi6n/nTG4OKCnbdPuZvviVmF3P7FX9w9uCXXuORwfNIkfKdNI+A/M8qd/6N1Uby35kjp6u2Yb+ZT8MYrHPZtSci8z+jVseJqmdqy5OWTs+J3shf+QmFkJDg44D58OO5XXI5jaCiOISHYeZVtE6
HNZgrCw8lZtYrcNWuxpKfb2pw4O6NPlcuezRDUjGfHZtO0x0BmXz671uWQ+Tv+Ium//8WUkEDiqKE80WUz2Sm34mLqTn6xmWnDwpjeysCh226iuLgAn08/oE3/MaXHJjz+GKbsLD4bacXlmit5Y/Ab5221Uh2S8MW/RuKzz5K9eAmtli7FqdWZJmoFf//NibvvwblTJ1p88X+lawEqYi0sJH3OHDK++IJsJysFt15Fi2W7sBYU0OKbr3Fu25asoiweX/8kf6VsZ9uUrbg6VF6NdDHk797NgRnTMKbkctIdfHPBc+JEmjwxs1xyu9ASZj5B7urVhK1cgUOzsveFzBYr187ZSkJWIV+7HcV+7mx2N2tPh7lz6NG27ksIi6OiyFr4C9lLl2I5eabrp52nJ4ZzXPMAAAmYSURBVA6hITiGhKDsHcj7808sJ0+iXFxwGzYMj9GjcBs6FIOr7WurrVZWRS3nhfXPcHvYTUTpZLak7mDxNYur3dKhItb8fFLfe5+TP/xAiq8df0zpSbrXdHbFZvHzcE8KZjxEhjWHlTP68/LNX5Q51pyeTsLMmRRs2876LoqU+6/h+eGvlWv/XR2S8MW/hjkjg+jRY3Dp2YMW8+YBUHT4CLG33IK9vz+hP3xfo8RXeOgQfz40mZbxJgxubrT46itcOndibeza0uffzug1g9s73V5Xb6nGLPn5rH3qdpz2RtPlpXfxHXbFRbmuKTGR6DFjcR8zmqC33y63/fC+aNY9+iLDT4SzrXl3es+bTdeW5auY6pI2mSg5cYKS2FhKjsfa/jz1Yc3Lw23oUNxHj8JtyJBKBwUvbXuJhUcW8v/t3Xts1eUdx/H3p7UtxDI6mBpsEQRUYExYzJwGdE4HlsnEOF10QLzMuDhcvOLdkM15YfO2P5BtIko2VBR1EmRxxLmMoWK9bSq6TRF1KFRRx0WU0n73x+9H1iFCgdOenj6fV0LO7/ecX9PnG55+z+885zzfB+CCQy7gzGFnFqyPG55ayisXT6b7+xvoOWkCNUd+g5Xnnc/GHpVceOI6bvn+HIbvNfyzsTU38/5tM3jvtul82LeGry/4M3tU7ngl9tac8K2krLnzLhqnTaPuVzPodtBBrDjlVIj4vzIQO2Pq4qtZu3AhP5s4m08H7st1S6/j0RWPMqTXEK4Zec12yxSkpvGmm1lz++30nzeP7sOy1aAtGzeyZtYs1sy8g+amzSwcegxjb7yaYfu1X2Gy9vbJ5k+Y9IdJCDHnuDm7NWe+LQ3LF7Pkyh9y7PNZTq0YMIBLv/sxX6jtz531d273Z9f/dQmb3llJr138+rQTvpWU2LSJ5eNPIFqaUUUFm1etpt+cOZ+7MnhHttRVOesrZ/HAPx9gXdM6zhl+DmcMO6Pgf+ilrnn9el4fcyxVAwey3+y7WPvIIzTedDObV62iR309e198ERW1tbu19L+zaGppoiVa2lTPaGc1tzQzet5oxq6pY+LqATSMG8SVL09jxrdmMKp2VMF/X2ttTfjeAMU6BVVWss8Vl9P05ls0vfkWddOn73KyBzi8z+GUqYyZL86ktrqW+8fdz9kHn+1kvw3l1dXs9eNz+bihgeXHjeOdKZewR+/e9Pvdb6m79RYq6+q6RLIHqCiraJdkD1BeVs7ofqOZ22MZ1VdcxO1vz2Vwr8GM3Hf360QVimvpWKdRfcQR7D1lClUHHtimFazb07OqJ5NHTKaqvIoJQybsVsXHFNScfDIf3juX5g8+oM/119Nz/PEFWwyXkvr967n71buZ+sRUVqxdwS+O/EWnerH0lI6ZAVlBOJWVocrKHV9s29QSLYyZN4bVH6+mb4++zD9hfofcbHhKx8x2Slm3bk72u6lMZYzpn62bOP3Lp3e6d5adqzdmZiVu0pDsm0DjB40vdlc+wwnfzKyA+lT3YcrXphS7G9vkKR0zs0Q44ZuZJcIJ38wsEU74ZmaJcMI3M0uEE76ZWSKc8M3MEuGEb2aWiE5VS0fSe8CbO7
jsS8D7HdCdzirl+FOOHdKO37FvX7+I2OHONJ0q4beFpGfaUiSoq0o5/pRjh7Tjd+yFid1TOmZmiXDCNzNLRCkm/N8UuwNFlnL8KccOacfv2Aug5Obwzcxs15TiHb6Zme2Ckkr4kuol/UPSa5IuK3Z/2pukWZIaJb3Uqq2XpEWS/pU/frGYfWwvkvpKelzSMkkvSzovb+/y8UvqJulpSX/LY/9J3r6/pKX5+J8rqctuTyWpXNLzkhbk5ynFvkLSi5JekPRM3laQcV8yCV9SOTAdGAsMBU6VNLS4vWp3dwH1W7VdBjwWEQcAj+XnXdFm4KKIGAocBkzO/79TiP9T4OiIGA6MAOolHQZMA26JiEHAh8APitjH9nYe8Eqr85RiB/hmRIxo9XXMgoz7kkn4wKHAaxGxPCI2AfcCnW8PsQKKiL8AH2zVPB6YnR/PBk7o0E51kIh4NyKey4/Xkf3x15JA/JFZn59W5P8COBqYl7d3ydgBJNUBxwEz83ORSOzbUZBxX0oJvxZ4u9X5v/O21OwTEe/mx6uAfYrZmY4gqT/wVWApicSfT2m8ADQCi4DXgY8iYnN+SVce/7cClwAt+Xlv0okdshf3P0p6VtLZeVtBxr33tC1hERGSuvTXrCRVAw8A50fE2uxmL9OV44+IZmCEpBrgIWBwkbvUISSNAxoj4llJRxW7P0UyKiJWStobWCTp1dZP7s64L6U7/JVA31bndXlbalZL6gOQPzYWuT/tRlIFWbKfExEP5s3JxA8QER8BjwOHAzWSttykddXxPxI4XtIKsmnbo4FfkkbsAETEyvyxkezF/lAKNO5LKeE3AAfkn9ZXAqcA84vcp2KYD5yWH58GPFzEvrSbfN72DuCViLi51VNdPn5Je+V39kjqDowm+wzjceCk/LIuGXtEXB4RdRHRn+xv/E8RMYEEYgeQtKekHluOgTHASxRo3JfUwitJ3yab3ysHZkXEtUXuUruSdA9wFFm1vNXAVOD3wH3AfmSVRb8XEVt/sFvyJI0CFgMv8r+53CvI5vG7dPySDib7YK6c7Kbsvoj4qaQBZHe9vYDngYkR8Wnxetq+8imdiyNiXCqx53E+lJ/uAdwdEddK6k0Bxn1JJXwzM9t1pTSlY2Zmu8EJ38wsEU74ZmaJcMI3M0uEE76ZWSKc8C1Jkmok/ajY/TDrSE74lqoa4DMJv9VqTrMuxwnfUnUDMDCvOd4gabGk+cAyAEkT85r0L0j6dV6eG0ljJD0p6TlJ9+e1fpB0Q167/++SbixeWGafzwuvLEl5Bc4FETEsX9H5CDAsIt6QNAT4OXBiRDRJug14ClgIPAiMjYgNki4Fqsj2aXgCGJwXtqrJa+CYdSp++2qWeToi3siPjwEOARry6pzdyYpVHUa2+c6SvL0SeBL4D/AJcEe+Q9OCju26Wds44ZtlNrQ6FjA7Ii5vfYGk7wCLIuLUrX9Y0qFkLxQnAeeSVXk061Q8h2+pWgf0+JznHgNOyuuRb9lPtB/ZtM5ISYPy9j0lHZjP4/eMiIXABcDw9u++2c7zHb4lKSLWSFqibIP4jWTVSLc8t0zSVWS7DpUBTcDkiHhK0unAPZKq8suvInvxeFhSN7J3Bxd2ZCxmbeUPbc3MEuEpHTOzRDjhm5klwgnfzCwRTvhmZolwwjczS4QTvplZIpzwzcwS4YRvZpaI/wJs8xvSKS6URQAAAABJRU5ErkJggg==\n", "text/plain": [ "<Figure size 432x288 with 1 Axes>" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df = pd.DataFrame({'sqrt_rmses' : sqrt_rmses,\n", " 'log2_rmses' : log2_rmses,\n", " 'all_fx_rmses': all_fx_rmses,\n", " 'trees' : rnge})\n", 
"sns.lineplot(x='trees', y='sqrt_rmses', data=df, color='tab:blue')\n", "sns.lineplot(x='trees', y='log2_rmses', data=df, color='tab:green')\n", "sns.lineplot(x='trees', y='all_fx_rmses', data=df, color='tab:red')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "8. In the lab, a classification tree was applied to the Carseats data set after converting Sales into a qualitative response variable. Now we will seek to predict Sales using regression trees and related approaches, treating the response as a quantitative variable." ] }, { "cell_type": "code", "execution_count": 116, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>Sales</th>\n", " <th>CompPrice</th>\n", " <th>Income</th>\n", " <th>Advertising</th>\n", " <th>Population</th>\n", " <th>Price</th>\n", " <th>ShelveLoc</th>\n", " <th>Age</th>\n", " <th>Education</th>\n", " <th>Urban</th>\n", " <th>US</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>9.50</td>\n", " <td>138</td>\n", " <td>73</td>\n", " <td>11</td>\n", " <td>276</td>\n", " <td>120</td>\n", " <td>0</td>\n", " <td>42</td>\n", " <td>17</td>\n", " <td>True</td>\n", " <td>True</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>11.22</td>\n", " <td>111</td>\n", " <td>48</td>\n", " <td>16</td>\n", " <td>260</td>\n", " <td>83</td>\n", " <td>2</td>\n", " <td>65</td>\n", " <td>10</td>\n", " <td>True</td>\n", " <td>True</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>10.06</td>\n", " <td>113</td>\n", " <td>35</td>\n", " <td>10</td>\n", " <td>269</td>\n", " <td>80</td>\n", " <td>1</td>\n", " <td>59</td>\n", " 
<td>12</td>\n", " <td>True</td>\n", " <td>True</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>7.40</td>\n", " <td>117</td>\n", " <td>100</td>\n", " <td>4</td>\n", " <td>466</td>\n", " <td>97</td>\n", " <td>1</td>\n", " <td>55</td>\n", " <td>14</td>\n", " <td>True</td>\n", " <td>True</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>4.15</td>\n", " <td>141</td>\n", " <td>64</td>\n", " <td>3</td>\n", " <td>340</td>\n", " <td>128</td>\n", " <td>0</td>\n", " <td>38</td>\n", " <td>13</td>\n", " <td>True</td>\n", " <td>False</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " Sales CompPrice Income Advertising Population Price ShelveLoc Age \\\n", "0 9.50 138 73 11 276 120 0 42 \n", "1 11.22 111 48 16 260 83 2 65 \n", "2 10.06 113 35 10 269 80 1 59 \n", "3 7.40 117 100 4 466 97 1 55 \n", "4 4.15 141 64 3 340 128 0 38 \n", "\n", " Education Urban US \n", "0 17 True True \n", "1 10 True True \n", "2 12 True True \n", "3 14 True True \n", "4 13 True False " ] }, "execution_count": 116, "metadata": {}, "output_type": "execute_result" } ], "source": [ "car_df = pd.read_csv('carseats.csv')\n", "car_df = car_df.drop(car_df.columns[0], axis=1)\n", "car_df['Urban'] = car_df['Urban'] == 'Yes'\n", "car_df['US'] = car_df['US'] == 'Yes'\n", "car_df['ShelveLoc'] = car_df['ShelveLoc'].map({'Bad' : 0, 'Medium': 1, 'Good' : 2})\n", "car_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(a) Split the data set into a training set and a test set." ] }, { "cell_type": "code", "execution_count": 118, "metadata": {}, "outputs": [], "source": [ "train, test = model_selection.train_test_split(car_df, test_size=0.2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(b) Fit a regression tree to the training set. Plot the tree, and interpret the results. What test MSE do you obtain?" 
] }, { "cell_type": "code", "execution_count": 119, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "test rmse: 2.2041731241021694\n" ] } ], "source": [ "# Fit a depth-3 regression tree on the training split\n", "reg = tree.DecisionTreeRegressor(max_depth=3)\n", "train_x = train.drop('Sales', axis=1)\n", "train_y = train.Sales\n", "reg.fit(train_x, train_y)\n", "# Predict on the held-out test split\n", "test_x = test.drop('Sales', axis=1)\n", "test_y = test.Sales\n", "preds = reg.predict(test_x)\n", "# The exercise asks for the test MSE; this reports RMSE, its square root\n", "rmse = np.sqrt(metrics.mean_squared_error(test_y, preds))\n", "print('test rmse: {}'.format(rmse))" ] }, { "cell_type": "code", "execution_count": 80, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n", "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n", " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n", "<!-- Generated by graphviz version 2.40.1 (20161225.0304)\n", " -->\n", "<!-- Title: Tree Pages: 1 -->\n", "<svg width=\"512pt\" height=\"534pt\"\n", " viewBox=\"0.00 0.00 512.14 534.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n", "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 530)\">\n", "<title>Tree</title>\n", "<polygon fill=\"#ffffff\" stroke=\"transparent\" points=\"-4,4 -4,-530 508.1377,-530 508.1377,4 -4,4\"/>\n", "<!-- 0 -->\n", "<g id=\"node1\" class=\"node\">\n", "<title>0</title>\n", "<polygon fill=\"none\" stroke=\"#000000\" points=\"96.9985,-322 .0005,-322 .0005,-258 96.9985,-258 96.9985,-322\"/>\n", "<text text-anchor=\"middle\" x=\"48.4995\" y=\"-306.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">X[5] <= 1.5</text>\n", "<text text-anchor=\"middle\" x=\"48.4995\" y=\"-292.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">mse = 8.017</text>\n", "<text text-anchor=\"middle\" x=\"48.4995\" y=\"-278.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 320</text>\n", "<text text-anchor=\"middle\"
x=\"48.4995\" y=\"-264.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = 7.507</text>\n", "</g>\n", "<!-- 1 -->\n", "<g id=\"node2\" class=\"node\">\n", "<title>1</title>\n", "<polygon fill=\"none\" stroke=\"#000000\" points=\"231.3545,-356 134.3564,-356 134.3564,-292 231.3545,-292 231.3545,-356\"/>\n", "<text text-anchor=\"middle\" x=\"182.8555\" y=\"-340.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">X[4] <= 105.5</text>\n", "<text text-anchor=\"middle\" x=\"182.8555\" y=\"-326.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">mse = 5.571</text>\n", "<text text-anchor=\"middle\" x=\"182.8555\" y=\"-312.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 247</text>\n", "<text text-anchor=\"middle\" x=\"182.8555\" y=\"-298.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = 6.682</text>\n", "</g>\n", "<!-- 0->1 -->\n", "<g id=\"edge1\" class=\"edge\">\n", "<title>0->1</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M97.0051,-302.2748C105.837,-304.5098 115.1367,-306.8631 124.2243,-309.1628\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"123.4167,-312.5688 133.9698,-311.629 125.1341,-305.7827 123.4167,-312.5688\"/>\n", "<text text-anchor=\"middle\" x=\"112.4956\" y=\"-320.2297\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">True</text>\n", "</g>\n", "<!-- 8 -->\n", "<g id=\"node9\" class=\"node\">\n", "<title>8</title>\n", "<polygon fill=\"none\" stroke=\"#000000\" points=\"232.5688,-234 133.1422,-234 133.1422,-170 232.5688,-170 232.5688,-234\"/>\n", "<text text-anchor=\"middle\" x=\"182.8555\" y=\"-218.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">X[4] <= 109.5</text>\n", "<text text-anchor=\"middle\" x=\"182.8555\" y=\"-204.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">mse = 6.21</text>\n", "<text text-anchor=\"middle\" x=\"182.8555\" y=\"-190.8\" 
font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 73</text>\n", "<text text-anchor=\"middle\" x=\"182.8555\" y=\"-176.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = 10.297</text>\n", "</g>\n", "<!-- 0->8 -->\n", "<g id=\"edge8\" class=\"edge\">\n", "<title>0->8</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M97.0051,-258.23C106.2128,-252.1991 115.929,-245.8353 125.3832,-239.643\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"127.5221,-242.426 133.9698,-234.019 123.6867,-236.5702 127.5221,-242.426\"/>\n", "<text text-anchor=\"middle\" x=\"109.496\" y=\"-224.7167\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">False</text>\n", "</g>\n", "<!-- 2 -->\n", "<g id=\"node3\" class=\"node\">\n", "<title>2</title>\n", "<polygon fill=\"none\" stroke=\"#000000\" points=\"364.7817,-465 272.355,-465 272.355,-401 364.7817,-401 364.7817,-465\"/>\n", "<text text-anchor=\"middle\" x=\"318.5684\" y=\"-449.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">X[6] <= 48.5</text>\n", "<text text-anchor=\"middle\" x=\"318.5684\" y=\"-435.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">mse = 5.253</text>\n", "<text text-anchor=\"middle\" x=\"318.5684\" y=\"-421.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 89</text>\n", "<text text-anchor=\"middle\" x=\"318.5684\" y=\"-407.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = 8.034</text>\n", "</g>\n", "<!-- 1->2 -->\n", "<g id=\"edge2\" class=\"edge\">\n", "<title>1->2</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M222.8398,-356.1141C237.754,-368.0927 254.8425,-381.8176 270.4151,-394.3249\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"268.4125,-397.2056 278.4009,-400.7388 272.7959,-391.748 268.4125,-397.2056\"/>\n", "</g>\n", "<!-- 5 -->\n", "<g id=\"node6\" class=\"node\">\n", "<title>5</title>\n", "<polygon fill=\"none\" 
stroke=\"#000000\" points=\"367.0674,-356 270.0693,-356 270.0693,-292 367.0674,-292 367.0674,-356\"/>\n", "<text text-anchor=\"middle\" x=\"318.5684\" y=\"-340.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">X[5] <= 0.5</text>\n", "<text text-anchor=\"middle\" x=\"318.5684\" y=\"-326.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">mse = 4.14</text>\n", "<text text-anchor=\"middle\" x=\"318.5684\" y=\"-312.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 158</text>\n", "<text text-anchor=\"middle\" x=\"318.5684\" y=\"-298.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = 5.92</text>\n", "</g>\n", "<!-- 1->5 -->\n", "<g id=\"edge5\" class=\"edge\">\n", "<title>1->5</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M231.4676,-324C240.6278,-324 250.298,-324 259.7314,-324\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"259.8395,-327.5001 269.8395,-324 259.8394,-320.5001 259.8395,-327.5001\"/>\n", "</g>\n", "<!-- 3 -->\n", "<g id=\"node4\" class=\"node\">\n", "<title>3</title>\n", "<polygon fill=\"none\" stroke=\"#000000\" points=\"500.4946,-526 408.0679,-526 408.0679,-476 500.4946,-476 500.4946,-526\"/>\n", "<text text-anchor=\"middle\" x=\"454.2813\" y=\"-510.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">mse = 3.917</text>\n", "<text text-anchor=\"middle\" x=\"454.2813\" y=\"-496.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 30</text>\n", "<text text-anchor=\"middle\" x=\"454.2813\" y=\"-482.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = 9.395</text>\n", "</g>\n", "<!-- 2->3 -->\n", "<g id=\"edge3\" class=\"edge\">\n", "<title>2->3</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M364.8935,-456.2116C375.6916,-461.6221 387.3014,-467.4392 398.4622,-473.0315\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"397.1957,-476.3116 407.7042,-477.6622 
400.3316,-470.0533 397.1957,-476.3116\"/>\n", "</g>\n", "<!-- 4 -->\n", "<g id=\"node5\" class=\"node\">\n", "<title>4</title>\n", "<polygon fill=\"none\" stroke=\"#000000\" points=\"500.4946,-458 408.0679,-458 408.0679,-408 500.4946,-408 500.4946,-458\"/>\n", "<text text-anchor=\"middle\" x=\"454.2813\" y=\"-442.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">mse = 4.511</text>\n", "<text text-anchor=\"middle\" x=\"454.2813\" y=\"-428.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 59</text>\n", "<text text-anchor=\"middle\" x=\"454.2813\" y=\"-414.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = 7.342</text>\n", "</g>\n", "<!-- 2->4 -->\n", "<g id=\"edge4\" class=\"edge\">\n", "<title>2->4</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M364.8935,-433C375.3677,-433 386.6055,-433 397.4566,-433\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"397.7042,-436.5001 407.7042,-433 397.7041,-429.5001 397.7042,-436.5001\"/>\n", "</g>\n", "<!-- 6 -->\n", "<g id=\"node7\" class=\"node\">\n", "<title>6</title>\n", "<polygon fill=\"none\" stroke=\"#000000\" points=\"500.4946,-390 408.0679,-390 408.0679,-340 500.4946,-340 500.4946,-390\"/>\n", "<text text-anchor=\"middle\" x=\"454.2813\" y=\"-374.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">mse = 3.626</text>\n", "<text text-anchor=\"middle\" x=\"454.2813\" y=\"-360.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 47</text>\n", "<text text-anchor=\"middle\" x=\"454.2813\" y=\"-346.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = 4.706</text>\n", "</g>\n", "<!-- 5->6 -->\n", "<g id=\"edge6\" class=\"edge\">\n", "<title>5->6</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M367.1805,-338.6861C377.2094,-341.7159 387.8496,-344.9305 398.1212,-348.0336\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"397.2393,-351.4233 
407.8243,-350.965 399.2638,-344.7225 397.2393,-351.4233\"/>\n", "</g>\n", "<!-- 7 -->\n", "<g id=\"node8\" class=\"node\">\n", "<title>7</title>\n", "<polygon fill=\"none\" stroke=\"#000000\" points=\"502.2549,-322 406.3076,-322 406.3076,-272 502.2549,-272 502.2549,-322\"/>\n", "<text text-anchor=\"middle\" x=\"454.2813\" y=\"-306.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">mse = 3.469</text>\n", "<text text-anchor=\"middle\" x=\"454.2813\" y=\"-292.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 111</text>\n", "<text text-anchor=\"middle\" x=\"454.2813\" y=\"-278.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = 6.435</text>\n", "</g>\n", "<!-- 5->7 -->\n", "<g id=\"edge7\" class=\"edge\">\n", "<title>5->7</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M367.1805,-314.3286C376.551,-312.4644 386.4553,-310.4939 396.0938,-308.5764\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"396.8837,-311.9879 406.0085,-306.6038 395.5178,-305.1224 396.8837,-311.9879\"/>\n", "</g>\n", "<!-- 9 -->\n", "<g id=\"node10\" class=\"node\">\n", "<title>9</title>\n", "<polygon fill=\"none\" stroke=\"#000000\" points=\"368.2817,-234 268.8551,-234 268.8551,-170 368.2817,-170 368.2817,-234\"/>\n", "<text text-anchor=\"middle\" x=\"318.5684\" y=\"-218.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">X[0] <= 124.0</text>\n", "<text text-anchor=\"middle\" x=\"318.5684\" y=\"-204.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">mse = 3.189</text>\n", "<text text-anchor=\"middle\" x=\"318.5684\" y=\"-190.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 26</text>\n", "<text text-anchor=\"middle\" x=\"318.5684\" y=\"-176.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = 12.212</text>\n", "</g>\n", "<!-- 8->9 -->\n", "<g id=\"edge9\" class=\"edge\">\n", "<title>8->9</title>\n", "<path fill=\"none\" 
stroke=\"#000000\" d=\"M232.6193,-202C241.1015,-202 249.9905,-202 258.7015,-202\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"258.8106,-205.5001 268.8105,-202 258.8105,-198.5001 258.8106,-205.5001\"/>\n", "</g>\n", "<!-- 12 -->\n", "<g id=\"node13\" class=\"node\">\n", "<title>12</title>\n", "<polygon fill=\"none\" stroke=\"#000000\" points=\"364.7817,-125 272.355,-125 272.355,-61 364.7817,-61 364.7817,-125\"/>\n", "<text text-anchor=\"middle\" x=\"318.5684\" y=\"-109.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">X[2] <= 11.5</text>\n", "<text text-anchor=\"middle\" x=\"318.5684\" y=\"-95.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">mse = 4.729</text>\n", "<text text-anchor=\"middle\" x=\"318.5684\" y=\"-81.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 47</text>\n", "<text text-anchor=\"middle\" x=\"318.5684\" y=\"-67.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = 9.237</text>\n", "</g>\n", "<!-- 8->12 -->\n", "<g id=\"edge12\" class=\"edge\">\n", "<title>8->12</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M222.8398,-169.8859C237.754,-157.9073 254.8425,-144.1824 270.4151,-131.6751\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"272.7959,-134.252 278.4009,-125.2612 268.4125,-128.7944 272.7959,-134.252\"/>\n", "</g>\n", "<!-- 10 -->\n", "<g id=\"node11\" class=\"node\">\n", "<title>10</title>\n", "<polygon fill=\"none\" stroke=\"#000000\" points=\"503.9816,-254 404.5809,-254 404.5809,-204 503.9816,-204 503.9816,-254\"/>\n", "<text text-anchor=\"middle\" x=\"454.2813\" y=\"-238.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">mse = 2.701</text>\n", "<text text-anchor=\"middle\" x=\"454.2813\" y=\"-224.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 18</text>\n", "<text text-anchor=\"middle\" x=\"454.2813\" y=\"-210.8\" font-family=\"Times,serif\" font-size=\"14.00\" 
fill=\"#000000\">value = 11.612</text>\n", "</g>\n", "<!-- 9->10 -->\n", "<g id=\"edge10\" class=\"edge\">\n", "<title>9->10</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M368.3322,-211.9005C376.9066,-213.6064 385.8967,-215.3949 394.6984,-217.146\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"394.0327,-220.5821 404.5234,-219.1007 395.3986,-213.7166 394.0327,-220.5821\"/>\n", "</g>\n", "<!-- 11 -->\n", "<g id=\"node12\" class=\"node\">\n", "<title>11</title>\n", "<polygon fill=\"none\" stroke=\"#000000\" points=\"503.9946,-186 404.5679,-186 404.5679,-136 503.9946,-136 503.9946,-186\"/>\n", "<text text-anchor=\"middle\" x=\"454.2813\" y=\"-170.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">mse = 1.654</text>\n", "<text text-anchor=\"middle\" x=\"454.2813\" y=\"-156.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 8</text>\n", "<text text-anchor=\"middle\" x=\"454.2813\" y=\"-142.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = 13.562</text>\n", "</g>\n", "<!-- 9->11 -->\n", "<g id=\"edge11\" class=\"edge\">\n", "<title>9->11</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M368.3322,-186.9659C376.9066,-184.3755 385.8967,-181.6595 394.6984,-179.0005\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"395.9629,-182.2748 404.5234,-176.0323 393.9385,-175.5739 395.9629,-182.2748\"/>\n", "</g>\n", "<!-- 13 -->\n", "<g id=\"node14\" class=\"node\">\n", "<title>13</title>\n", "<polygon fill=\"none\" stroke=\"#000000\" points=\"500.4946,-118 408.0679,-118 408.0679,-68 500.4946,-68 500.4946,-118\"/>\n", "<text text-anchor=\"middle\" x=\"454.2813\" y=\"-102.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">mse = 3.674</text>\n", "<text text-anchor=\"middle\" x=\"454.2813\" y=\"-88.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 34</text>\n", "<text text-anchor=\"middle\" x=\"454.2813\" y=\"-74.8\" 
font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = 8.529</text>\n", "</g>\n", "<!-- 12->13 -->\n", "<g id=\"edge13\" class=\"edge\">\n", "<title>12->13</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M364.8935,-93C375.3677,-93 386.6055,-93 397.4566,-93\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"397.7042,-96.5001 407.7042,-93 397.7041,-89.5001 397.7042,-96.5001\"/>\n", "</g>\n", "<!-- 14 -->\n", "<g id=\"node15\" class=\"node\">\n", "<title>14</title>\n", "<polygon fill=\"none\" stroke=\"#000000\" points=\"503.9816,-50 404.5809,-50 404.5809,0 503.9816,0 503.9816,-50\"/>\n", "<text text-anchor=\"middle\" x=\"454.2813\" y=\"-34.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">mse = 2.75</text>\n", "<text text-anchor=\"middle\" x=\"454.2813\" y=\"-20.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 13</text>\n", "<text text-anchor=\"middle\" x=\"454.2813\" y=\"-6.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = 11.088</text>\n", "</g>\n", "<!-- 12->14 -->\n", "<g id=\"edge14\" class=\"edge\">\n", "<title>12->14</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M364.8935,-69.7884C374.6526,-64.8985 385.0747,-59.6765 395.2291,-54.5885\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"397.0802,-57.5759 404.4528,-49.967 393.9444,-51.3175 397.0802,-57.5759\"/>\n", "</g>\n", "</g>\n", "</svg>\n" ], "text/plain": [ "<graphviz.files.Source at 0x112746588>" ] }, "execution_count": 80, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dot_dat = tree.export_graphviz(reg, out_file=None, rotate=True)\n", "graph = gv.Source(dot_dat)\n", "graph" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(c) Use cross-validation in order to determine the optimal level of tree complexity. Does pruning the tree improve the test MSE?\n", "\n", ".. 
omitting this as tree pruning isn't easily available in python world, the python community prefer to control variance with boosting, bagging and random forest." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(d) Use the bagging approach in order to analyze this data. What test MSE do you obtain? Use the importance() function to de- termine which variables are most important." ] }, { "cell_type": "code", "execution_count": 83, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "training rmse: 1.5726550643736208\n" ] }, { "data": { "text/plain": [ "<matplotlib.axes._subplots.AxesSubplot at 0x11289b940>" ] }, "execution_count": 83, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAA4IAAAHICAYAAAAbVVRCAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAIABJREFUeJzt3Xu8bXVdL/zPl014SY9p7MoLuNHQQk3ULVaamoeMnlKsNEEzPI8nsiN28fSc43nsUaTj69Gup5OampG3CgXTtkYaeSG0TDaKICiJeIOsUMxLKgR+zx9zbJku1t6sDWvstff+vd+v13qtOX7jMr/zt8YaY37muMzq7gAAADCOAza6AAAAAPYsQRAAAGAwgiAAAMBgBEEAAIDBCIIAAACDEQQBAAAGIwgCAAAMRhAEAAAYjCAIAAAwmAM3uoD1cvDBB/eWLVs2ugwAAIANcd55532muzevZdr9Jghu2bIl27dv3+gyAAAANkRVfWKt0zo1FAAAYDCCIAAAwGAEQQAAgMEIggAAAIMRBAEAAAYjCAIAAAxGEAQAABiMIAgAADAYQRAAAGAwgiAAAMBgBEEAAIDBCIIAAACDEQQBAAAGIwgCAAAMRhAEAAAYjCAIAAAwGEEQAABgMIIgAADAYA7c6AL2pAf8P6/a6BL2Guf9xs9sdAkAAMAGcUQQAABgMIIgAADAYARBAACAwQiCAAAAgxEEAQAABiMIAgAADEYQBAAAGIwgCAAAMBhBEAAAYDCCIAAAwGAEQQAAgMEIggAAAIMRBAEAAAYjCAIAAAxGEAQAABiMIAgAADAYQRAAAGAwgiAAAMBgBEEAAIDBCIIAAACDEQQBAAAGIwgCAAAMRhAEAAAYzKxBsKqOqapLqurSqnrmKuOfWlUXVtX5VfWuqjpiadz/mOa7pKp+eM46AQAARjJbEKyqTUlelORHkhyR5PjloDf5k+6+T3cfmeTXk/z2NO8RSY5Lcq8kxyR58bQ8AAAAbqY5jwgeleTS7r6su69JclqSY5cn6O4vLA1+c5KeHh+b5LTuvrq7P5bk0ml5AAAA3EwHzrjsOyf51NLw5UketHKiqnpakmckOSjJI5bmfc+Kee88T5kAAABj2fCbxXT3i7r77kn+e5Jf3Z15q+rEqtpeVduvvPLKeQoEAADYz8wZBK9IcsjS8F2mtp05Lcljdmfe7n5Zd2/t7q2bN2++meUCAACMYc4
geG6Sw6vqsKo6KIubv2xbnqCqDl8a/NEkH5keb0tyXFXdoqoOS3J4kvfOWCsAAMAwZrtGsLuvraqTkrw1yaYkp3b3RVV1SpLt3b0tyUlVdXSSf0/yuSQnTPNeVFWvS3JxkmuTPK27r5urVgAAgJHMebOYdPeZSc5c0fbspce/uIt5n5fkefNVBwAAMKYNv1kMAAAAe5YgCAAAMBhBEAAAYDCCIAAAwGAEQQAAgMEIggAAAIMRBAEAAAYjCAIAAAxGEAQAABiMIAgAADAYQRAAAGAwgiAAAMBgBEEAAIDBCIIAAACDEQQBAAAGIwgCAAAM5sCNLoB91ydPuc9Gl7BXOPTZF250CQAAsFscEQQAABiMIAgAADAYQRAAAGAwgiAAAMBgBEEAAIDBCIIAAACDEQQBAAAGIwgCAAAMRhAEAAAYjCAIAAAwGEEQAABgMIIgAADAYARBAACAwQiCAAAAgxEEAQAABiMIAgAADEYQBAAAGIwgCAAAMBhBEAAAYDCCIAAAwGAEQQAAgMEIggAAAIMRBAEAAAYjCAIAAAxGEAQAABiMIAgAADAYQRAAAGAwgiAAAMBgBEEAAIDBCIIAAACDEQQBAAAGIwgCAAAMRhAEAAAYjCAIAAAwmFmDYFUdU1WXVNWlVfXMVcY/o6ourqoLquptVXXXpXHXVdX508+2OesEAAAYyYFzLbiqNiV5UZIfSnJ5knOralt3X7w02fuTbO3uL1fVzyf59SSPn8Z9pbuPnKs+AACAUc15RPCoJJd292XdfU2S05IcuzxBd7+ju788Db4nyV1mrAcAAIDMGwTvnORTS8OXT20785Qkf7k0fMuq2l5V76mqx8xRIAAAwIhmOzV0d1TVTyfZmuRhS8137e4rqupuSd5eVRd290dXzHdikhOT5NBDD91j9QIAAOzL5jwieEWSQ5aG7zK1fYOqOjrJs5I8uruv3tHe3VdMvy9L8s4k91s5b3e/rLu3dvfWzZs3r2/1AAAA+6k5g+C5SQ6vqsOq6qAkxyX5hrt/VtX9krw0ixD4L0vtt6+qW0yPD07y4CTLN5kBAADgJprt1NDuvraqTkry1iSbkpza3RdV1SlJtnf3tiS/keQ2SU6vqiT5ZHc/Osl3J3lpVX0ti7D6/BV3GwUAAOAmmvUawe4+M8mZK9qevfT46J3M97dJ7jNnbQAAAKOa9QvlAQAA2PsIggAAAIMRBAEAAAYjCAIAAAxGEAQAABiMIAgAADAYQRAAAGAwgiAAAMBgBEEAAIDBCIIAAACDEQQBAAAGIwgCAAAMRhAEAAAYjCAIAAAwGEEQAABgMIIgAADAYARBAACAwQiCAAAAgxEEAQAABiMIAgAADEYQBAAAGIwgCAAAMBhBEAAAYDCCIAAAwGAEQQAAgMEIggAAAIMRBAEAAAYjCAIAAAxGEAQAABiMIAgAADAYQRAAAGAwgiAAAMBgBEEAAIDBCIIAAACDEQQBAAAGIwgCAAAMRhAEAAAYjCAIAAAwGEEQAABgMIIgAADAYARBAACAwQiCAAAAgxEEAQAABiMIAgAADEYQBAAAGIwgCAAAMBhBEAAAYDCCIAAAwGAEQQAAgMEIggAAAIMRBAEAAAYzaxCsqmOq6pKqurSqnrnK+GdU1cVVdUFVva2q7ro07oSq+sj0c8KcdQIAAIxktiBYVZuSvCjJjyQ5IsnxVXXEisnen2Rrd39PkjOS/Po07x2SPCfJg5IcleQ5VXX7uWoFAAAYyZxHBI9Kcml3X9bd1yQ5LcmxyxN09zu6+8vT4HuS3GV6/MNJzuruq7r7c0nOSnLMjLUCAAAMY84geOckn1oavnxq25mnJPnLmzgvAAAAa3TgRheQJFX100m2JnnYbs53YpITk+TQQw+doTIAAID9z5xHBK9IcsjS8F2mtm9QVUcneVaSR3f31bszb3e/rLu3dvfWzZs3r1vhAAAA+7M5g+C5SQ6vqsOq6qAkxyXZtjxBVd0vyUuzCIH/sjTqrUk
eWVW3n24S88ipDQAAgJtptlNDu/vaqjopiwC3Kcmp3X1RVZ2SZHt3b0vyG0luk+T0qkqST3b3o7v7qqr6tSzCZJKc0t1XzVUrAADASGa9RrC7z0xy5oq2Zy89PnoX856a5NT5qgMAABjTrF8oDwAAwN5HEAQAABiMIAgAADAYQRAAAGAwgiAAAMBg1hwEq+qu05e/p6puVVW3na8sAAAA5rKmIFhVP5vkjCy+/D1J7pLkjXMVBQAAwHzWekTwaUkenOQLSdLdH0nybXMVBQAAwHzWGgSv7u5rdgxU1YFJep6SAAAAmNNag+DZVfX/JrlVVf1QktOTvGm+sgAAAJjLWoPgM5NcmeTCJD+X5MwkvzpXUQAAAMznwDVOd6skp3b3HyRJVW2a2r48V2EAAADMY61HBN+WRfDb4VZJ/nr9ywEAAGBuaw2Ct+zuL+0YmB7fep6SAAAAmNNag+C/VdX9dwxU1QOSfGWekgAAAJjTWq8R/KUkp1fVPyapJN+R5PGzVQUAAMBs1hQEu/vcqvquJPecmi7p7n+frywAAADmstYjgknywCRbpnnuX1Xp7lfNUhUAAACzWVMQrKpXJ7l7kvOTXDc1dxJBEAAAYB+z1iOCW5Mc0d09ZzEAAADMb613Df1gFjeIAQAAYB+31iOCBye5uKrem+TqHY3d/ehZqgIAAGA2aw2CJ89ZBAAAAHvOWr8+4uy5CwEAAGDPWNM1glX1vVV1blV9qaquqarrquoLcxcHAADA+lvrzWJemOT4JB9Jcqsk/znJi+YqCgAAgPmsNQimuy9Nsqm7r+vuP0pyzHxlAQAAMJe13izmy1V1UJLzq+rXk3w6uxEiAQAA2HusNcw9aZr2pCT/luSQJD8xV1EAAADMZ61B8DHd/dXu/kJ3P7e7n5Hkx+YsDAAAgHmsNQiesErbk9exDgAAAPaQXV4jWFXHJ3lCkrtV1balUbdNctWchQEAADCPG7tZzN9mcWOYg5P81lL7F5NcMFdRAAAAzGeXQbC7P1FVlyf5anefvYdqAgAAYEY3eo1gd1+X5GtVdbs9UA8AAAAzW+v3CH4pyYVVdVYWXx+RJOnuX5ilKgAAAGaz1iD4Z9MPAAAA+7g1BcHufmVVHZTkHlPTJd397/OVBQAAwFzWFASr6uFJXpnk40kqySFVdUJ3/818pQEAADCHtZ4a+ltJHtndlyRJVd0jyZ8mecBchQEAADCPG71r6OSbdoTAJOnuf0jyTfOUBAAAwJzWekRwe1W9PMlrpuEnJtk+T0kAAADMaa1B8OeTPC3Jjq+LOCfJi2epCAAAgFmt9a6hV1fVC5O8LcnXsrhr6DWzVgYAAMAs1nrX0B9N8pIkH83irqGHVdXPdfdfzlkcAAAA62937hr6g919aZJU1d2T/EUSQRAAAGAfs9a7hn5xRwicXJbkizPUAwAAwMx2566hZyZ5XZJO8rgk51bVTyRJd//ZTPUBAACwztYaBG+Z5J+TPGwavjLJrZI8KotgKAgCAADsI9Z619D/NHchAAAA7BlrvWvoYUmenmTL8jzd/eh5ygIAAGAuaz019I1J/jDJm7L4HsE1qapjkvxukk1JXt7dz18x/qFJ/leS70lyXHefsTTuuiQXToOfFDoBAADWx1qD4Fe7+3/vzoKralOSFyX5oSSXZ3FzmW3dffHSZJ9M8uQkv7LKIr7S3UfuznMCAABw49YaBH+3qp6T5K+SXL2jsbvft4t5jkpyaXdfliRVdVqSY5N8PQh298encWs+yggAAMDNs9YgeJ8kT0ryiFx/amhPwztz5ySfWhq+PMmDdqO2W1bV9iTXJnl+d79xN+YFAABgJ9YaBB+X5G7dfc2cxaxw1+6+oqruluTtVXVhd390eYKqOjHJiUly6KGH7sHSAAAA9l0HrHG6Dyb5lt1c9hVJDlkavsvUtibdfcX0+7Ik70xyv1WmeVl3b+3urZs3b97N8gAAAMa01iOC35Lkw1V1br7xGsFd3cnz3CS
HT189cUWS45I8YS1PVlW3T/Ll7r66qg5O8uAkv77GWgEAANiFtQbB5+zugrv72qo6Kclbs/j6iFO7+6KqOiXJ9u7eVlUPTPKGJLdP8qiqem533yvJdyd56XQTmQOyuEbw4p08FQAAALthTUGwu8++KQvv7jOTnLmi7dlLj8/N4pTRlfP9bRY3qAEAAGCd7TIIVtW7uvshVfXFLO4S+vVRSbq7/8Os1QEAALDudhkEu/sh0+/b7plyAAAAmNta7xoKAADAfkIQBAAAGIwgCAAAMBhBEAAAYDCCIAAAwGAEQQAAgMEIggAAAIMRBAEAAAYjCAIAAAxGEAQAABiMIAgAADAYQRAAAGAwgiAAAMBgBEEAAIDBCIIAAACDEQQBAAAGIwgCAAAMRhAEAAAYjCAIAAAwGEEQAABgMIIgAADAYARBAACAwQiCAAAAgxEEAQAABiMIAgAADEYQBAAAGIwgCAAAMBhBEAAAYDCCIAAAwGAEQQAAgMEIggAAAIMRBAEAAAYjCAIAAAxGEAQAABiMIAgAADAYQRAAAGAwgiAAAMBgBEEAAIDBCIIAAACDEQQBAAAGIwgCAAAMRhAEAAAYjCAIAAAwGEEQAABgMIIgAADAYARBAACAwQiCAAAAgxEEAQAABiMIAgAADGbWIFhVx1TVJVV1aVU9c5XxD62q91XVtVX12BXjTqiqj0w/J8xZJwAAwEhmC4JVtSnJi5L8SJIjkhxfVUesmOyTSZ6c5E9WzHuHJM9J8qAkRyV5TlXdfq5aAQAARjLnEcGjklza3Zd19zVJTkty7PIE3f3x7r4gyddWzPvDSc7q7qu6+3NJzkpyzIy1AgAADGPOIHjnJJ9aGr58apt7XgAAAHZhn75ZTFWdWFXbq2r7lVdeudHlAAAA7BPmDIJXJDlkafguU9u6zdvdL+vurd29dfPmzTe5UAAAgJHMGQTPTXJ4VR1WVQclOS7JtjXO+9Ykj6yq2083iXnk1AYAAMDNdOBcC+7ua6vqpCwC3KYkp3b3RVV1SpLt3b2tqh6Y5A1Jbp/kUVX13O6+V3dfVVW/lkWYTJJTuvuquWqFjfbg33vwRpew13j309+90SUAAOz3ZguCSdLdZyY5c0Xbs5cen5vFaZ+rzXtqklPnrA8AAGBE+/TNYgAAANh9giAAAMBgBEEAAIDBCIIAAACDEQQBAAAGIwgCAAAMRhAEAAAYjCAIAAAwGEEQAABgMIIgAADAYARBAACAwRy40QUArLezH/qwjS5hr/Gwvzl7o0sAAPZCjggCAAAMRhAEAAAYjCAIAAAwGEEQAABgMIIgAADAYNw1FIBdeuF/fdNGl7DXOOm3HrXRJQDAunBEEAAAYDCCIAAAwGAEQQAAgMEIggAAAIMRBAEAAAYjCAIAAAxGEAQAABiMIAgAADAYQRAAAGAwgiAAAMBgBEEAAIDBCIIAAACDEQQBAAAGIwgCAAAMRhAEAAAYjCAIAAAwGEEQAABgMIIgAADAYARBAACAwQiCAAAAgxEEAQAABiMIAgAADEYQBAAAGIwgCAAAMBhBEAAAYDCCIAAAwGAEQQAAgMEIggAAAIMRBAEAAAYjCAIAAAxGEAQAABiMIAgAADCYAze6AAAYxfN++rEbXcJe41mvOWOjSwAY2qxHBKvqmKq6pKourapnrjL+FlX12mn831fVlql9S1V9parOn35eMmedAAAAI5ntiGBVbUryoiQ/lOTyJOdW1bbuvnhpsqck+Vx3f2dVHZfkBUkeP437aHcfOVd9AAAAo5rziOBRSS7t7su6+5okpyU5dsU0xyZ55fT4jCT/sapqxpoAAACGN2cQvHOSTy0NXz61rTpNd1+b5PNJvnUad1hVvb+qzq6qH5ixTgAAgKHsrTeL+XSSQ7v7s1X1gCRvrKp7dfcXlieqqhOTnJgkhx566AaUCQAAsO+Z84jgFUkOWRq+y9S26jRVdWCS2yX5bHdf3d2fTZLuPi/JR5PcY+UTdPfLuntrd2/
dvHnzDC8BAABg/zNnEDw3yeFVdVhVHZTkuCTbVkyzLckJ0+PHJnl7d3dVbZ5uNpOquluSw5NcNmOtAAAAw5jt1NDuvraqTkry1iSbkpza3RdV1SlJtnf3tiR/mOTVVXVpkquyCItJ8tAkp1TVvyf5WpKndvdVc9UKAAAwklmvEezuM5OcuaLt2UuPv5rkcavM9/okr5+zNgAAgFHN+oXyAAAA7H0EQQAAgMEIggAAAIMRBAEAAAYjCAIAAAxGEAQAABiMIAgAADAYQRAAAGAwgiAAAMBgBEEAAIDBHLjRBQAA3BQfet7bN7qEvcZ3P+sRG10CsI9xRBAAAGAwgiAAAMBgBEEAAIDBCIIAAACDEQQBAAAGIwgCAAAMRhAEAAAYjCAIAAAwGEEQAABgMIIgAADAYA7c6AIAANh4J5988kaXsNfQF4zAEUEAAIDBCIIAAACDEQQBAAAGIwgCAAAMRhAEAAAYjCAIAAAwGEEQAABgMIIgAADAYARBAACAwQiCAAAAgxEEAQAABiMIAgAADEYQBAAAGIwgCAAAMBhBEAAAYDCCIAAAwGAEQQAAgMEIggAAAIMRBAEAAAYjCAIAAAxGEAQAABiMIAgAADAYQRAAAGAwgiAAAMBgBEEAAIDBCIIAAACDEQQBAAAGc+BGFwAAAPuT151+1EaXsNf4qce9d6NLYCcEQQAAYK913zPeutEl7DU+8NgfXrdlzXpqaFUdU1WXVNWlVfXMVcbfoqpeO43/+6rasjTuf0ztl1TV+r1iAACAwc0WBKtqU5IXJfmRJEckOb6qjlgx2VOSfK67vzPJ7yR5wTTvEUmOS3KvJMckefG0PAAAAG6mOY8IHpXk0u6+rLuvSXJakmNXTHNskldOj89I8h+rqqb207r76u7+WJJLp+UBAABwM80ZBO+c5FNLw5dPbatO093XJvl8km9d47wAAADcBNXd8yy46rFJjunu/zwNPynJg7r7pKVpPjhNc/k0/NEkD0pycpL3dPdrpvY/TPKX3X3Giuc4McmJ0+A9k1wyy4tZXwcn+cxGF7Ef0Z/rS3+uH325vvTn+tKf60t/rh99ub705/raF/rzrt29eS0TznnX0CuSHLI0fJepbbVpLq+qA5PcLsln1zhvuvtlSV62jjXPrqq2d/fWja5jf6E/15f+XD/6cn3pz/WlP9eX/lw/+nJ96c/1tb/155ynhp6b5PCqOqyqDsri5i/bVkyzLckJ0+PHJnl7Lw5Rbkty3HRX0cOSHJ7El5AAAACsg9mOCHb3tVV1UpK3JtmU5NTuvqiqTkmyvbu3JfnDJK+uqkuTXJVFWMw03euSXJzk2iRP6+7r5qoVAABgJLN+oXx3n5nkzBVtz156/NUkj9vJvM9L8rw569sg+9SprPsA/bm+9Of60ZfrS3+uL/25vvTn+tGX60t/rq/9qj9nu1kMAAAAe6c5rxEEAABgLyQIrqKqnlVVF1XVBVV1flU9qKo+XlUH78YyHl5Vb76Jz//kqnrhTZl3f1BV1039/sGqOr2qbr2T6c6sqm/Z0/VtlKr6jqo6rao+WlXnTa//HjM918Or6vPT3+FDVfWcnUx3p6o6Y7VxI6iqx1RVV9V3bXQtG+XG+qCqXjF9ndB6PNeTq+pOS8Mvr6ojdjH9KVV19Ho8996kqr600TXsK9a6P7kZy7/R/fW0Pf3+peGnVtXPrGcde9JSn+74eeYq09zk90C7eN79qh/XW1Vtmb6Wbbnt5Kr6lar63qr6+6V9+skbVOZeYVd9tcq067YP2xvNeo3gvqiqvi/JjyW5f3dfPYW/gza4rNF8pbuPTJKq+uMkT03y2ztGVlVlcVrz/7VB9e1x02t+Q5JXdvdxU9t9k3x7kn+Y6WnP6e4fq6pvTnJ+Vb2pu9+3VNOB3f2PWdzxd1THJ3nX9HvVsDyAPdIHVbUpyZOTfDDJPybJju+p3Znla9IZ1i73J3vIw5N8KcnfJkl3v2QPP/96+3qf7mEPz/7Vj3v
SK5P8VHd/YNqW3nOjC9oXTF9tt19zRPCG7pjkM919dZJ092emN7tJ8vSqel9VXbjj0++q+uaqOrWq3ltV76+qY5cXVlUHTEcTv2Wp7SNV9e1VtbmqXl9V504/D95VYVV1/PTcH6yqFyy1HzPV9YGqett6dcRe4pwk3zl9enNJVb0qizeChywfpa2qn5mO4H6gql49te1W/+7lfjDJvy/v+Lr7A0neVVW/Ma0TF1bV45Ovf3J6dlX9eVVdVlXPr6onTuvphVV192m6V1TVS6pqe1X9Q1X92Mon7u5/S3JeFn+HJ1fVtqp6e5K3LX+qVlWbquo3p1ouqKqnT+0PmGo5r6reWlV3nL239oCquk2ShyR5SqY7Hk//7y+uqg9X1Vm1OGr72GncftcPO+mDqqoXTv+vf53k26b2Y6rq9KV5v37EoKoeWVV/N23HTp+Wm+l//AVV9b4sgubWJH9ci0+1b1VV76yqrdO694ql/4Nfnub/+ie507Keu8o2fPP0t7qoFkcYP1G7cfbHRpr68J1Vdca0zv1xVdU07oFV9bfTNvG9VXXbqrplVf3R9PrfX1U/OE375Kp649QPH6+qk6rqGdM076mqO0zT3b2q3jKtw+fUvnck/Jwk35kk0+v74PTzS1PblqV+/NDUr7eexi3vb7ZW1TtXLryqHlWLoy7vr6q/rsV+fksW4fOXp/X2B2rpyENVHTn18QVV9Yaquv3U/s5p3X9vLbbNPzB/99w80//4h6f/159Yav+GIy1Tn2+ZHq+27x66H2fybUk+nSTdfV13X7zB9ey1pnXmf1XV9iS/ODUfXSveJ03bi3Omfcr7ajpavavt8t5IELyhv8oiZPxDLd7QPWxp3Ge6+/5Jfj/Jjo3as7L4/sOjsniz/hu1OIKSJOnuryX58yQ/niRV9aAkn+juf07yu0l+p7sfmOQnk7x8Z0XV4nSoFyR5RJIjkzywFqdkbU7yB0l+srvvm53chXVfVItPYn4kyYVT0+FJXtzd9+ruTyxNd68kv5rkEVMf7PjHXXP/7gPunUUYW+knslgf7pvk6CzWvx0B475Z7Di/O8mTktxjWk9fnuTpS8vYkuSoJD+a5CVVdcvlJ6iqb03yvUkumprun+Sx3b38v5EkJ07LOrK7vyeLN+zflOT3pukfkOTU7D93Az42yVu6+x+SfLaqHpDF32NLkiOy6PPvS5L9uB9W64Mfz+LT5iOS/EySHady/XWSBy1tHx+f5LTpzfWvJjl62r5uT/KMpef4bHffv7tfM417Yncf2d1fWZrmyCR37u57d/d9kvzRTupdbRv+nCy24fdKckaSQ29aV2yY+yX5pSz6+25JHlyL7+59bZJfnLaJRyf5SpKnJempj45P8sql//d7Z7H+PjCLdfPL3X2/JH+Xxd8xWdwt7+nTOvwrSV68B17fuljen0zr6X9K8qAstm0/W1X3mya9Zxb7me9O8oUk/2U3nuZdSb536rfTkvy37v54kpdksS86srvPWTHPq5L892mbeWG+8aj6gdM2+5eyd51xcKv6xlNDHz+tR3+Q5FFJHpDkO25sIbvYd4/Sj3vS7yS5ZArJP7dyP88NHNTdW7v7t6bhLbnh+6R/SfJD0z7l8Un+99L8N9gu76nCd9d+f8hzd3X3l6adxA9kEexeW9ef//5n0+/zcv2nXY9M8uilT7tumRu+kXhtkmdn8ebkuGk4Weycj1j6oOA/1PRJ+CoemOSd3X1l8vVTXB6a5Lokf9PdH5vqv2r3XvFe6VZVdf70+Jwsvm/yTlkE6PesMv0jkpze3Z9JvqEPVu3f7t6frq95SJI/nb5n85+r6uws1pUvJDm3uz+dJFX10Sw+5EgWO8kfXFrG66YPLD5SVZcl2fF78K6hAAAJbklEQVQp/w9U1fuTfC3J86fv93xgkrN2sp4dneQl3X1tsvg7VNW9s3iDedb0d9iU6VPJ/cDxWXzYkCzerByfxTb19Kk//6mq3jGNv2f2z37YWR/
sWCf/sRZHj3d8t+xbkjyqFteV/miS/5bkYVnsLN899c1BWYSPHV6bG3dZkrtV1e8l+Ytcv66vtNo2/CGZPqjr7rdU1efW8Hx7k/d29+VJMm03tyT5fJJPd/e5SdLdX5jGPySLDyTS3R+uqk8k2XGd8Tu6+4tJvlhVn0/ypqn9wiTfM+2bvj/J6Uvb1FvM/NrWw2r7k59P8obpbIdU1Z9lsc/fluRT3f3uafrXJPmFJL+5xue6SxbvGe6YxXr8sV1NXFW3S/It3X321PTKJKcvTbK8vm5ZYw17wg1ODa2qI5N8rLs/Mg2/JosPB3dlZ/vuUfpxve3sawC6u0+Z3jc+MskTsthWP3xPFbYX2mlfTb9X7ndWe5/0sSQvnNb963L9tjRZfbv8rnWqfV0JgquY3sC8M8k7q+rCJCdMo66efl+X6/uusjgad8nyMqrq25cG/y6L0+o2J3lMkv85tR+QxadeX10x7zq9kn3WajuZJPm33VzOqv27j7oou38t3tVLj7+2NPy1fOP//soN4o7hc7r7BqeKZvf+DpXkou7+vt2YZ69Xi1PlHpHkPlXVWQS7zuI6zlVnyX7WDzehD5JFWDwpyVVJtnf3F6dTZs7q7uN3Ms+Nrm/d/blaXDP7w1kcBf+pJP/3KpOutg3f1y3/n9+c13Vj24sDkvzrBl0bdnPsbH+yMzvbHl6b68+i2tnRlN9L8tvdva2qHp7k5N2q9Ib2l/V1ue+SnfffDvrxpvlsktuvaLtDpiDd3R9N8vtV9QdJrqyqb+3uz+7hGvcWu+yr3HC/s9p24ZeT/HMWZ18dkGT5veZ6bZdn59TQFarqnlV1+FLTkUk+sbPpk7w1i2sHd1yXcb+VE/TiyxrfkMUF6h9a+sf7qyydojd9qrAz703ysKo6uBYX+h6f5Owk70ny0Ko6bFrGHW7kJe6P3p7kcdMpjMt9sDv9u7d7e5JbVNXXP2Gtqu9J8q9JHl+La6Q2Z3GU+L27uezH1eLatrtncQrDJTc2w06cleTnplOwdvwdLkmyuRY3YUpVfdN0OtC+7rFJXt3dd+3uLd19SBY7kKuS/OTUn9+e6z9x3R/7YWd98Nlcv07eMd949PnsLE4t/tksQmGy2IY9uKp2XLv1zbXzu+F+McltVzZOp5ce0N2vz+JUs/vvxut4dxbBMVX1yNzwzcG+6JIkd5yO4KcW1wcemMURsSdObffI4uyVNf2/T0cVP1ZVj5vmryl874vOSfKYqrr1dKryj09tSXLojv/TLI6c7PgU/+NZnPKYLC41WM3tklwxPT5hqX3V9ba7P5/kc3X9dWtPyuJ/ZF/04SRbpv1IsniPssPHM/1PVtX9kxw2te9s3z1yP95k09lOn66qRyRf789jsriXwI/ueJ+axWU212Xx/mFIu+qrncyy2vuk22Vx5sXXsljnNs1f+foTBG/oNllcN3FxVV2QxSlLJ+9i+l9L8k1JLqiqi6bh1bw2yU/nGw83/0KSrbW4uPniLD7J3uHJVXX5jp8sVrBnJnlHkg8kOa+7/3w6VfTEJH9WVR/I2k6j2q9090VZXNNy9tQHO+4It6v+3adMHyb8eBYXLH90Wtf+/yR/kuSCLNaJt2dxLcU/7ebiP5lFePzLJE+9GUdQXz4t64Lp7/CE7r4mi8Dwgqnt/Fx/zdi+7Pjc8MjX67O4LubyJBdncVrZ+5J8fj/th531wR2TfCSLPnhVlk7znM62eHMW12q9eWq7Mou7gf7ptM39u1x/evJKr8ji+ozzq+pWS+13zuIMjvOz6Pf/sRuv47lJHlmLmx49Lsk/ZfGGc581rW+PT/J70/p2VhZHYV6c5IDpTJfXJnlyTzdGW6MnJnnKtMyLsrhGdJ/Ti7sfvyKL7d7fJ3l5d79/Gn1JkqdV1Yey+FDg96f25yb53VrcQOK6nSz65CxOnT0vyWeW2t+U5Men9XblzUpOyOLa7guy+OD5lJvz2vaQldcIPn/ab5yY5C9qcbOYf1ma/vV
J7jDtt07KdKfrXey7T84Y/TiHn0ny/03bwrcnee50JPBJWVwjeH6SV2dxrfXO1uNR7KyvVrPa+6QXJzlhWne/K7t/1tpeoRbvL4ERVdUrkry5u4f9LsD1VtN1qNOn3O9N8uCbEM7ZQ6rqFkmum65h/L4kv78Pnv7IOqjFnSnf3N333uBSAPaIvfacVYB91Jtr8XUxByX5NSFwr3doktdV1QFJrsnitFUA2O85IggAADAY1wgCAAAMRhAEAAAYjCAIAAAwGEEQAG6mqvrS9PtOVbXLu/BW1S9V1a33TGUAsDo3iwGAVVTVprV+11ZVfam7b7PGaT+eZGt3f+bGpr0ptQDAWjgiCMBwqmpLVX24qv64qj5UVWdU1a2r6uNV9YLpS7EfV1V3r6q3VNV5VXVOVX3XNP9hVfV3VXVhVf3PFcv94PR4U1X9ZlV9sKouqKqnV9UvJLlTkndU1Tum6Y6flvPBqnrB0rK+VFW/NX1h8fftyf4BYP/newQBGNU9kzylu99dVacm+S9T+2e7+/5JUlVvS/LU7v5IVT0oyYuTPCLJ72bx5fOvqqqn7WT5JybZkuTI6Qvr79DdV1XVM5L8YHd/pqrulOQFSR6Q5HNJ/qqqHtPdb0zyzUn+vrv/6yyvHoChOSIIwKg+1d3vnh6/JslDpsevTZKquk2S709yelWdn+SlSe44TfPgJH86PX71TpZ/dJKXdve1SdLdV60yzQOTvLO7r5ym++MkD53GXZfk9TflhQHAjXFEEIBRrbxIfsfwv02/D0jyr9195BrnX29fdV0gAHNxRBCAUR1aVTuuvXtCknctj+zuLyT5WFU9Lklq4b7T6HcnOW56/MSdLP+sJD9XVQdO899hav9ikttOj9+b5GFVdXBVbUpyfJKzb97LAoAbJwgCMKpLkjytqj6U5PZJfn+VaZ6Y5CnTDVsuSnLs1P6L07wXJrnzTpb/8iSfTHLBNP8TpvaXJXlLVb2juz+d5JlJ3pHkA0nO6+4/v/kvDQB2zddHADCcqtqS5M3dfe8NLgUANoQjggAAAINxRBAAAGAwjggCAAAMRhAEAAAYjCAIAAAwGEEQAABgMIIgAADAYARBAACAwfwfmXi8fCM53dAAAAAASUVORK5CYII=\n", "text/plain": [ "<Figure size 1080x540 with 1 Axes>" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "reg = ensemble.RandomForestRegressor(n_estimators=100, max_features=None)\n", "train_x = train.drop('Sales', axis=1)\n", "train_y = train.Sales\n", "reg.fit(train_x,train_y)\n", "test_x = test.drop('Sales', axis=1)\n", "test_y = test.Sales\n", "preds = reg.predict(test_x)\n", "rmse = np.sqrt(metrics.mean_squared_error(test_y, preds))\n", "print('training rmse: {}'.format(rmse))\n", "\n", "_,_ = plt.subplots(figsize=(15, 7.5))\n", "bar_df = pd.DataFrame({'predictor': train_x.columns, 'importance' : reg.feature_importances_})\n", "sns.barplot(x='predictor', y='importance', data=bar_df.sort_values('importance', ascending=False))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(e) Use random forests to analyze this 
data. What test MSE do you obtain? Use the importance() function to determine which variables are most important. Describe the effect of m, the number of variables considered at each split, on the error rate obtained." ] }, { "cell_type": "code", "execution_count": 85, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "training rmse: 1.7675543885549883\n" ] }, { "data": { "text/plain": [ "<matplotlib.axes._subplots.AxesSubplot at 0x112b8ca58>" ] }, "execution_count": 85, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAA4IAAAHICAYAAAAbVVRCAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAIABJREFUeJzt3Xm4ZXdZJ/rvm8QwCCKYqMwVEJAIEqAIKghIR4ytEFQiiYihmzZiExxob1/64oUQ2+eCQ3ttZsQogwokOBQYwcgQA4pJBUImiCRhSkQNBJkJJnn7j7WK7JycUzmVnF2nqn6fz/Ocp/b6rWG/+1drr7W/ew27ujsAAACMY7/NLgAAAIDdSxAEAAAYjCAIAAAwGEEQAABgMIIgAADAYARBAACAwQiCAAAAgxEEAQAABiMIAgAADOaAzS5goxx00EG9ZcuWzS4DAABgU5xzzjmf7u6D1zPtPhMEt2zZku3bt292GQAAAJuiqj6+3mmdGgoAADAYQRAAAGAwgiAAAMBgBEEAAIDBCIIAAACDEQQBAAAGIwgCAAAMRhAEAAAYjCAIAAAwGEEQAABgMIIgAADAYARBAACAwQiCAAAAgxEEAQAABiMIAgAADEYQBAAAGIwgCAAAMBhBEAAAYDAHbHYBu9ND/6/XbnYJe4xzfvNnNrsEAABgkzgiCAAAMBhBEAAAYDCCIAAAwGAEQQAAgMEIggAAAIMRBAEAAAYjCAIAAAxGEAQAABiMIAgAADAYQRAAAGAwgiAAAMBgBEEAAIDBCIIAAACDEQQBAAAGIwgCAAAMRhAEAAAYjCAIAAAwGEEQAABgMIIgAADAYARBAACAwQiCAAAAgxEEAQAABiMIAgAADEYQBAAAGIwgCAAAMJilBsGqOrKqLq6qS6rqOauMf3ZVXVRV51XVO6rqngvjrq2qc+e/bcusEwAAYCQHLGvBVbV/kpcm+cEklyc5u6q2dfdFC5N9IMnW7v5yVf18kt9I8uR53Fe6+7Bl1QcAADCqZR4RPDzJJd19WXd/Lckbkhy1OEF3v6u7vzwPvi/J3ZZYDwAAAFluELxrkk8uDF8+t63l6Un+amH41lW1vareV1VPXG2Gqjp+nmb7lVdeecsrBgAAGMDSTg3dFVX100m2Jnn0QvM9u/uKqrpXkndW1fndfenifN39qiSvSpKtW7f2bisYAABgL7bMI4JXJLn7wvDd5rYbqKojkjw3yRO6++od7d19xfzvZUneneTBS6wVAABgGMsMgmcnuU9VHVJVByY5JskN7v5ZVQ9O8spMIfBfF9rvWFW3mh8flOQRSRZvMgMAAMDNtLRTQ7v7mqo6Icnbk+yf5OTuvrCqTkqyvbu3JfnNJLdLckpVJcknuvsJSe6f5JVVdV2msPrCF
XcbBQAA4GZa6jWC3X1aktNWtD1v4fERa8z3d0keuMzaAAAARrXUH5QHAABgzyMIAgAADEYQBAAAGIwgCAAAMBhBEAAAYDCCIAAAwGAEQQAAgMEIggAAAIMRBAEAAAYjCAIAAAxGEAQAABiMIAgAADAYQRAAAGAwgiAAAMBgBEEAAIDBCIIAAACDEQQBAAAGIwgCAAAMRhAEAAAYjCAIAAAwGEEQAABgMIIgAADAYA7Y7ALYe33ipAdudgl7hHs87/zNLgEAAHaJI4IAAACDEQQBAAAGIwgCAAAMRhAEAAAYjCAIAAAwGEEQAABgMIIgAADAYARBAACAwQiCAAAAgxEEAQAABiMIAgAADEYQBAAAGIwgCAAAMBhBEAAAYDCCIAAAwGAEQQAAgMEIggAAAIMRBAEAAAYjCAIAAAxGEAQAABiMIAgAADAYQRAAAGAwgiAAAMBgBEEAAIDBCIIAAACDEQQBAAAGIwgCAAAMRhAEAAAYjCAIAAAwGEEQAABgMIIgAADAYARBAACAwQiCAAAAgxEEAQAABiMIAgAADEYQBAAAGIwgCAAAMBhBEAAAYDCCIAAAwGAEQQAAgMEsNQhW1ZFVdXFVXVJVz1ll/LOr6qKqOq+q3lFV91wYd1xVfWT+O26ZdQIAAIxkaUGwqvZP8tIkP5zk0CTHVtWhKyb7QJKt3f3dSU5N8hvzvHdK8vwkD09yeJLnV9Udl1UrAADASJZ5RPDwJJd092Xd/bUkb0hy1OIE3f2u7v7yPPi+JHebH/9QktO7+6ru/myS05McucRaAQAAhrHMIHjXJJ9cGL58blvL05P81a7MW1XHV9X2qtp+5ZVX3sJyAQAAxrBH3Cymqn46ydYkv7kr83X3q7p7a3dvPfjgg5dTHAAAwD5mmUHwiiR3Xxi+29x2A1V1RJLnJnlCd1+9K/MCAACw65YZBM9Ocp+qOqSqDkxyTJJtixNU1YOTvDJTCPzXhVFvT/K4qrrjfJOYx81tAAAA3EIHLGvB3X1NVZ2QKcDtn+Tk7r6wqk5Ksr27t2U6FfR2SU6pqiT5RHc/obuvqqpfyxQmk+Sk7r5qWbUCAACMZGlBMEm6+7Qkp61oe97C4yN2Mu/JSU5eXnUAAABj2iNuFgMAAMDuIwgCAAAMRhAEAAAYjCAIAAAwGEEQAABgMIIgAADAYARBAACAwQiCAAAAgxEEAQAABiMIAgAADEYQBAAAGIwgCAAAMBhBEAAAYDCCIAAAwGAEQQAAgMEIggAAAIMRBAEAAAYjCAIAAAxGEAQAABiMIAgAADAYQRAAAGAwgiAAAMBgBEEAAIDBCIIAAACDEQQBAAAGIwgCAAAMRhAEAAAYjCAIAAAwGEEQAABgMIIgAADAYARBAACAwQiCAAAAgxEEAQAABiMIAgAADEYQBAAAGIwgCAAAMBhBEAAAYDCCIAAAwGAEQQAAgMEIggAAAIMRBAEAAAYjCAIAAAxGEAQAABiMIAgAADAYQRAAAGAwgiAAAMBgBEEAAIDBCIIAAACDEQQBAAAGIwgCAAAMRhAEAAAYzLqDYFXds6qOmB/fpqpuv7yyAAAAWJZ1BcGq+tkkpyZ55dx0tyR/vqyiAAAAWJ71HhF8ZpJHJPl8knT3R5J867KKAgAAYHnWGwSv7u6v7RioqgOS9HJKAgAAYJnWGwTPqKr/J8ltquoHk5yS5C3LKwsAAIBlWW8QfE6SK5Ocn+TnkpyW5FeXVRQAAADLc8A6p7tNkpO7+/eSpKr2n9u+vKzCAAAAWI71HhF8R6bgt8NtkvzNxpcDAADAsq03CN66u7+4Y2B+fNvllAQAAMAyrTcIfqmqHrJjoKoemuQryykJAACAZVrvNYK/lOSUqvqnJJXk25M8eWlVAQAAsDTrOiLY3Wcn+c4kP5/kGUnu393n3NR8VXVkVV1cVZdU1XNWGf+oqnp/VV1TVU9aMe7aqjp3/tu2vpcDAADATVnvEcEkeViSLfM8D6mqdPdr15p4vrPoS
5P8YJLLk5xdVdu6+6KFyT6R5GlJfmWVRXyluw/bhfoAAABYh3UFwap6XZJ7Jzk3ybVzcydZMwgmOTzJJd192byMNyQ5KsnXg2B3f2wed92uFg4AAMDNs94jgluTHNrdvQvLvmuSTy4MX57k4bsw/62ranuSa5K8sLv/fBfmBQAAYA3rDYIXZLpBzKeWWMtK9+zuK6rqXkneWVXnd/elixNU1fFJjk+Se9zjHruxNAAAgL3XeoPgQUkuqqqzkly9o7G7n7CTea5IcveF4bvNbevS3VfM/15WVe9O8uAkl66Y5lVJXpUkW7du3ZWjlQAAAMNabxA88WYs++wk96mqQzIFwGOS/NR6ZqyqOyb5cndfXVUHJXlEkt+4GTUAAACwwrqCYHefsasL7u5rquqEJG9Psn+Sk7v7wqo6Kcn27t5WVQ9L8mdJ7pjk8VX1gu7+riT3T/LK+SYy+2W6RvCiNZ4KAACAXbDeu4Z+T5IXZwpoB2YKdl/q7m/a2XzdfVqS01a0PW/h8dmZThldOd/fJXngemoDAABg16zrB+WTvCTJsUk+kuQ2Sf5Lpt8IBAAAYC+z3iCY7r4kyf7dfW13/0GSI5dXFgAAAMuy3pvFfLmqDkxyblX9RqafkVh3iAQAAGDPsd4w99R52hOSfCnTz0L8+LKKAgAAYHnWGwSf2N1f7e7Pd/cLuvvZSX50mYUBAACwHOsNgset0va0DawDAACA3WSn1whW1bGZfgT+XlW1bWHU7ZNctczCAAAAWI6bulnM32W6McxBSX57of0LSc5bVlEAAAAsz06DYHd/vKouT/LV7j5jN9UEAADAEt3kNYLdfW2S66rqDruhHgAAAJZsvb8j+MUk51fV6Zl+PiJJ0t2/sJSqAAAAWJr1BsE/nf8AAADYy60rCHb3a6rqwCT3nZsu7u5/X15ZAAAALMu6gmBVPSbJa5J8LEkluXtVHdfdf7u80gAAAFiG9Z4a+ttJHtfdFydJVd03yZ8keeiyCgMAAGA5bvKuobNv2BECk6S7/zHJNyynJAAAAJZpvUcEt1fVq5O8fh5+SpLtyykJAACAZVpvEPz5JM9MsuPnIs5M8rKlVAQAAMBSrfeuoVdX1UuSvCPJdZnuGvq1pVYGAADAUqz3rqE/kuQVSS7NdNfQQ6rq57r7r5ZZHAAAABtvV+4a+gPdfUmSVNW9k/xlEkEQAABgL7Peu4Z+YUcInF2W5AtLqAcAAIAl25W7hp6W5E1JOsnRSc6uqh9Pku7+0yXVBwAAwAZbbxC8dZJ/SfLoefjKJLdJ8vhMwVAQBAAA2Eus966h/2nZhQAAALB7rPeuoYckeVaSLYvzdPcTllMWAAAAy7LeU0P/PMnvJ3lLpt8RBAAAYC+13iD41e7+30utBAAAgN1ivUHwd6vq+Un+OsnVOxq7+/1LqQoAAIClWW8QfGCSpyZ5bK4/NbTnYQAAAPYi6w2CRye5V3d/bZnFAAAAsHz7rXO6C5J88zILAQAAYPdY7xHBb07y4ao6Oze8RtDPRwAAAOxl1hsEn7/UKgAAANht1hUEu/uMZRcCAADA7rHTIFhV7+nuR1bVFzLdJfTro5J0d3/TUqsDAABgw+00CHb3I+d/b797ygEAAGDZ1nvXUAAAAPYRgiAAAMBgBEEAAIDBCIIAAACDEQQBAAAGIwgCAAAMRhAEAAAYjCAIAAAwGEEQAABgMIIgAADAYARBAACAwQiCAAAAgxEEAQAABiMIAgAADEYQBAAAGIwgCAAAMBhBEAAAYDCCIAAAwGAEQQAAgMEIggAAAIMRBAEAAAYjCAIAAAxGEAQAABiMIAgAADAYQRAAAGAwB2x2AUDyiBc/YrNL2GO891nv3ewSAAD2eUs9IlhVR1bVxVV1SVU9Z5Xxj6qq91fVNVX1pBXjjquqj8x/xy2zTgAAgJEsLQhW1f5JXprkh5McmuTYqjp0xWSfSPK0JH+8Yt47JXl+kocnOTzJ86vqjsuqFQAAYCTLPCJ4e
JJLuvuy7v5akjckOWpxgu7+WHefl+S6FfP+UJLTu/uq7v5sktOTHLnEWgEAAIaxzCB41ySfXBi+fG7bsHmr6viq2l5V26+88sqbXSgAAMBI9uq7hnb3q7p7a3dvPfjggze7HAAAgL3CMoPgFUnuvjB8t7lt2fMCAACwE8sMgmcnuU9VHVJVByY5Jsm2dc779iSPq6o7zjeJedzcBgAAwC20tCDY3dckOSFTgPtQkjd194VVdVJVPSFJquphVXV5kqOTvLKqLpznvSrJr2UKk2cnOWluAwAA4BZa6g/Kd/dpSU5b0fa8hcdnZzrtc7V5T05y8jLrAwAAGNFefbMYAAAAdp0gCAAAMBhBEAAAYDCCIAAAwGAEQQAAgMEIggAAAIMRBAEAAAYjCAIAAAxGEAQAABiMIAgAADAYQRAAAGAwB2x2AQAb7YxHPXqzS9hjPPpvz9jsEgCAPZAjggAAAIMRBAEAAAbj1FAAduol/+0tm13CHuOE3378ZpcAABvCEUEAAIDBCIIAAACDEQQBAAAGIwgCAAAMRhAEAAAYjCAIAAAwGEEQAABgMIIgAADAYARBAACAwQiCAAAAgxEEAQAABnPAZhcAAKP49Z9+0maXsMd47utP3ewSAIbmiCAAAMBgBEEAAIDBCIIAAACDEQQBAAAGIwgCAAAMRhAEAAAYjCAIAAAwGL8jCADslT706+/c7BL2GPd/7mM3uwRgL+OIIAAAwGAEQQAAgMEIggAAAIMRBAEAAAYjCAIAAAxGEAQAABiMIAgAADAYQRAAAGAwgiAAAMBgBEEAAIDBCIIAAACDEQQBAAAGIwgCAAAMRhAEAAAYjCAIAAAwGEEQAABgMIIgAADAYARBAACAwQiCAAAAgxEEAQAABiMIAgAADEYQBAAAGIwgCAAAMBhBEAAAYDCCIAAAwGAEQQAAgMEIggAAAIMRBAEAAAaz1CBYVUdW1cVVdUlVPWeV8beqqjfO4/+hqrbM7Vuq6itVde7894pl1gkAADCSA5a14KraP8lLk/xgksuTnF1V27r7ooXJnp7ks939HVV1TJIXJXnyPO7S7j5sWfUBAACMaplHBA9Pckl3X9bdX0vyhiRHrZjmqCSvmR+fmuQ/VFUtsSYAAIDhLTMI3jXJJxeGL5/bVp2mu69J8rkk3zKPO6SqPlBVZ1TV96/2BFV1fFVtr6rtV1555cZWDwAAsI/aU28W86kk9+juByd5dpI/rqpvWjlRd7+qu7d299aDDz54txcJAACwN1pmELwiyd0Xhu82t606TVUdkOQOST7T3Vd392eSpLvPSXJpkvsusVYAAIBhLDMInp3kPlV1SFUdmOSYJNtWTLMtyXHz4ycleWd3d1UdPN9sJlV1ryT3SXLZEmsFAAAYxtLuGtrd11TVCUnenmT/JCd394VVdVKS7d29LcnvJ3ldVV2S5KpMYTFJHpXkpKr69yTXJXlGd1+1rFoBAABGsrQgmCTdfVqS01a0PW/h8VeTHL3KfG9O8uZl1gYAADCqPfVmMQAAACyJIAgAADCYpZ4aCgDA3uHEE0/c7BL2GPqCETgiCAAAMBhBEAAAYDCCIAAAwGAEQQAAgMEIggAAAIMRBAEAAAYjCAIAAAxGEAQAABiMIAgAADAYQRAAAGAwgiAAAMBgBEEAAIDBCIIAAACDEQQBAAAGIwgCAAAMRhAEAAAYjCAIAAAwGEEQAABgMIIgAADAYARBAACAwQiCAAAAgxEEAQAABiMIAgAADEYQBAAAGIwgCAAAMBhBEAAAYDCCIAAAwGAEQQAAgMEIggAAAIMRBAEAAAYjCAIAAAxGEAQAABiMIAgAADAYQRAAAGAwgiAAAMBgBEEAAIDBCIIAAACDEQQBAAAGIwgCAAAMRhAEAAAYjCAIAAAwmAM2uwAAANiXvOmUwze7hD3GTx591maXwBocEQQAABiMIAgAADAYQRAAAGAwgiAAAMBgBEEAAIDBCIIAAACD8fMRAADAHutBp759s
0vYY3zwST+0YctyRBAAAGAwgiAAAMBgBEEAAIDBCIIAAACDEQQBAAAGIwgCAAAMRhAEAAAYjCAIAAAwGEEQAABgMIIgAADAYJYaBKvqyKq6uKouqarnrDL+VlX1xnn8P1TVloVx/2Nuv7iqfmiZdQIAAIxkaUGwqvZP8tIkP5zk0CTHVtWhKyZ7epLPdvd3JPmdJC+a5z00yTFJvivJkUleNi8PAACAW2iZRwQPT3JJd1/W3V9L8oYkR62Y5qgkr5kfn5rkP1RVze1v6O6ru/ujSS6ZlwcAAMAttMwgeNckn1wYvnxuW3Wa7r4myeeSfMs65wUAAOBmqO5ezoKrnpTkyO7+L/PwU5M8vLtPWJjmgnmay+fhS5M8PMmJSd7X3a+f238/yV9196krnuP4JMfPg/dLcvFSXszGOijJpze7iH2I/txY+nPj6MuNpT83lv7cWPpz4+jLjaU/N9be0J/37O6D1zPhAUss4ookd18Yvtvctto0l1fVAUnukOQz65w33f2qJK/awJqXrqq2d/fWza5jX6E/N5b+3Dj6cmPpz42lPzeW/tw4+nJj6c+Nta/15zJPDT07yX2q6pCqOjDTzV+2rZhmW5Lj5sdPSvLOng5RbktyzHxX0UOS3CfJWUusFQAAYBhLOyLY3ddU1QlJ3p5k/yQnd/eFVXVSku3dvS3J7yd5XVVdkuSqTGEx83RvSnJRkmuSPLO7r11WrQAAACNZ5qmh6e7Tkpy2ou15C4+/muToNeb99SS/vsz6NsledSrrXkB/biz9uXH05cbSnxtLf24s/blx9OXG0p8ba5/qz6XdLAYAAIA90zKvEQQAAGAPJAiuoqqeW1UXVtV5VXVuVT28qj5WVQftwjIeU1VvvZnP/7SqesnNmXdfUFXXzv1+QVWdUlW3XWO606rqm3d3fXuzqnpiVXVVfedm17JZbqoPquoP55+/2YjnelpV3WVh+NVVdehOpj+pqo7YiOfenarq26vqDVV1aVWdM78377uk53pMVX1u3kZ8qKqev8Z0d6mqU1cbt7erqi9udg17i/XuT27B8m9yfz2vs9+3MPyMqvqZjaxjd1ro0x1/z1llmpv9GWgnz7tP9eNGq6ot88+yLbadWFW/UlXfU1X/sLDdPHGTytwj7KyvVpl2wz4T7ImWeo3g3qiqvjfJjyZ5SHdfPYe/Aze5rNF8pbsPS5Kq+qMkz0jyv3aMrKrKdFrzf9yk+vZmxyZ5z/zvqh+gB7Bb+qCq9k/ytCQXJPmnJNnxu6prWbyGem8xvx//LMlruvuYue1BSb4tyT8u6WnP7O4frapvTHJuVb2lu9+/UNMB3f1Pme5Gzdh2uj/ZTR6T5ItJ/i5JuvsVu/n5N9rX+3Q3e0z2rX7cnV6T5Ce7+4Pzvul+m13Q3mD+abt9miOCN3bnJJ/u7quTpLs/PX+gSJJnVdX7q+r8HUcTquobq+rkqjqrqj5QVUctLqyq9puPJn7zQttHqurbqurgqnpzVZ09/z1iZ4VV1bHzc19QVS9aaD9yruuDVfWOjeqIPcSZSb5j/vbm4qp6baYP1ndfPEpbVT8zH8H9YFW9bm7bpf7d11XV7ZI8MsnTM9+hd14/X1ZVH66q0+cjOU+axz20qs6Yj/C8varuvInlb4g1+qCq6iXz+vU3Sb51bj+yqk5ZmPfr33BX1eOq6u/n990p83Izr5Mvqqr3ZwqaW5P80fwt7G2q6t1VtbWq9p+/Zbxgfk//8jz/1795nJf1glW2OQfP/1cX1nSE8eO1C2crLMEPJPn3xQ9l3f3BJO+pqt9ceI1PTr7ej2dU1V9U1WVV9cKqesq8DT2/qu49T/eHVfWKqtpeVf9YVT+68om7+0tJzsm0jXhaVW2rqncmeUctfOM79/dvzbWcV1XPmtv36nV87st3V9Wp83v4j6qq5nEPq6q/m7eJZ1XV7avq1lX1B3M/f6CqfmCe9mlV9efzevWxqjqhqp49T/O+qrrTPN29q+ptc3+dW
XvfmQVnJvmOJJlf3wXz3y/NbVsW+vFDc7/edh63uL/ZWlXvXrnwqnp8TUddPlBVf1PTfn5LpvD5y/N24Ptr4chDVR029/F5VfVnVXXHuf3d87bkrHn9//7ld88tU9M288Pz9u/HF9pvcKRl7vMt8+PV9t1D9+OSfGuSTyVJd1/b3Rdtcj17rHmd+f+ranuSX5ybj1i5L5q3F2fWtI9+f81Hq3e2Xd4TCYI39teZQsY/1vQB+dEL4z7d3Q9J8vIkOzZqz830+4eHZ/pA9Js1fUudJOnu65L8RZIfS5KqeniSj3f3vyT53SS/090PS/ITSV69VlE1nV72oiSPTXJYkofVdIrbwUl+L8lPdPeDssZdWPdGNX0T88NJzp+b7pPkZd39Xd398YXpvivJryZ57NwHO9646+7fQRyV5G3d/Y9JPlNVD820s96S5NAkT03yvUlSVd+Q5MVJntTdD01ycvaNu/iu1gc/lunb0UOT/EySHace/U2Shy+8n5+c5A3zh8FfTXLEvD3YnuTZC8/xme5+SHe/fh73lO4+rLu/sjDNYUnu2t0P6O4HJvmDNepdbZvz/EzbnO9KcmqSe9y8rtgwD8gUxlb68Uyv80FJjsi0bdwRtB6U6UPd/TOtd/edt6GvTvKshWVsSXJ4kh9J8oqquvXiE1TVtyT5niQXzk0PybTOLm63k+T4eVmHdfd3Zwrn+8o6/uAkv5Rp/b1XkkfU9Nu9b0zyi/M28YgkX0nyzCQ9r3PHJnnNQp8+INP/2cMy9cOXu/vBSf4+0/sime6W96y5v34lyct2w+vbEIv7k/l9/5+SPDzT+vOzVfXgedL7ZdrP3D/J55P81114mvck+Z65396Q5L9398eSvCLTvuiw7j5zxTyvTfJ/z+vl+bnhWQoHzO+LX8qedQbHbeqGp4Y+eV6Pfi/J45M8NMm339RCdrLvHqUfd6ffSXLxHJJ/buW2lBs5sLu3dvdvz8NbcuN90b8m+cF5H/3kJP97Yf4bbZd3V+G7ap8/5LmruvuL807i+zMFuzfW9ee//+n87zm5/tuuxyV5wsK3XbfOjT+YvTHJ8zJ92DtmHk6mnfOhC18UfFPNRxZW8bAk7+7uK5Ovn+LyqCTXJvnb7v7oXP9Vu/aK90i3qapz58dnZvq9ybtkCtDvW2X6xyY5pbs/ndygD1bt3+4e9fqaYzOF42TauR6baRtwyvyFxT9X1bvm8ffL9MHw9Ln/9s/8beJebq0++JP5t0r/qaYjSjt+C/VtSR5f07VmP5Lkvyd5dKaN+3vnvjkw04flHd6Ym3ZZkntV1YuT/GWmL6BWs9o255GZv1jq7rdV1WfX8Xyb4ZG5vl//parOyLQd+3ySs7v7U0lSVZfm+td/fqbt7g5vmtfNj1TVZUl2HIH6/qr6QJLrkrxw/u3ZhyU5fY1t4BFJXtHd1yTTNqKqHpB9Yx0/q7svT5J5u7klyeeSfKq7z06S7v78PP6RmcJvuvvDVfXxJDuu5XxXd38hyReq6nNJ3jK3n5/ku+d90/clOWVhm3qrJb+2jbDa/uTnk/zZfEQ5VfWnmfb525J8srvfO0//+iS/kOS31vlcd8v0meHOmbYLH93ZxFV1hyTf3N1nzE2vSXLKwiSL7/8t66xhd7jRqaFVdViSj3b3R+bh12f6AmZn1tp3j9KPG22tnwHo7j5p/tz4uCQ/lWnf95jdVdhAPExOAAAIDUlEQVQeaM2+mv9duR9fbV/00SQvmdf9a3P9tjRZfbv8ng2qfUMJgquYP7i8O8m7q+r8JMfNo66e/7021/ddZToad/HiMqrq2xYG/z7TqUsHJ3likv85t++X6Vuvr66Yd4NeyV5rtZ1MknxpF5ezav+OqKZTux6b5IFV1Zk+9Hama7tWnSXJhd39vbupxKW7GX2QTGHxhCRXJdne3V+YT/E4vbuPXWOem1xPu/uzNV1H90OZjoz9ZJL/vMqkq21z9jQXZtevxbt64fF1C8PX5Yavc+XOe
sfwmd19o1NFs2vbiH1lHV/sy1uyntzU/8l+Sf5tk64NuyXW2p+sZa117ppcfxbVWkdTXpzkf3X3tqp6TJITd6nSG9sb3v/rsdh3ydr9t4N+vHk+k+SOK9rulDlId/elSV5eVb+X5Mqq+pbu/sxurnFPsdO+yo33JattF345yb9kOsNlvySLnzU3aru8dE4NXaGq7ldV91loOizJx9eaPsnbM107uOO6jAevnKCnH2v8s0wXqH9o4Y3311k4DWr+VmEtZyV5dFUdVNOFvscmOSPJ+5I8qqoOmZdxp5t4ifuidyY5ej5NbLEPdqV/93VPSvK67r5nd2/p7rtn2uBdleQnarpW8Nty/TeEFyc5uKabJ6WqvmE+jWdvtlYffCbJk2u6juzOueERqTMynW74s5lCYTK95x5RVTuuNfrGWvsOmV9IcvuVjfPppft195sznRr1kF14He/NFBxTVY/LjXdmu9s7k9yqqr7+7X9VfXeSf8v1/XpwpjMYztrFZR89r5v3znR6zcU3NcMaTk/yc/PpgTu2EfviOr7DxUnuPB8lTU3XBx6Q6YjYU+a2+2Y6e2VdfTofVfxoVR09z1/zlxl7ozOTPLGqbjuf+v1jc1uS3GPHOpHpyMmOb/E/lumUx2S61GA1d0hyxfz4uIX2VbcD3f25JJ+t669be2qmbc7e6MNJtszv1WT6jLLDxzJv46rqIUkOmdvX2neP3I8323y206eq6rHJ1/vzyEzXa//Ijs+pmS6zuTbTNnpIO+urNWZZbV90h0xnXlyXaZ3bf/mVbzxB8MZul+m6iYuq6rxMp4CduJPpfy3JNyQ5r6ounIdX88YkP50bHm7+hSRba7q4+aJMRwZ2eFpVXb7jL9MK9pwk70rywSTndPdfzKeKHp/kT6vqg1nfaWn7lO6+MNM1LWfMfbDjjnA769/RHJsbH/l6c6brOC5PclGm06Den+Rz3f21TMHpRXOfnpvrr53bW63VB3dO8pFMffDaLJzmOZ8d8NZM1xa9dW67MtPdQP9k3kb8fa4/ZXGlP8x0PcG5VXWbhfa7Zjrj4NxM/f4/duF1vCDJ42q6EcrRSf450wekTTF/0fVjmS6mv3TeDv5/Sf44yXmZtlfvzHSdzz/v4uI/kSk8/lWSZ9yCo/uvnpd13rw+/9Q+uo4nSebX9uQkL55f2+mZjsK8LMl+85kub0zytJ5vjLZOT0ny9HmZF2a65nav09MdZv8w07r1D0le3d0fmEdfnOSZVfWhTF+yvHxuf0GS363pBhLXrrHoEzOdOntOkk8vtL8lyY/N24GVNys5LtP1s+dl+uL5pFvy2naTldcIvnB+bx6f5C9rulnMvy5M/+Ykd5q3DSdkvpvwTvbdJ2aMflyGn0ny/877lncmecF8JPCpma4RPDfJ6zJdu77WejyKtfpqNavti16W5Lh53f3O7PpZa3uEmvbhwKhqvm5y/lb2rCSPuBkf2NlNqupWSa6dr2H83iQv3wtP17tJVfWHSd7a3fvkbwGy56npzpRv7e4HbHIpALvFHnvOKrDbvLWmnzc5MMmvCYF7vHskeVNV7Zfka5lOWwUA2CWOCAIAAAzGNYIAAACDEQQBAAAGIwgCAAAMRhAEgFuoqr44/3uXqtrpnU6r6peq6ra7pzIAWJ2bxQDAKqpq//X+1lZVfbG7b7fOaT+WZGt3f/qmpr05tQDAejgiCMBwqmpLVX24qv6oqj5UVadW1W2r6mNV9aL5R7GPrqp7V9Xbquqcqjqzqr5znv+Qqvr7qjq/qv7niuVeMD/ev6p+q6ouqKrzqupZVfULSe6S5F1V9a55umPn5VxQVS9aWNYXq+q35x8s/t7d2T8A7Pv8jiAAo7pfkqd393ur6uQk/3Vu/0x3PyRJquodSZ7R3R+pqocneVmSxyb53SQv7+7XVtUz11j+8Um2JDmsu6+pqjt191VV9ewkP9Ddn66quyR5UZKHJvlskr+uqid2958n+cYk/9Dd/20prx6Ao
TkiCMCoPtnd750fvz7JI+fHb0ySqrpdku9LckpVnZvklUnuPE/ziCR/Mj9+3RrLPyLJK7v7miTp7qtWmeZhSd7d3VfO0/1RkkfN465N8uab88IA4KY4IgjAqFZeJL9j+Evzv/sl+bfuPmyd82+0r7ouEIBlcUQQgFHdo6p2XHv3U0nesziyuz+f5KNVdXSS1ORB8+j3JjlmfvyUNZZ/epKfq6oD5vnvNLd/Icnt58dnJXl0VR1UVfsnOTbJGbfsZQHATRMEARjVxUmeWVUfSnLHJC9fZZqnJHn6fMOWC5McNbf/4jzv+UnuusbyX53kE0nOm+f/qbn9VUneVlXv6u5PJXlOkncl+WCSc7r7L275SwOAnfPzEQAMp6q2JHlrdz9gk0sBgE3hiCAAAMBgHBEEAAAYjCOCAAAAgxEEAQAABiMIAgAADEYQBAAAGIwgCAAAMBhBEAAAYDD/B1iKMYvXtjnGAAAAAElFTkSuQmCC\n", "text/plain": [ "<Figure size 1080x540 with 1 Axes>" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "reg = ensemble.RandomForestRegressor(n_estimators=100, max_features='sqrt')\n", "train_x = train.drop('Sales', axis=1)\n", "train_y = train.Sales\n", "reg.fit(train_x,train_y)\n", "test_x = test.drop('Sales', axis=1)\n", "test_y = test.Sales\n", "preds = reg.predict(test_x)\n", "rmse = np.sqrt(metrics.mean_squared_error(test_y, preds))\n", "print('training rmse: {}'.format(rmse))\n", "\n", "_,_ = plt.subplots(figsize=(15, 7.5))\n", "bar_df = pd.DataFrame({'predictor': train_x.columns, 'importance' : reg.feature_importances_})\n", "sns.barplot(x='predictor', y='importance', data=bar_df.sort_values('importance', ascending=False))" ] }, { "cell_type": "code", "execution_count": 103, "metadata": {}, "outputs": [], "source": [ "rnge = np.linspace(0.1, 1, 10)\n", "\n", "sqrt_rmses = [run_cv(car_df,\n", " 10,\n", " lambda df: df.drop('Sales', axis=1),\n", " lambda df: df.Sales,\n", " lambda x, y: ensemble.RandomForestRegressor(n_estimators=100, max_features=i).fit(x,y),\n", " lambda preds, true: np.sqrt(metrics.mean_squared_error(true, preds))).mean()[0] for i in rnge]" ] }, { "cell_type": "code", "execution_count": 104, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "<matplotlib.axes._subplots.AxesSubplot at 0x1132a1940>" ] }, "execution_count": 104, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAYUAAAEKCAYAAAD9xUlFAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAH9FJREFUeJzt3Xt4VPW97/H3d3K/J5AJZEg0gAIFJGgjWqWKba1QCdZ2u9U+3ba2PW73ttbedGv3aXtOT7u352zbamtr624t7d7War1UwHstFm9Vg3IJdxSFECAJkHvIJJnf+WOGIShIAllZyczn9Tw8M7NmJfmM8uTD77d+ay1zziEiIgIQ8DuAiIiMHCoFERGJUymIiEicSkFEROJUCiIiEqdSEBGROJWCiIjEeVYKZlZuZsvNbL2ZrTOzG46wj5nZT8xsq5mtMbMzvMojIiLHlurh9+4FvuGce93M8oCVZvaMc259v30WAKfG/pwF3BV7FBERH3hWCs65XcCu2PM2M9sATAD6l8IlwO9c9LTqv5lZoZmVxr72iIqLi11FRYVXsUVEEtLKlSubnHPBY+3n5UghzswqgNOBV9711gRgR7/XdbFtRy2FiooKampqhjihiEhiM7N3BrKf5weazSwXeAj4qnOu9Ti/xzVmVmNmNY2NjUMbUERE4jwtBTNLI1oI9zrnHj7CLjuB8n6vy2LbDuOcu9s5V+WcqwoGjzn6ERGR4+Tl6iMDfg1scM796Ci7LQGuiq1COhtoeb/jCSIi4i0vjymcC/wDsNbMVsW2fQs4CcA59wvgceATwFagE7jawzwiInIMXq4+egGwY+zjgOu8yiAiIoOjM5pFRCROpSAiInFJUwpbG9r43tL1hHsjfkcRERmxkqYUduzr4p4Xt7Fis85zEBE5mqQphXNPKaYwO40lq+v9jiIiMmIlTSmkpwZYMLOUZ9bvoTPc63ccEZERKWlKAWBRZYiunj6e3dDgdxQRkREpqUphzsQxlORlsFRTSCIiR5RUpZASMBbOCvHcpkZaunr8jiMiMuIkVSkAVFeWEu6L8NS63X5HEREZcZKuFGaXF3LSmGxNIYmIHEHSlYKZUV1Zyktv7qWpvdvvOCIiI0rSlQJAdWWIvojjibW6SreISH9JWQrTxuczZVyuTmQTEXmXpCwFgOpZIV57ez/1zV1+RxERGTGStxQqQwAsW6PRgojIQUlbChXFOcwqK9AUkohIP0lbChC97EXtzlbeamz3O4qIyIiQ1KWwcFYIM1i6WquQREQgyUthfEEmZ1aMYcnqnURvFy0iktySuhQgOoX0ZmMHG3a1+R1FRMR3SV8KC2aOJyVgOuAsIoJKgbG5Gcw9pZilq+s1hSQiSS/pSwGiU0g7m7t4fXuz31FERHylUgA+PmMc6akBXTlVRJKeSgHIy0zjI1NLeGztLvoimkISkeSlUoiprgzR2NbNK2/t9TuKiIhvVAoxH5lWQk56ilYhiUhSUynEZKWncOH0cTxRu5twb8TvOCIivlAp9LNodoiWrh6e39LodxQREV+oFPqZe0qQwuw0TSGJSNJSKfSTnhpgwczxPLN+D13hPr/jiIgMO5XCu1RXhugM9/Hsxj1+RxERGXYqhXc5a+JYSvIydCKbiCQllcK7pASMi2eVsnxTI60HevyOIyIyrDwrBTO7x8wazKz2KO8XmdkjZrbGzF41s5leZRmsRZUhwr0Rnqrd7XcUEZFh5eVIYTEw/33e/xawyjk3C7gKuMPDLIMyu7yQ8jFZLF2jO7KJSHLxrBSccyuAfe+zy3TgL7F9NwIVZjbOqzyDYWZUzwrx4tYm9rZ3+x1HRGTY+HlMYTXwKQAzmwOcDJT5mOcw1ZUh+iKOxzWFJCJJxM9SuBUoNLNVwPXAG8ARTw4ws2vMrMbMahobh+ds42nj8zi1JJelq7QKSUSSh2+l4Jxrdc5d7ZybTfSYQhB46yj73u2cq3LOVQWDwWH
JZ2ZUV4Z49e191Dd3DcvPFBHxm2+lYGaFZpYee/klYIVzrtWvPEeyqDIEwGM64CwiScLLJan3AS8DU82szsy+aGbXmtm1sV0+ANSa2SZgAXCDV1mOV0VxDrPKCnQtJBFJGqlefWPn3JXHeP9lYIpXP3+oVM8K8YPHN7CtqYOJxTl+xxER8ZTOaD6GhZWlALrshYgkBZXCMZQWZDGnYgxLVtfjnO7fLCKJTaUwANWzQ2xtaGfj7ja/o4iIeEqlMACfmDmelIDpgLOIJDyVwgCMzc3g3FOKWaopJBFJcCqFAVpUGaJufxdv7Gj2O4qIiGdUCgP08RnjSE8NaBWSiCQ0lcIA5WemccHUIMvW7KIvoikkEUlMKoVBWFQ5gca2bl55a6/fUUREPKFSGISPTCshJz2FpWs0hSQiiUmlMAhZ6SlcOH0cj6/dTbg34nccEZEhp1IYpOrKEC1dPbywdXju6yAiMpxUCoP04VODFGSlsUQ33xGRBKRSGKT01AALZo7nmfV76Aof8UZxIiKjlkrhOCyqDNER7uMvGxv8jiIiMqRUCsfhrEljCeZlsGT1Tr+jiIgMKZXCcUgJGBefVsryTY20HujxO46IyJBRKRynRbNDhHsjPL1uj99RRESGjErhOJ1eXkhZUZauhSQiCUWlcJzMjOrKEC9sbWJve7ffcUREhoRK4QQsqgzRF3E8Ubvb7ygiIkNCpXACpo3P45SSXN2RTUQShkrhBJgZiypDvPb2Pna1dPkdR0TkhKkUTlB1ZQjn4LE1u/yOIiJywlQKJ2hicQ6nTSjQFJKIJASVwhCorixlTV0Lbzd1+B1FROSEqBSGwMJZIQCdsyAio55KYQiECrOYUzGGJavrcU73bxaR0UulMESqK0vZ0tDOpj1tfkcRETluKoUhsuC0UlICppvviMioplIYIsW5GZwzeSxL12gKSURGL5XCEFpUGWLHvi5W7Wj2O4qIyHFRKQyhi2aOJz0lwNLVOpFNREYnlcIQys9MY97UIMvW1NMX0RSSiIw+KoUhtmh2iIa2bl7ZttfvKCIig+ZZKZjZPWbWYGa1R3m/wMyWmtlqM1tnZld7lWU4fXTaOLLTUzSFJCKjkpcjhcXA/Pd5/zpgvXOuEpgH/NDM0j3MMyyy0lO4cPo4nqjdRbg34nccEZFB8awUnHMrgH3vtwuQZ2YG5Mb27fUqz3CqnhWiubOHF7c2+R1FRGRQ/DymcCfwAaAeWAvc4JxLiH9anzclSEFWmq6cKiKjjp+lcBGwCggBs4E7zSz/SDua2TVmVmNmNY2NjcOZ8bikpwZYMHM8T6/bTVe4z+84IiID5mcpXA087KK2AtuAaUfa0Tl3t3OuyjlXFQwGhzXk8aquDNER7mP5pga/o4iIDJifpbAd+CiAmY0DpgJv+ZhnSJ09aSzFuRm6FpKIjCqpXn1jM7uP6KqiYjOrA74LpAE4534B/B9gsZmtBQz4F+dcwhyZTQkYC2eV8vtXt9N2oIe8zDS/I4mIHJNnpeCcu/IY79cDH/fq548E1ZUhFr/0Nk+v28OnP1jmdxwRkWPSGc0eOuOkQiYUZrF0jaaQRGR0UCl4yMyorgzxwpYm9nWE/Y4jInJMKgWPLaoM0RtxPL5Wl70QkZFPpeCxD5TmMTmYw1KdyCYio8CASsGiPmtm34m9PsnM5ngbLTGYGYsqJ/Dq2/vY3XLA7zgiIu9roCOFnwMfAg6uKGoDfuZJogRUXVmKc7BMB5xFZIQbaCmc5Zy7DjgA4JzbD4z6K5oOl0nBXGZOyNcUkoiMeAMthR4zSyF6ZVPMLAgkxMXrhsuiyhCr61p4Z2+H31FERI5qoKXwE+ARoMTMfgC8APybZ6kS0MWzQgAaLYjIiDagUnDO3QvcBPw7sAv4pHPuj14GSzQTCrM4s6JIl9MWkRFtoKuPJgPbnHM/A2qBC82s0NNkCai6MsTmPe1s2t3mdxQRkSMa6PTRQ0CfmZ0C/BIoB37vWaoE9YnTSkk
JGEtW7/Q7iojIEQ20FCLOuV7gU8CdzrkbgVLvYiWm4twMzpk8lkdX1ev+zSIyIg1m9dGVwFXAstg2XQv6OFx9bgV1+7v48Z83+x1FROQ9BloKVxM9ee0HzrltZjYR+C/vYiWuj0wbx5VzyvnFX9/kb2/t9TuOiMhhzDnnd4ZBqaqqcjU1NX7HOCGd4V4u/skLdPf08cQN51GQrUGXiHjLzFY656qOtd9AVx8tNLM3zGyfmbWaWZuZtZ54zOSUnZ7K7ZfPpqGtm//5aC2jrZhFJHENdProduBzwFjnXL5zLs85l+9hroRXWV7I1y6cwtLV9fxplVYjicjIMNBS2AHUOv2Tdkhde/5k5lSM4dt/WseOfZ1+xxERGXAp3AQ8bma3mNnXD/7xMlgySAkYP7q8EgO+dv8qevu0TFVE/DXQUvgB0AlkAnn9/sgJKivK5vuXzqTmnf3c9dybfscRkSSXOsD9Qs65mZ4mSWKXzJ7AXzY2cPuzW5h7ajGnn1TkdyQRSVIDHSk8bmYf9zRJkvveJTMZn5/JV+9fRUd3r99xRCRJHbMUzMyAbwJPmlmXlqR6oyArjR9fPpsd+zr530vX+R1HRJLUMUshtuJovXMu4JzL0pJU78yZOIZ/mjeZB2rqeLJ2l99xRCQJDXT6aKWZnelpEgHgqx+bwqyyAm5+eC27Ww74HUdEksyA79EMvGxmb5rZGjNba2ZrvAyWrNJSAtx++Wy6eyJ844+riER0aoiIDJ+Brj66yNMUcphJwVy+Uz2dWx5eyz0vbuNLH57kdyQRSRIDKgXn3DteB5HDXXFmOcs3NvD/ntzEOZOLmR7SIRwR8d5Ap49kmJkZt356FoXZadzwhzc40NPndyQRSQIqhRFsTE46t11WyZaGdm59YqPfcUQkCagURrjzpgT5wrkTWfzS2yzf1OB3HBFJcCqFUeCm+VOZNj6PG/+4hqb2br/jiEgCUymMAplpKdx+xWxaD/Rw80NrdFMeEfGMSmGUmDY+n5vnT+PPGxq495XtfscRkQTlWSmY2T1m1mBmtUd5/0YzWxX7U2tmfWY2xqs8ieDz51Tw4VOL+f5j69na0O53HBFJQF6OFBYD84/2pnPuP5xzs51zs4FbgL865/Z5mGfUCwSMH15WSVZaCl+9/w3Cvbopj4gMLc9KwTm3AhjoL/krgfu8ypJISvIzufXTs6jd2cqPntnsdxwRSTC+H1Mws2yiI4qH/M4yWlw0YzxXzinnlyve5OU39/odR0QSiO+lAFQDL77f1JGZXWNmNWZW09jYOIzRRq5vL5xOxdgcvvHAKlo6e/yOIyIJYiSUwhUcY+rIOXe3c67KOVcVDAaHKdbIlp2eyu2Xz6ahrZt//dNaLVMVkSHhaymYWQFwPvConzlGq8ryQr524RSWrdnFI2/s9DuOiCQAL5ek3ge8DEw1szoz+6KZXWtm1/bb7VLgaedch1c5Et21509mTsUYvvPoOnbs6/Q7joiMcjbaph2qqqpcTU2N3zFGlLr9nSy443mmjMvj/mvOJjVlJMwKishIYmYrnXNVx9pPvz0SQFlRNt//5ExWvrOfnz/3pt9xRGQUUykkiEtmT+CTs0Pc8ewWXt++3+84IjJKqRQSyPc+OZPx+Zl87f5VtHf3+h1HREYhlUICyc9M48eXz2bHvk6+t3Sd33FEZBRSKSSYORPH8M/zTuGBmjqeWLvL7zgiMsqoFBLQDR87lcqyAm5+eC27Wrr8jiMio4hKIQGlpQT48eWzCfdG+OYfVxOJjK5lxyLiH5VCgpoUzOW71dN5cetefv3CNr/jiMgooVJIYJefWc7Hp4/jP57axLr6Fr/jiMgooFJIYGbGrZ+eRWF2Gjf8YRUHevr8jiQiI5xKIcGNyUnntssq2drQzr8/vsHvOCIywqkUksB5U4J84dyJ/Pbld1i+scHvOCIygqkUksRN86cybXweNz64mqb2br/jiMgIpVJIEplpKdxxxem0HujlXx5co5v
yiMgRqRSSyNTxedw8fxrPbmzg3le2+x1HREYglUKS+fw5FZw3Jcj3H1vP1oZ2v+OIyAijUkgygYBx29/NIjs9lS/99jU272nzO5KIjCAqhSRUkp/Jf171Qdq7+7jkzhd5dJXu7ywiUSqFJPXBk8fw2FfmMiOUzw1/WMX/WrKOcG/E71gi4jOVQhIbl5/JfdeczRfOncjil97mirtf1lVVRZKcSiHJpaUE+E71dO78zOls3N3Gwp+8wEtbm/yOJSI+USkIAAtnhVjy5XMpzE7js79+hbuee1PnMogkIZWCxJ1SksejX57LgtNK+b9PbuSa/1pJ64Eev2OJyDBSKchhcjNSufPK0/n2wuks39jAop++wIZdrX7HEpFholKQ9zAzvjh3Ivddczad4T4u/fmLPPx6nd+xRGQYqBTkqM6sGMOyr8ylsqyQrz+wmn99ZC3dvbong0giUynI+yrJy+TeL53FP543iXtf2c7f//Jv7GzWslWRRKVSkGNKTQlwyyc+wC8+ewZvNrSz8CfP8/yWRr9jiYgHVAoyYPNnlrLky+dSkpfJVfe8yp1/2UIkomWrIolEpSCDMimYyyPXncOiyhC3Pb2Z//G7Glo6tWxVJFGoFGTQstNTuf3y2Xzvkhms2NLIwjufp3Zni9+xRGQIqBTkuJgZV32ogvv/8UP09jk+fddLPFCzw+9YInKCVApyQs44qYhl18+lqqKImx5cwy0Pr+FAj5atioxWKgU5YWNzM/jdF87iugsmc9+rO7jsFy+zY1+n37FE5DioFGRIpASMGy+axn9eVcXbeztY+NMXWL6pwe9YIjJInpWCmd1jZg1mVvs++8wzs1Vmts7M/upVFhk+F04fx7Lr5xIqzOILi1/jx89s1rJVkVHEy5HCYmD+0d40s0Lg58Ai59wM4DIPs8gwOnlsDg//0zl86vQy7nh2C1cvfo39HWG/Y4nIAHhWCs65FcC+99nlM8DDzrntsf0115BAstJTuO2yWfzbpafx8pt7WfjTF1hT1+x3LBE5Bj+PKUwBiszsOTNbaWZXHW1HM7vGzGrMrKaxUZdXGC3MjM+cdRJ/vPZDAPzdXS/z+1e26+Y9IiOYn6WQCnwQuBi4CPi2mU050o7Oubudc1XOuapgMDicGWUIVJYXsuz6uZw9eSzfemQtNz6oZasiI5WfpVAHPOWc63DONQErgEof84iHinLS+c3nz+QrHz2VB1fWcenPX+KdvR1+xxKRd/GzFB4F5ppZqpllA2cBG3zMIx5LCRhfv3AKv/n8mdQ3d7Hwpy/w5/V7/I4lIv14uST1PuBlYKqZ1ZnZF83sWjO7FsA5twF4ElgDvAr8yjl31OWrkjgumFbCsuvncvLYbL70uxr++d6VPLSyjsa2br+jiSQ9G20H/aqqqlxNTY3fMWQIHOjp47anNvGnVfU0tUcLYVZZAfOmBJk3rYTKskJSAuZzSpHEYGYrnXNVx9xPpSB+i0Qc63e18tymBpZvauSN7fuJOCjKTuO8KUHmTQ1y3qlBxuZm+B1VZNRSKcio1dwZZsWWJp7b1MBfNzWytyOMGcwqK+SCqUHmTS1h1oQCAhpFiAyYSkESQiTiWLuzhec2NbJ8UwOr65pxDsbmpHP+lCDnx0YRRTnpfkcVGdFUCpKQ9nWEWbG5MTqK2NzI/s4eAgazywu5YGoJ86aWMCOUr1GEyLuoFCTh9UUca+qaWb4pWhJr6qJ3fyvOzeD8KUEumBbkw6cEKchO8zmpiP9UCpJ0Gtu6o6OIzY2s2NxIS1cPKQHjjJMKmTe1hHlTg0wvzcdMowhJPioFSWq9fRFW1zWzfGMjz21uoHZnKwAleRnMix2snntqMfmZGkVIclApiPTT0HaAv25q5LlNjazY0kjbgV5SA8YZJxfFjkUEmTouT8ciJGGpFESOorcvwuvbm+PnRWzYFR1F5GakMr00n+mhfGZOKGBGKJ9TSnJJS9ENCmX0UymIDNDulgOs2NLI2roW1tW
3sGFXG12xq7impwaYNj6PGaF8pocKmBnKZ9r4fLLSU3xOLTI4KgWR49QXcWxramddfSvr6lup3dnCuvpWWrp6AAgYTA7mxkcT00P5zAgVUJCl4xMycqkURIaQc46dzV3U7mxlfX20JGrrW9jTeugifuVjsphRWsDMCdGSmBHKpyQ/08fUIocMtBRShyOMyGhnZpQVZVNWlM38mePj25vau+OjifX1rayrb+HJdbvj7wfzMpgRymdGKJ+ZoQJmhAooH5OlZbEyYqkURE7AwRPlzp9y6I6ArQd62HBw6qk+WhbPb2miLxIdledlpsaKIjqamDmhgEnFOaTqgLaMACoFkSGWn5nGWZPGctaksfFtB3r62LS7LXacooXa+lb++2/v0N0bASAjNcC00nyml+YzPj+TsbnpFOdmUJybztjYY25GqkYY4jmVgsgwyExLobK8kMrywvi23r4IbzV1REtiZ2zqqXYX+zt7jvg90lMDFOccKomxuRmMzU0nGHscm3PodVFOupbSynFRKYj4JDUlwJRxeUwZl8elpx/aHu6NsL8zTGNbN3s7wuxt76apvZu97WGa2sPs7eimsb2bjbvb2NseJtwXOeL3L8xOozg3g7E50VHHwdHHwQIp7vdaoxA5SKUgMsKkpwYYl5/JuAGsXHLO0dbdS1O/Amlsjz7ujRVIU1uYDbtb2dseji+rfbeM1EC/wkgnmJdBWVE25WOyoo9F2ZTkZeiM7ySgUhAZxcyM/Mw08jPTmBQ89v7h3gj7OsLRkUdHOFYmh0YhTe3RUUhtfet77pmdnhJgQlEWZUVZlI/Jjj4WZcdfj81J12gjAagURJJIemqA8QWZjC849ijkQE8fO5u72LGvkx37u6jb30ndvi527O9kXe1u9nWED9s/Ky0lXhDlRVnvGWnoEuajg0pBRI4oMy2FycFcJgdzj/h+e3fvYUVRt/9Qgby2bR9t3b2H7Z+XmXrYyOJQcUS35WTo19FIoP8LInJccjNSmTY+ei2oI2np7ImVRSc7+hXHtqYOnt/SFL++1EFjctLjRVE2Jjo1VVqQSWF2OkXZaRRlp5OflUaKjmt4SqUgIp4oyE6jILuAmRMK3vOec469HWF27IuNMGLFUbe/k/W7Wnlm/Z4jrqoyi54HUpSddlhZFMQeD21PpzA7jcLY9uz0FB3vGCCVgogMOzOLnZyXweknFb3n/UjE0dDWze7WA+zvDNPcGaa5s4f9nT00d4bjj03tYbY0tNPc2UP7u6ar+ktPCcRLon+ZFMaL5FCZHCyWwuy0EzrXwzlHxEUvsBhxjr6Io885IpH+zznCtuhjb9+hr4s+QmlBJuVjso8700CoFERkxAkEbMAHxA8K90Zo7oqWR7RAwvEC2d8ZpiX2uL+zh21NHbze2UxzZ5ievqNfFDQ3I5XC7DTSUwPxX9aRCO/5Bd4XcYe/H9s21K49fzI3L5g25N+3P5WCiCSE9NQAJXmZlOQNvEicc3SG+2IF0hN/7F8mzZ09hPsipJiREjACZqQE6Pf80OPB56kBIxCw2NfQ73m/r+n//vt+/aHnZUVZHv4XjFIpiEjSMjNyMlLJyUil7L2zWElJF0cREZE4lYKIiMSpFEREJE6lICIicSoFERGJUymIiEicSkFEROJUCiIiEmfODf2p2F4ys0bgnUF8STHQ5FGckSwZP3cyfmZIzs+djJ8ZTuxzn+ycO+atmEZdKQyWmdU456r8zjHckvFzJ+NnhuT83Mn4mWF4Premj0REJE6lICIicclQCnf7HcAnyfi5k/EzQ3J+7mT8zDAMnzvhjymIiMjAJcNIQUREBiihS8HM5pvZJjPbamY3+53Ha2ZWbmbLzWy9ma0zsxv8zjSczCzFzN4ws2V+ZxkOZlZoZg+a2UYz22BmH/I703Aws6/F/n7Xmtl9Zjbwu+qMImZ2j5k1mFltv21jzOwZM9sSexzyu0AkbCmYWQrwM2ABMB240sym+5vKc73AN5xz04Gzgeu
S4DP3dwOwwe8Qw+gO4Enn3DSgkiT47GY2AfgKUOWcmwmkAFf4m8ozi4H579p2M/Csc+5U4NnY6yGVsKUAzAG2Oufecs6FgT8Al/icyVPOuV3Ouddjz9uI/pKY4G+q4WFmZcDFwK/8zjIczKwAOA/4NYBzLuyca/Y31bBJBbLMLBXIBup9zuMJ59wKYN+7Nl8C/Db2/LfAJ4f65yZyKUwAdvR7XUeS/IIEMLMK4HTgFX+TDJvbgZuAiN9BhslEoBH4TWzK7FdmluN3KK8553YCtwHbgV1Ai3PuaX9TDatxzrldsee7gXFD/QMSuRSSlpnlAg8BX3XOtfqdx2tmthBocM6t9DvLMEoFzgDucs6dDnTgwVTCSBObQ7+EaCmGgBwz+6y/qfzhoktHh3z5aCKXwk6gvN/rsti2hGZmaUQL4V7n3MN+5xkm5wKLzOxtotOEHzGz//Y3kufqgDrn3MGR4INESyLRfQzY5pxrdM71AA8D5/icaTjtMbNSgNhjw1D/gEQuhdeAU81sopmlEz0YtcTnTJ4yMyM6x7zBOfcjv/MMF+fcLc65MudcBdH/z39xziX0vx6dc7uBHWY2Nbbpo8B6HyMNl+3A2WaWHfv7/lGS4AB7P0uAz8Wefw54dKh/QOpQf8ORwjnXa2ZfBp4iukLhHufcOp9jee1c4B+AtWa2KrbtW865x33MJN65Hrg39o+et4Crfc7jOefcK2b2IPA60dV2b5CgZzeb2X3APKDYzOqA7wK3Ag+Y2ReJXi3674f85+qMZhEROSiRp49ERGSQVAoiIhKnUhARkTiVgoiIxKkUREQkTqUgIiJxKgUREYlTKYicADOriN3PYLGZbTaze83sY2b2Yuya93P8zigyGCoFkRN3CvBDYFrsz2eAucA3gW/5mEtk0FQKIidum3NurXMuAqwjehMUB6wFKnxNJjJIKgWRE9fd73mk3+sICXx9MUlMKgUREYlTKYiISJyukioiInEaKYiISJxKQURE4lQKIiISp1IQEZE4lYKIiMSpFEREJE6lICIicSoFERGJ+/9xdbaZMUclswAAAABJRU5ErkJggg==\n", "text/plain": [ "<Figure size 432x288 with 1 Axes>" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "line_df = pd.DataFrame({'m' : rnge * car_df.shape[1] - 1, 'rmse' : sqrt_rmses})\n", "sns.lineplot(x='m', y='rmse', data=line_df)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "9. This problem involves the OJ data set which is part of the ISLR package." 
] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>Purchase</th>\n", " <th>WeekofPurchase</th>\n", " <th>StoreID</th>\n", " <th>PriceCH</th>\n", " <th>PriceMM</th>\n", " <th>DiscCH</th>\n", " <th>DiscMM</th>\n", " <th>SpecialCH</th>\n", " <th>SpecialMM</th>\n", " <th>LoyalCH</th>\n", " <th>SalePriceMM</th>\n", " <th>SalePriceCH</th>\n", " <th>PriceDiff</th>\n", " <th>Store7</th>\n", " <th>PctDiscMM</th>\n", " <th>PctDiscCH</th>\n", " <th>ListPriceDiff</th>\n", " <th>STORE</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>1</td>\n", " <td>237</td>\n", " <td>1</td>\n", " <td>1.75</td>\n", " <td>1.99</td>\n", " <td>0.00</td>\n", " <td>0.0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0.500000</td>\n", " <td>1.99</td>\n", " <td>1.75</td>\n", " <td>0.24</td>\n", " <td>0</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.24</td>\n", " <td>1</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>1</td>\n", " <td>239</td>\n", " <td>1</td>\n", " <td>1.75</td>\n", " <td>1.99</td>\n", " <td>0.00</td>\n", " <td>0.3</td>\n", " <td>0</td>\n", " <td>1</td>\n", " <td>0.600000</td>\n", " <td>1.69</td>\n", " <td>1.75</td>\n", " <td>-0.06</td>\n", " <td>0</td>\n", " <td>0.150754</td>\n", " <td>0.000000</td>\n", " <td>0.24</td>\n", " <td>1</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>1</td>\n", " <td>245</td>\n", " <td>1</td>\n", " <td>1.86</td>\n", " <td>2.09</td>\n", " <td>0.17</td>\n", " <td>0.0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0.680000</td>\n", " <td>2.09</td>\n", " 
<td>1.69</td>\n", " <td>0.40</td>\n", " <td>0</td>\n", " <td>0.000000</td>\n", " <td>0.091398</td>\n", " <td>0.23</td>\n", " <td>1</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>0</td>\n", " <td>227</td>\n", " <td>1</td>\n", " <td>1.69</td>\n", " <td>1.69</td>\n", " <td>0.00</td>\n", " <td>0.0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0.400000</td>\n", " <td>1.69</td>\n", " <td>1.69</td>\n", " <td>0.00</td>\n", " <td>0</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.00</td>\n", " <td>1</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>1</td>\n", " <td>228</td>\n", " <td>7</td>\n", " <td>1.69</td>\n", " <td>1.69</td>\n", " <td>0.00</td>\n", " <td>0.0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0.956535</td>\n", " <td>1.69</td>\n", " <td>1.69</td>\n", " <td>0.00</td>\n", " <td>1</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.00</td>\n", " <td>0</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " Purchase WeekofPurchase StoreID PriceCH PriceMM DiscCH DiscMM \\\n", "0 1 237 1 1.75 1.99 0.00 0.0 \n", "1 1 239 1 1.75 1.99 0.00 0.3 \n", "2 1 245 1 1.86 2.09 0.17 0.0 \n", "3 0 227 1 1.69 1.69 0.00 0.0 \n", "4 1 228 7 1.69 1.69 0.00 0.0 \n", "\n", " SpecialCH SpecialMM LoyalCH SalePriceMM SalePriceCH PriceDiff \\\n", "0 0 0 0.500000 1.99 1.75 0.24 \n", "1 0 1 0.600000 1.69 1.75 -0.06 \n", "2 0 0 0.680000 2.09 1.69 0.40 \n", "3 0 0 0.400000 1.69 1.69 0.00 \n", "4 0 0 0.956535 1.69 1.69 0.00 \n", "\n", " Store7 PctDiscMM PctDiscCH ListPriceDiff STORE \n", "0 0 0.000000 0.000000 0.24 1 \n", "1 0 0.150754 0.000000 0.24 1 \n", "2 0 0.000000 0.091398 0.23 1 \n", "3 0 0.000000 0.000000 0.00 1 \n", "4 1 0.000000 0.000000 0.00 0 " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "oj_df = pd.read_csv('oj.csv')\n", "oj_df = oj_df.drop(oj_df.columns[0], axis=1)\n", "oj_df.Purchase = oj_df.Purchase.map({'CH' : 1, 'MM': 0})\n", "oj_df.Store7 = 
oj_df.Store7.map({'Yes' : 1, 'No': 0})\n", "oj_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(a) Create a training set containing a random sample of 800 observations, and a test set containing the remaining observations." ] }, { "cell_type": "code", "execution_count": 133, "metadata": {}, "outputs": [], "source": [ "train, test = model_selection.train_test_split(oj_df, test_size=oj_df.shape[0] - 800)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(b) Fit a tree to the training data, with Purchase as the response and the other variables as predictors. Use the summary() function to produce summary statistics about the tree, and describe the results obtained. What is the training error rate? How many terminal nodes does the tree have?" ] }, { "cell_type": "code", "execution_count": 162, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "train error: 0.16249999999999998\n" ] } ], "source": [ "reg = tree.DecisionTreeClassifier(max_depth=3) \n", "train_x = train.drop('Purchase', axis=1)\n", "train_y = train.Purchase\n", "reg.fit(train_x,train_y)\n", "\n", "# train error\n", "preds = reg.predict(train_x)\n", "train_error = 1 - ((train_y == preds).sum() / train_y.shape[0])\n", "print('train error: {}'.format(train_error))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(c) Type in the name of the tree object in order to get a detailed text output. Pick one of the terminal nodes, and interpret the information displayed. (d) Create a plot of the tree, and interpret the results." 
] }, { "cell_type": "code", "execution_count": 138, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n", "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n", " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n", "<!-- Generated by graphviz version 2.40.1 (20161225.0304)\n", " -->\n", "<!-- Title: Tree Pages: 1 -->\n", "<svg width=\"515pt\" height=\"534pt\"\n", " viewBox=\"0.00 0.00 514.87 534.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n", "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 530)\">\n", "<title>Tree</title>\n", "<polygon fill=\"#ffffff\" stroke=\"transparent\" points=\"-4,4 -4,-530 510.8721,-530 510.8721,4 -4,4\"/>\n", "<!-- 0 -->\n", "<g id=\"node1\" class=\"node\">\n", "<title>0</title>\n", "<polygon fill=\"none\" stroke=\"#000000\" points=\"96.9985,-322 .0005,-322 .0005,-258 96.9985,-258 96.9985,-322\"/>\n", "<text text-anchor=\"middle\" x=\"48.4995\" y=\"-306.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">X[8] <= 0.479</text>\n", "<text text-anchor=\"middle\" x=\"48.4995\" y=\"-292.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">mse = 0.24</text>\n", "<text text-anchor=\"middle\" x=\"48.4995\" y=\"-278.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 800</text>\n", "<text text-anchor=\"middle\" x=\"48.4995\" y=\"-264.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = 0.6</text>\n", "</g>\n", "<!-- 1 -->\n", "<g id=\"node2\" class=\"node\">\n", "<title>1</title>\n", "<polygon fill=\"none\" stroke=\"#000000\" points=\"229.9976,-356 132.9995,-356 132.9995,-292 229.9976,-292 229.9976,-356\"/>\n", "<text text-anchor=\"middle\" x=\"181.4985\" y=\"-340.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">X[8] <= 0.345</text>\n", "<text text-anchor=\"middle\" x=\"181.4985\" y=\"-326.8\" 
font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">mse = 0.171</text>\n", "<text text-anchor=\"middle\" x=\"181.4985\" y=\"-312.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 288</text>\n", "<text text-anchor=\"middle\" x=\"181.4985\" y=\"-298.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = 0.219</text>\n", "</g>\n", "<!-- 0->1 -->\n", "<g id=\"edge1\" class=\"edge\">\n", "<title>0->1</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M97.2682,-302.4673C105.5808,-304.5923 114.2921,-306.8193 122.8288,-309.0016\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"122.1804,-312.4483 132.7357,-311.5342 123.9142,-305.6664 122.1804,-312.4483\"/>\n", "<text text-anchor=\"middle\" x=\"111.2305\" y=\"-320.0828\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">True</text>\n", "</g>\n", "<!-- 8 -->\n", "<g id=\"node9\" class=\"node\">\n", "<title>8</title>\n", "<polygon fill=\"none\" stroke=\"#000000\" points=\"229.9976,-234 132.9995,-234 132.9995,-170 229.9976,-170 229.9976,-234\"/>\n", "<text text-anchor=\"middle\" x=\"181.4985\" y=\"-218.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">X[8] <= 0.765</text>\n", "<text text-anchor=\"middle\" x=\"181.4985\" y=\"-204.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">mse = 0.151</text>\n", "<text text-anchor=\"middle\" x=\"181.4985\" y=\"-190.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 512</text>\n", "<text text-anchor=\"middle\" x=\"181.4985\" y=\"-176.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = 0.814</text>\n", "</g>\n", "<!-- 0->8 -->\n", "<g id=\"edge8\" class=\"edge\">\n", "<title>0->8</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M96.8914,-257.981C105.8761,-252.0362 115.3382,-245.7756 124.5505,-239.6802\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"126.5113,-242.5796 132.9198,-234.1426 
122.6487,-236.7418 126.5113,-242.5796\"/>\n", "<text text-anchor=\"middle\" x=\"108.4224\" y=\"-224.9545\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">False</text>\n", "</g>\n", "<!-- 2 -->\n", "<g id=\"node3\" class=\"node\">\n", "<title>2</title>\n", "<polygon fill=\"none\" stroke=\"#000000\" points=\"368.4346,-465 271.4365,-465 271.4365,-401 368.4346,-401 368.4346,-465\"/>\n", "<text text-anchor=\"middle\" x=\"319.9355\" y=\"-449.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">X[8] <= 0.035</text>\n", "<text text-anchor=\"middle\" x=\"319.9355\" y=\"-435.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">mse = 0.12</text>\n", "<text text-anchor=\"middle\" x=\"319.9355\" y=\"-421.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 200</text>\n", "<text text-anchor=\"middle\" x=\"319.9355\" y=\"-407.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = 0.14</text>\n", "</g>\n", "<!-- 1->2 -->\n", "<g id=\"edge2\" class=\"edge\">\n", "<title>1->2</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M222.2854,-356.1141C237.4991,-368.0927 254.9306,-381.8176 270.8157,-394.3249\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"268.9397,-397.3025 278.9618,-400.7388 273.2701,-391.8027 268.9397,-397.3025\"/>\n", "</g>\n", "<!-- 5 -->\n", "<g id=\"node6\" class=\"node\">\n", "<title>5</title>\n", "<polygon fill=\"none\" stroke=\"#000000\" points=\"366.1489,-356 273.7222,-356 273.7222,-292 366.1489,-292 366.1489,-356\"/>\n", "<text text-anchor=\"middle\" x=\"319.9355\" y=\"-340.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">X[9] <= 2.04</text>\n", "<text text-anchor=\"middle\" x=\"319.9355\" y=\"-326.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">mse = 0.24</text>\n", "<text text-anchor=\"middle\" x=\"319.9355\" y=\"-312.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 
88</text>\n", "<text text-anchor=\"middle\" x=\"319.9355\" y=\"-298.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = 0.398</text>\n", "</g>\n", "<!-- 1->5 -->\n", "<g id=\"edge5\" class=\"edge\">\n", "<title>1->5</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M230.3062,-324C240.9248,-324 252.2472,-324 263.1447,-324\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"263.4258,-327.5001 273.4258,-324 263.4258,-320.5001 263.4258,-327.5001\"/>\n", "</g>\n", "<!-- 3 -->\n", "<g id=\"node4\" class=\"node\">\n", "<title>3</title>\n", "<polygon fill=\"none\" stroke=\"#000000\" points=\"504.5859,-526 412.1592,-526 412.1592,-476 504.5859,-476 504.5859,-526\"/>\n", "<text text-anchor=\"middle\" x=\"458.3726\" y=\"-510.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">mse = 0.018</text>\n", "<text text-anchor=\"middle\" x=\"458.3726\" y=\"-496.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 56</text>\n", "<text text-anchor=\"middle\" x=\"458.3726\" y=\"-482.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = 0.018</text>\n", "</g>\n", "<!-- 2->3 -->\n", "<g id=\"edge3\" class=\"edge\">\n", "<title>2->3</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M368.7433,-456.9743C379.6903,-462.3514 391.3853,-468.096 402.5916,-473.6005\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"401.3441,-476.8871 411.8628,-478.1545 404.4303,-470.6041 401.3441,-476.8871\"/>\n", "</g>\n", "<!-- 4 -->\n", "<g id=\"node5\" class=\"node\">\n", "<title>4</title>\n", "<polygon fill=\"none\" stroke=\"#000000\" points=\"506.8716,-458 409.8735,-458 409.8735,-408 506.8716,-408 506.8716,-458\"/>\n", "<text text-anchor=\"middle\" x=\"458.3726\" y=\"-442.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">mse = 0.152</text>\n", "<text text-anchor=\"middle\" x=\"458.3726\" y=\"-428.8\" font-family=\"Times,serif\" font-size=\"14.00\" 
fill=\"#000000\">samples = 144</text>\n", "<text text-anchor=\"middle\" x=\"458.3726\" y=\"-414.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = 0.188</text>\n", "</g>\n", "<!-- 2->4 -->\n", "<g id=\"edge4\" class=\"edge\">\n", "<title>2->4</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M368.7433,-433C378.6685,-433 389.2087,-433 399.4421,-433\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"399.539,-436.5001 409.539,-433 399.5389,-429.5001 399.539,-436.5001\"/>\n", "</g>\n", "<!-- 6 -->\n", "<g id=\"node7\" class=\"node\">\n", "<title>6</title>\n", "<polygon fill=\"none\" stroke=\"#000000\" points=\"504.5859,-390 412.1592,-390 412.1592,-340 504.5859,-340 504.5859,-390\"/>\n", "<text text-anchor=\"middle\" x=\"458.3726\" y=\"-374.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">mse = 0.185</text>\n", "<text text-anchor=\"middle\" x=\"458.3726\" y=\"-360.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 53</text>\n", "<text text-anchor=\"middle\" x=\"458.3726\" y=\"-346.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = 0.245</text>\n", "</g>\n", "<!-- 5->6 -->\n", "<g id=\"edge6\" class=\"edge\">\n", "<title>5->6</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M366.4182,-337.7665C377.8324,-341.1469 390.1746,-344.8022 401.9985,-348.3041\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"401.1947,-351.7162 411.777,-351.2001 403.1826,-345.0044 401.1947,-351.7162\"/>\n", "</g>\n", "<!-- 7 -->\n", "<g id=\"node8\" class=\"node\">\n", "<title>7</title>\n", "<polygon fill=\"none\" stroke=\"#000000\" points=\"504.5859,-322 412.1592,-322 412.1592,-272 504.5859,-272 504.5859,-322\"/>\n", "<text text-anchor=\"middle\" x=\"458.3726\" y=\"-306.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">mse = 0.233</text>\n", "<text text-anchor=\"middle\" x=\"458.3726\" y=\"-292.8\" font-family=\"Times,serif\" 
font-size=\"14.00\" fill=\"#000000\">samples = 35</text>\n", "<text text-anchor=\"middle\" x=\"458.3726\" y=\"-278.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = 0.629</text>\n", "</g>\n", "<!-- 5->7 -->\n", "<g id=\"edge7\" class=\"edge\">\n", "<title>5->7</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M366.4182,-314.9343C377.7183,-312.7304 389.9278,-310.3491 401.6436,-308.0641\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"402.6319,-311.4374 411.777,-306.0878 401.2919,-304.5668 402.6319,-311.4374\"/>\n", "</g>\n", "<!-- 9 -->\n", "<g id=\"node10\" class=\"node\">\n", "<title>9</title>\n", "<polygon fill=\"none\" stroke=\"#000000\" points=\"373.8106,-234 266.0605,-234 266.0605,-170 373.8106,-170 373.8106,-234\"/>\n", "<text text-anchor=\"middle\" x=\"319.9355\" y=\"-218.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">X[11] <= -0.165</text>\n", "<text text-anchor=\"middle\" x=\"319.9355\" y=\"-204.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">mse = 0.219</text>\n", "<text text-anchor=\"middle\" x=\"319.9355\" y=\"-190.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 246</text>\n", "<text text-anchor=\"middle\" x=\"319.9355\" y=\"-176.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = 0.675</text>\n", "</g>\n", "<!-- 8->9 -->\n", "<g id=\"edge9\" class=\"edge\">\n", "<title>8->9</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M230.3062,-202C238.5871,-202 247.2959,-202 255.8974,-202\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"255.9135,-205.5001 265.9135,-202 255.9135,-198.5001 255.9135,-205.5001\"/>\n", "</g>\n", "<!-- 12 -->\n", "<g id=\"node13\" class=\"node\">\n", "<title>12</title>\n", "<polygon fill=\"none\" stroke=\"#000000\" points=\"370.3106,-125 269.5605,-125 269.5605,-61 370.3106,-61 370.3106,-125\"/>\n", "<text text-anchor=\"middle\" x=\"319.9355\" y=\"-109.8\" 
font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">X[11] <= -0.39</text>\n", "<text text-anchor=\"middle\" x=\"319.9355\" y=\"-95.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">mse = 0.053</text>\n", "<text text-anchor=\"middle\" x=\"319.9355\" y=\"-81.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 266</text>\n", "<text text-anchor=\"middle\" x=\"319.9355\" y=\"-67.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = 0.944</text>\n", "</g>\n", "<!-- 8->12 -->\n", "<g id=\"edge12\" class=\"edge\">\n", "<title>8->12</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M222.2854,-169.8859C237.4991,-157.9073 254.9306,-144.1824 270.8157,-131.6751\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"273.2701,-134.1973 278.9618,-125.2612 268.9397,-128.6975 273.2701,-134.1973\"/>\n", "</g>\n", "<!-- 10 -->\n", "<g id=\"node11\" class=\"node\">\n", "<title>10</title>\n", "<polygon fill=\"none\" stroke=\"#000000\" points=\"504.5859,-254 412.1592,-254 412.1592,-204 504.5859,-204 504.5859,-254\"/>\n", "<text text-anchor=\"middle\" x=\"458.3726\" y=\"-238.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">mse = 0.167</text>\n", "<text text-anchor=\"middle\" x=\"458.3726\" y=\"-224.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 33</text>\n", "<text text-anchor=\"middle\" x=\"458.3726\" y=\"-210.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = 0.212</text>\n", "</g>\n", "<!-- 9->10 -->\n", "<g id=\"edge10\" class=\"edge\">\n", "<title>9->10</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M373.8553,-212.5162C383.1104,-214.3213 392.7496,-216.2012 402.0653,-218.0181\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"401.5283,-221.4793 412.0134,-219.9584 402.8683,-214.6087 401.5283,-221.4793\"/>\n", "</g>\n", "<!-- 11 -->\n", "<g id=\"node12\" class=\"node\">\n", 
"<title>11</title>\n", "<polygon fill=\"none\" stroke=\"#000000\" points=\"506.8716,-186 409.8735,-186 409.8735,-136 506.8716,-136 506.8716,-186\"/>\n", "<text text-anchor=\"middle\" x=\"458.3726\" y=\"-170.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">mse = 0.189</text>\n", "<text text-anchor=\"middle\" x=\"458.3726\" y=\"-156.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 213</text>\n", "<text text-anchor=\"middle\" x=\"458.3726\" y=\"-142.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = 0.746</text>\n", "</g>\n", "<!-- 9->11 -->\n", "<g id=\"edge11\" class=\"edge\">\n", "<title>9->11</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M373.8553,-186.0309C382.4603,-183.4824 391.3974,-180.8356 400.0978,-178.2589\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"401.1955,-181.5841 409.7899,-175.3884 399.2077,-174.8723 401.1955,-181.5841\"/>\n", "</g>\n", "<!-- 13 -->\n", "<g id=\"node14\" class=\"node\">\n", "<title>13</title>\n", "<polygon fill=\"none\" stroke=\"#000000\" points=\"503.3716,-118 413.3735,-118 413.3735,-68 503.3716,-68 503.3716,-118\"/>\n", "<text text-anchor=\"middle\" x=\"458.3726\" y=\"-102.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">mse = 0.21</text>\n", "<text text-anchor=\"middle\" x=\"458.3726\" y=\"-88.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 10</text>\n", "<text text-anchor=\"middle\" x=\"458.3726\" y=\"-74.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = 0.7</text>\n", "</g>\n", "<!-- 12->13 -->\n", "<g id=\"edge13\" class=\"edge\">\n", "<title>12->13</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M370.6983,-93C381.2245,-93 392.3649,-93 403.0452,-93\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"403.109,-96.5001 413.109,-93 403.1089,-89.5001 403.109,-96.5001\"/>\n", "</g>\n", "<!-- 14 -->\n", "<g id=\"node15\" 
class=\"node\">\n", "<title>14</title>\n", "<polygon fill=\"none\" stroke=\"#000000\" points=\"506.8716,-50 409.8735,-50 409.8735,0 506.8716,0 506.8716,-50\"/>\n", "<text text-anchor=\"middle\" x=\"458.3726\" y=\"-34.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">mse = 0.045</text>\n", "<text text-anchor=\"middle\" x=\"458.3726\" y=\"-20.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">samples = 256</text>\n", "<text text-anchor=\"middle\" x=\"458.3726\" y=\"-6.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">value = 0.953</text>\n", "</g>\n", "<!-- 12->14 -->\n", "<g id=\"edge14\" class=\"edge\">\n", "<title>12->14</title>\n", "<path fill=\"none\" stroke=\"#000000\" d=\"M370.6983,-68.0654C380.2888,-63.3546 390.3892,-58.3933 400.187,-53.5807\"/>\n", "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"402.0288,-56.5754 409.4614,-49.0251 398.9426,-50.2925 402.0288,-56.5754\"/>\n", "</g>\n", "</g>\n", "</svg>\n" ], "text/plain": [ "<graphviz.files.Source at 0x1127460b8>" ] }, "execution_count": 138, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dot_dat = tree.export_graphviz(reg, out_file=None, rotate=True)\n", "graph = gv.Source(dot_dat)\n", "graph" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(e) Predict the response on the test data, and produce a confusion matrix comparing the test labels to the predicted test labels. What is the test error rate?" 
] }, { "cell_type": "code", "execution_count": 163, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>0</th>\n", " <th>1</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>75</td>\n", " <td>22</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>27</td>\n", " <td>146</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " 0 1\n", "0 75 22\n", "1 27 146" ] }, "execution_count": 163, "metadata": {}, "output_type": "execute_result" } ], "source": [ "test_x = test.drop('Purchase', axis=1)\n", "test_y = test.Purchase\n", "preds = reg.predict(test_x)\n", "conf = pd.DataFrame(metrics.confusion_matrix(test_y, preds))\n", "conf" ] }, { "cell_type": "code", "execution_count": 164, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "test error rate: 0.18148148148148147\n" ] } ], "source": [ "print('test error rate: {}'.format( 1 - ((conf[0][0] + conf[1][1]) / test_y.shape[0])))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(f) Apply the cv.tree() function to the training set in order to determine the optimal tree size.\n", "\n", "scikit-learn has no direct equivalent of R's cv.tree(). As a substitute, the code for (g) and (h) cross-validates over max_depth, used here as a proxy for tree size." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(g) Produce a plot with tree size on the x-axis and cross-validated classification error rate on the y-axis. (h) Which tree size corresponds to the lowest cross-validated classification error rate?" 
] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "rnge = range(1, 50)\n", "\n", "errors = [run_cv(oj_df,\n", " 5,\n", " lambda df: df.drop('Purchase', axis=1),\n", " lambda df: df.Purchase,\n", " lambda x, y: tree.DecisionTreeClassifier(max_depth=i) .fit(x,y),\n", " lambda preds, true: 1 - ((preds == true).sum() / preds.shape[0])).mean()[0] for i in rnge]" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "<matplotlib.axes._subplots.AxesSubplot at 0x1162366d8>" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYsAAAELCAYAAAAoUKpTAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAIABJREFUeJzt3Xl8m9WV8PHfkbzvdux4iZeskJjESUiAsBQKTWloEygtbaGkU/oyL3Shw5TpdOjbKZ3SaWdKWzplyszQUroBpcB0oZQ9hDUkJEDiJM5mJ7Hjfd9XSff9Q5Kj2LIl23osyz7fz8cfrEfPY18FWee599x7rhhjUEoppcZjC3cDlFJKzXwaLJRSSgWkwUIppVRAGiyUUkoFpMFCKaVUQBoslFJKBaTBQimlVEAaLJRSSgWkwUIppVRAUeFuQKhkZmaahQsXhrsZSikVUd55551mY0xWoPNmTbBYuHAhe/bsCXczlFIqoohIZTDn6TCUUkqpgDRYKKWUCkiDhVJKqYA0WCillApIg4VSSqmANFgopZQKSIOFUkqpgDRYqIjSP+Rk+5FGdDtgpaaXBgsVMXoHHXzul7v53C9383RpXbibo9ScosFCRYTuAQc3PbSbXSdaSEuI5uGdQS06VUqFiAYLNeN19A3xmV/s4p2qNu67YS2fv2wJu060cqyhK9xNUxaobuulqWsg3M1QI2iwUDNae+8gWx/cxYGaDu7/9LlsLsnjE+vyibHbtHcxS93623e4688Hwt0MNYIGCzVjtXQPcMPPd3Gkvov/2bqOTStzAJiXFMuHV+Xwh3dr6BlwhLmVKpQcThdHG7qoau0Nd1PUCLOm6qwKv67+IQ7X+x8aSoqNYkVuStA/q6lrgBsf3EllSy8PfnY9l551ZgXlrRuK+NPeWp7aV8sN5xdOqd1q5qhq7WXIaWjojJxhKIfTRX1nP/npCeFuiqU0WKiQuevPB/njezVjPv/c37+P5TnBBYwfPH+YypZefnnTeVy0NHPU8+uK0lmek8zDOyu5/rwCRGTS7VYzR3ljNwAtPQMMOV1E22f+4Mejb1fxnafLeOmOyyialxju5lhGg4UKmd0nW7l46Ty+cNnSM4639Axw+2N72XeqPehgse9UB5cszfQbKABEhBs3FPHNPx1g76l21hamT7n9KvwqmnoAMMbdu8xLiw9ziwJ75UgTQ07Do7uq+PqHV4S7OZaZ+WFbhYUxhq7+oaDPb+0ZpLqtj0uXZXHJsswzvraU5JEYY6estjOon9U/5KS8qTvgsNW1axeQGGPn4Z1VQbdTzWzengV
AQ2d/GFsSHIfTxdsnWgF4fM8p+oecYW6RdTRYqDMYY3iprIGP/tcOzv3Oi1S1BJdoLK1uB6AkP23UczabsDw3hbK64ILFsYZunC5Dcd74wSIpNoqPrl3A06W1tPUMBvWz1cxW0dRNWkI0QETkLfbXdNA94ODTFxTS1jvEM/tn72JRDRYKAJfL8Oz+Oj5y3xv87W/2UN/Rx5DTsKOiOajr91d3IAIrF/j/gC/OTeFQXRcuV+AyHWV1HcPXBLJ1QxEDDhdPvlMdVDvVzGWMoaKpmwsXzwMio2exo6IFgK9sPIvFmYmzejq3Bos5zuky/HlvDR/6j9f4wiPv0j/k5IefWM0b/3QFGYkxvFPZFtTP2VfdweLMRJLjov0+X5yXQveAg+q2voA/61BdF4kxdgozAs8uWZGbwrqidB7ZVRlUIJrNnBH++pu6Bujqd3D+ogzsNomIYLHzeAvLc5LJSo7l0xcU8m5Ve9DDrZFGg8UcNeR08cSeU2y891Vuf2wvInDfDWt58Y7LuG5dPtF2G+cWpgcdLEqr2/0OQXl58w/eXsN4ymo7WZGbgs0W3AynrRsKOdnSy5tB9oKm4icvHePKH786owKTw+nijt/v5QM/eiWiCyyWN7nzFcvmJzM/OXbGD0MNOJzsPtnKhUvcPaHr1uUTG2Xj4V2zs3ehwWKOGXA4eWRXJZf/8BX+8clSEmLs/M/Wc3nu9ku5enUedp8P6HVF6Rxv7qE1QD6gobOfxq4BSvJTxzzn7OxkbELAuy6Xy1BW1zmhNRlXrcwlfZrqRf3xvWqONnSzJ8ggarVBh4sv/+49/vBeDSdbemnvDX5SwkxT4UluL52fxPyUOBq7ZnbPYm9VO/1DruFhs7SEGLaszuNP79VMaHJIpNBgMUf0DTr55ZsnuOyeV/jGHw+QmRTLQzet5+kvX8Kmlbl+7+LXL3RPR303wAfjvlNjJ7e94mPsLM5KCpjkrm7ro3vAETC57Ssu2s4n1xfw0qFG6jus+4A53tTNSU/C/+nSWst+T7AGHE6++Mg7PHugniuWzwcIaphvpqpo6iEpNorslFhyUmIt/X8ZCjsqWrAJXOAJFuDOofUOOvnTOOuNIpUGiznA6TJc/dM3+PZfyijMSOC3N5/PH794EVcszx53MduqBalE2yXgXfT+mg7sNgmYkPYmucczkeS2r09fUIjLGH73tnXTaF8+3AjAmoI0ntlfh8Ppsux3BdI/5OSW37zDS4ca+c5HV/IPV54FwKm24MtkfOnRd7n3xaNWNXHCKpq6WZKViIiQnRI343MWbx1vYeWCVFLjT+fpVuensmpBKr/dWRnRQ4L+WBosRGSTiBwRkXIRudPP83eISJmIlIrINhEpGvF8iohUi8hPrWznbFfb3sexxm7+8UNn8/jnL+R9y7KCWvEcF23nnLzUwD2L6g7Oyk4mPsY+7nnFeSnUtPfR3jv2sFZZXRc2gbNzkgO2z1fRvEQuXZbFY7urGLLoQ3z7kUaWzU/i1ksX09w9yM7jrZb8nkB6Bx3c/OvdvHasie9/fBWf2VBEgWcywKkgayoZY9h2qIFfvH58xgyZlDd2syQrCYDslDg6+x30Dc7MdQt9g07eq2obHoLyEhG2bijkaEM3u0/OjKHKULEsWIiIHbgfuAooBm4QkeIRp70HrDfGlABPAveMeP47wGtWtXGu8BZlW1sw9jDRWNYXpbOvup1Bh/8PYGMM+6vbKVkwdr7C63SSe+yhqLLaThZnJREXPX7g8WfrhiIaOgd4qaxhwtcG0j3g4O0TrVyxfD6XL59PYow9LENR3QMObvrlbt6qaOFHn1jNp85z18VKiYsmNT466J5FY9cA/UMuembIkEn3gIO6jn6WzD8dLIAZm7fYU9nKkNMMJ7d9bVmdR3Jc1KybRmtluY/zgXJjzHEAEXkMuAYo855gjNnuc/5OYKv3gYisA7KB54D1FrZz1qv0jLMXzpt4obN1Rek8+MYJDtZ2+C2pUd3WR1v
vECUFgYOFd2iprLaTi5b4L+NxqK6TdUWTK91xxfL55KXG8fCuSq5alTupnzGWN465Szpcvnw+cdF2PliczbMH6rn7mpXERIX2nqu+o5/P/GIXvX7uqnsGHXT1O/jJ9WvZsjrvjOcKMuI51RpczsL7noiJsvHwziq2bigKa32t456ZUKd7FrGA+99iKvWW/uWpg2QkxvB3H1g29Ub6eKuihSibcN7CjFHPJcRE8fFz83lkVyXN3cVkJsWe8fzuk6088GoFZ2Un87VNy0PaLitZOQy1ADjl87jac2wsNwPPAoiIDfgR8NXxfoGI3CIie0RkT1NT0xSbO3tVtvYQbRdyUydeZ8f7wT3WFNp93pXbCwL3WrKSY8lKjh0zb9HeO0hNe9+Ektu+7DbhhvMLebO8ZfjDJ1RePtxIclzU8L/HltV5dPQN8WZ56KfrvnSogWON3awrSufCJfPO+Nq4IpuHbjpvVKAAKEhPCLpnUdnirsH0fy5exJGGrrDP7qpo8s6EcgcGb8+iYQqbIO091c6vdpzk568dD3kZjh0VLawpSCMx1v/99tYNhQw5DY/vcX8EGmPYUd7M9T97i0/8z1tsO9zIL944MWOGAIMxIwoJishW3L2HyzyHvgg8Y4ypHu9uxxjzM+BnAOvXr59d2aQQqmrppSA94YxpscGanxJHQUY871S28bfvG/38/uoOYuy2oHMMxeOU/fAen2hy29enzi/gJ9uO8ciuKr65eeSo5+S4XIbtR5q49Kys4Sqo71uWRUpcFH/ZV8vlnplIofJWRQu5qXH85Po1E7rbL8hIYNvhRlwuE3CNSlVrLzaBL7x/CY/squThnZV+75KnS3ljN1E2Ge5FZCd7hqGmkOS+98WjRNmErgEHrx5t4kPn5ISkrV39Q+yv6eCL718y5jlL5yezYXEGj+6qYkVOCv/58jHerWpnfnIs39xczNnZyWz9xS5eOtTAtWvzQ9Iuq1nZs6gBCnwe53uOnUFENgLfAK42xnhvIy4EbhORk8APgb8RkX+3sK2zWmVL76SGoLzWFaazp7LN7+yOfdXtrMhNDnooZkVuCuWNXX5zIN4ex0TWWIw0PzmOD52Tw5PvVIfsbvJgbSdNXQNccfbpoBATZWPTyhxeKGsI6V2ry2V463gLFy6ZN+Fhofz0eAYdLpq7A9+NV7b0kpcWT2p8NB8/N59n99fTEsR1Vqlo7KFwXsJwME6JjyIu2jbpGVG7T7by2tEm7rjyLDISY/jLvtDll3afbMXp8p+v8LV1QxHVbX187le7aegc4DsfXclrX7ucmy9ZxEVL5pGXGsfT+yKnlpSVwWI3sExEFolIDHA98JTvCSKyFngAd6Bo9B43xtxojCk0xizEPRT1G2PMqNlUKjBjDFWtvRQFUTpjLOsWZtDUNTBqDr/LZThQ0znu+oqRivNSGHIajjWOHooqq+0cHqqaihs3FNLRNxSyD4hthxsQgfeffeYGTFtW59E94OCVI6EbAj3a2EVrz+CoWTbBKPBsvhPMUFRlay9FnhuIGy8oZNDp4vE94auvVdHUzVJPvgIYnj5bP8lV3D964QhZybF87qJFXLUyh22HGukdDM2uijvKW4iJclc4GM+HzsnhsxcW8f2Pr2L7V9/PZzYUDU/csNmEzavzeO1Y07izA2cSy4KFMcYB3AY8DxwCHjfGHBSRu0Xkas9pPwCSgCdEZK+IPDXGj1OT1NozSPeAg8IpJAnXFfrPWxxv7qF7wDHuyu2RvENM/vIWZXWdUxqC8rpw8TyWZCXy8K7QrLnYfriRNQVpzBuRqLxw8TzmJcbwlxDOitpR7i5MF+iu1Z+CDHdOKpgkd1VLD4UZ7vfEsuxkLliUwaNvh6e+lsPp4mRLz/BMKK/s5MmttdhR3szO46188f1LiI+xs7kkj74hJ9sONQa+OJifX9HCusL0gDP2ou02vn3NSj51XqHfnvfmklyGnIbnD9aHpF1Ws3SdhTHmGWPMWcaYJcaY73q
O3WWMecrz/UZjTLYxZo3n62o/P+NXxpjbrGznbFbpmTY7lZ7F2TnJJMVGjQoW45UlH8uizETiom2jyn4MOlyUN3ZNOrntyz3XvYh9p9o5UBO4FtV4mroG2FfdccYQlFeU3cZVq3J4OZR3rRUtFM1LmNQWnd5rAq216Owfoq13aLhnAe4hk1Otfbx6bPonini3Ul2SNSJYpMZNOGdhjOFHLx4lNzVueLvd8xdlMD85NiRTndt6BjlU38lFkwjmI61akErRvASeLo2MoShdwT3LefejKJpCzsJuE9YWpo2aMVNa3UF8tJ2lI+4IA/2s5TkpowoKHmvsYshpQtKzAPjYufnER9unPNf9lSPuu9GxkthbPHetL4XgrtXpMuw60TLpD6K4aDtZybEBh6GG3xM+NxAfOieHzKRYHgnD2oByn5pQvrKTY6nv7J/QSuhXjjbxTmUbt12xdPjO324TPlKSy/YjTXROcfbRrhMtGAMXLZ16sBARtpTk8WZ5c1B5pnDTYDHLeefTF0yhZwFwbmE6R+o7z5jqV1rdzsoFKROeZbUiN4Wy2s4zPgRCkdz2lRofzdWr8/jz3topfUBsP9JIdkos54zR4zlvYQbZKbEhyY8crO2gq9/BhknkK7wK0gOvtfC37iYmysb15xXw8uFGatqnt76UdyvVxVlnDpVmp8TRP+Sisz+4Xpsxhh+/eJT89Hg+sa7gjOc2l+Qx6HDx4sGpLdjcUdFCQox9Qr3p8WxenYvLwLMHZv5QlAaLWa6ytYeclLhJrYj2tX5hOi7j3hsb3OPMB2snltz2Ks5LobPfQa1Pobiy2k7iom0sygzdhvdbNxTRN+TkD5PcGGnI6eL1o81cfvb8MWcm2WzCR1bl8WoI7lq9G+lMJl/hVZCRQHX7+D2Lylb3h/PIxW43XOAetvndBHM9xhgee7uK1yc5hFXR1E12SiwpI/ZCme9ZmBfsUNSLZQ2UVndw+weWjcoRnFuYxoK0+CkPRe2oaOG8hRnDs7am6uzsZJbNTwrpbC2raLCY5aqmOG3Wa01BGiLuMgcARxu6GXC4JpTc9vJdye1VVtfB8pyJ91LGsyo/ldX5qTy8q2pSRd12n2yla8ARcB3FltW5DDpdvBCCu9Zl85OY71ljMBn56fHUtvePW+SwqqWXeYkxJI1YULYgLZ4rls/nsd2nxizvMpIxhn9/7jB3/mE/n//tO0HXpvLlWxPKV453YV4QM6JcLsO9Lx5lcWYi164dvfZXRNhcksvrx5onvQVvY1c/5Y3dIclXnNmuPHafbJ3xVXY1WMxylVOcNuuVHBfN2dnJw0nuySS3vZbnJCM+e1sYYyir7QxJcnukGzcUUd7Yza4TEy/6t/1wIzF2G5cs9V+axGtNQRr56fFTujscdLjY47ORzmQVpCfgdBnqxvngqWzpHTOHdeOGIpq7B3ihLPCwiDGGu58u44FXj3Pt2gXYRPiHJ/ZNaEaVdytVf8HCu4q7PoiexTMH6jhc38XtG5cRNcZd/5bVeThchucmOfvoLU/Pb6xSNZO1eXUuxsBfZ/j+3RosItD+6o7hN+54egcdNHUNTCm57Wv9wnTeq2rH6TKU1nSQHBfFwkn87MTYKBbOSxxOctd29NPZ7whZvsLXlpI8UuKi+O0kErcvH27kgsUZY5Z08PLeHb5Z3hxwo6ixlFa30zvonPJd63D12XGS3FWtvWPWW7psWRYFGfH84o0T4w6ruVyGb/75AL988ySfu3gh935yNXdtKebtE6089OaJoNvr3UrV3ySJ4ZIfAYKF0+XOVZyVncSWktFlULzOyUthUWbipIei3qpoISUuKuQ3NUuykjgnL2XGD0VpsIhA9zx/mK8+sS/ged5qs1NZY+FrXVE63QMOjjZ0ebZRTZ108TnfvS28PYxQzYTyFR9j52Pn5vPiwQY6+oLPKVS29FDR1MPlfqbM+vOxcxfgMobvP3t4Uu3cUdGCCFywaOo9C4DqMZLcAw4ntR19Y+5
vbrMJX7hsKe9VtXPxv7/MvS8cGTVs43QZvv6H/Ty8s4pbL1vMXZuLERGuW5fPB4uzuef5IxxtGH/fEq/yEQUEfcXH2EmJiwqYszhU10lFUw+3XLpk3DIn3qGotypaaJpgzamO3iH+ur+OS8/KCulQqdfmkjz2nmqf1DDedNFgEYFq2/uoae8LWJ7BO+tlMnf//qwvctcO2lHRwpH6rinNCCnOS6GqtZfO/iHKajsRcQ9PWeGaNXmenELwww/ejY6uCLLu01nZydxy6RJ+v+cULx+eeO5iR0UzK3JSSE+MmfC1vnLT4rDJ2D2L6rY+jBl/KvWnLyjk6S9fwsVLMrnv5XIu+f7L/Nuzh2juHsDhdPGPT+zj93tO8XdXLOXOTcuHbxhEhH/72CqSYqO44/G9Qe0r4p0JNdb0a/cmSOO/z701xYKpVrxldZ5n9tHEhnx+/vpxuvodfOnypRO6LlibS9xVkmfymgsNFhHImwgrDbDg7PR8+tD0LPLT48lKjuXRXZUMOU1Qe1iMxduLOFzXRVldB4vmJQYc7pksb05hIn+ILx9uZHFmIgsnMDvrKx9cxvKcZP7pf/dPKInaP+Tk3ar2kCROo+02clPjx7xDDXbdzcoFqfzPZ9bx/N9fygdWZPPz145zyfdf5uP/vYM/vFfDP3zwLO648uxRPcvMpFi+d+1KDtR08tOXywO2t6Kxe3grVX/cJT/G71mU1XaSEGMPKjd3VnYyZ2VPbPZRa88gv3zzBB8pybVkqBTcw4drCtJmxHa9Y9FgEWG6+ofo8exzUHpq/GBR2dpDanw0qQnR454XLBFhXWH68N1gySQ2U/LyjvuW1XZQVtfJCguS217enMIbQeYUmroG2Hm8hY3F2RP6PbFRdn70ydW09w7yz38+EPR171a2MehwhWShF7jLfoy1F7e3NHlhkDcQZ+ckc98Na3npjsvYXJLHobouvn7Vcr48zv4Qm1bmcu3aBfx0e/nwRIix+G6l6s/8lNiAw1BldZ0sz0kOWGnXa0tJHrtPtlEb5HqSB16toG/IyVc2hnZPjFHtWp3HwdrOkJfXDxUNFhHGd3rd/prx/xDHm/UyWesXurv68xJjyEud/BTP+cmxZCTG8PbJVk619lmSr/C1ZXUuTpfhuSAWPz2+5xRDTsOnzisIeO5I5+Sl8vcbz+KvpXU8FeTd646KFuxjbKQzGfnj7GtxsqWXhBg7mUkTG+5anJXEDz+xmrK7P8Stl41dmtvrX64+h6ykWO54fN+4VXnHmjbrlZMSR2PXwJgzrIwxHKqb2Ey6zZ69QJ4JYvZRY1c/v37rJB9ds4Cl860ZJvX6yKpcRGbuUJQGiwjjnRJZkBHPvuqOcdcPVLb0jpnInKxzPePCU0lug/tuvzg3hZfK3LkBq4NFcW4KizMTAw4/OF2GR3dVcdGSeeN+iI3n1ksXs7YwjW/+6UBQhfDeOt7CqgWpJMeFpgdYkJ5AQ+eA3w/pqlb3e2Ky/+/GmpY6Ump8NPdcV0J5Yzc/eP6I33NGbqXqT3ZKHA6XoXWMyqzVbX109Tsozg1+SHRRZiIrF6QEFcz/a3sFQ04T8p32/MlJjeO8hRk8ta92QuuCDtR0sOt44NmRU6XBIsJ4x2+vLM6hqWtgzOTfkNNFTXtfyHsW5+SlMC8xhosDrD0IRnFeCoOeJKgVayx8ibhLQu880TLusMYrR9zlLrZuKJr074qy2/jRJ1Yz4HDyT/9bOu4ffveAg32nQpOv8PJWn/VXtqOypSfk74mxXHpWFp/ZUMQv3jjBg68fH/X8yK1U/fHdXtWf4Q2zJvj+uXZtPqXVHfzntmNjnlPb3seju6r4xLr8CeWupuJjaxcMB9hgAsZ7VW3c8POdfPPPB3BaXDFYg0WE8f7RfNAznr5vjDHh2vY+nC4TsuS2V2yUnde+djmfu3jRlH+WtzcxLzGG+VPcwyIYW0rci5/GG354eGcl85Njh/99J2txVhJ3blrOK0eaeGz3qTH
P232yFUcQG+lMxPBaixFJbpfLcKqtb0p7Wk/UNzcXc9XKHP71r4f471cqznhu5Faq/njXWjR2jREsajuxibtsxkTcdNFCPrZ2AT968Sj3vuD/g/mn28sxGG67wpoZUP58cn0BN15QyH+9UsF3/3po3ICx+2Qrn/nF26QnxPDQTedZMqXXlwaLCFPX0c+8xBjWFKQRZZMxE4j+isWFSmJsVEjemN6ZJStyU6Y0pBWsZdnJLM9J5i9jjAmfau3llaNNXH9eQUhq//zNhQu5aMk8vvN0GSebe/ye81ZFC9F2GZ6WHAqnN0E6s2dR39nPoMMV8qHJ8cRE2fjPG9Zy9eo8vv/cYe7zuZOvaOw5YytVf7IDlPwoq+tkUWYi8TETq31mtwk/+MRqPrW+gPteLuffnzt8xgdzVUsvj+8+xfXnFU6qXPxk2WzCv350JTddtJAH3zjBvzx10G++5q2KFj770NvMT47l8VsvnJY2arCIMA2d/eSkugsDnpWdTGm1/xlRw/tYTNOQw2QszkokNT6acwtDU8EzGFtW5/FOZZvfIZpH365CgOs9+yBMlc3zgWS3CR++73W+98yhUXfIb1W0sLYwfcIfduOZnxxLTJSN6hE9i8oQlKufjCi7jR9/ag0fO3cB9754lB96hljKG7vP2ErVH++uiWMOQ9V2Upw3uSncdpt7XcjWDYU88Opx7n66bDhg3PfyMew2mdZehZeI8K0txdxy6WJ+/VYl3/jT/jMCxhvHmvncr95mQVo8j926gZwpTDSZCGsmtivL1HX0D89CKslP5bmD9RhjRt2ZV7X0EBNlG974fiaKttt44SuXkhofmsRuMDaX5PKD54/w19Jabrn09KyeAYeTx3efYuOKbPLS4kP2+xakxfOnL13Mf247xoOvH+fXO05yw/mF3HrZYhKiozhQ666SGko2m5CfNnr6bJW32myIhyaDYbcJP7xuNTF2Gz/dXs6Q00X5iK1U/Ym228hMivE7DNXRNzTl/JLNJnznmpVE22388s2TDDld3HTRIv7wbjWfu3jRcM9muokIX79qOdF24f7tFQw6DPdcV8JrR5u49eF3WJyZyCN/e8Go3RutpMEiwtR39A3fiZfkp/HY7lOcau0bNdzknQkV7NzzcJnuP8aieYmU5Kfyl311ZwSL5w7U09IzOKUPnrEsyUriP65fy+0bz+K/Xynn4Z2VPLKrkvVFGRjDpPbbDmRBevyo6bOVLb1E2YS8tPB8ANpswveuXUW03cYDr7kT3sHkhsZaxX3Ik9xekTu1Ka0iwl2bi4nxtOuZ/fXERtn5wvsDTxG2kojw1SvPJsZu58cvHaWuo4/dJ1s5OyeZ3/6fC6a82n+idBgqgvQPOWnrHSLXp2cB/pPcVSGqNjsbbSnJY39Nxxl5hId3VlI0LyFghdmpWJSZyD3XrWb7V9/PJ9cX8E5lG0mxUayxYBiuICNhVIK7srWXBenxQU9/tYLNJtx9zTncfIl7gkQwJV7cwWJ0z2K4pliItuK986rlfPmKpbT2DHLTxQvJnMa79vHadfvGZXxt09nsqGihOC+VR/52w7QHCtCeRUTx/sF478bPyk4mJsrG/poOtqw+XW3TGENVa29IZ9jMJh8pyeW7zxzi6dJabrtiGYfrO9l9so3/9+Hl09ITK8hI4LvXruLvPrCMrv4hYqNCl68Y/h3pCbT1DtE94Bjet6LKgnU3kyEi/PNHVvDhVTmsWhA4UGanxPqdyFFW10lmUuyU9v8Y2a5/uPJsPlicbfm6n4n64vuXcvGSTJZlJ5FHB0leAAAdVklEQVQQE56Pbe1ZRBDvgrzcVPeYekyUjRW5Kew7deYfUlP3AL2DTu1ZjCEvLZ71Ren8ZZ97VtQjO6uIibKN2orTatkpcZatCvautfDtXUznGotARIR1RRmjdrTzJzsljubuwVGFCa3aA6UkPy2sva+xrC5IC1ugAA0WEcXbs/Cd/bA6P5UDNR1nzJY4XSxu+hOZkWL
L6jyONHTxXlUbf3yvhs2rcsPStbfK8PRZT7Bo7x2ks98RluT2VHl70r5lxQcdLsobu2dcD2A202ARQbw9C99gUZKfRs+gk+PNp4uPWbnGYra4alUONoGv/H4v3QMObrQgsR1O3oV53hlRkfye8K7i9s1bVDR1M+h0TTm5rYKnwSKC1Hf0kxwbdcbeyd4kt+96i8rWXkTcJcWVf/OT49iweB4nW3pZkZsyrWs9pkN6QjSJMfbhGVGRsO5mLN6chG+w8Ca3z7G4TIw6zdJgISKbROSIiJSLyJ1+nr9DRMpEpFREtolIked4kYi8KyJ7ReSgiHzeynZGivqO/lELcJZkJZEQYz8jWFS19JCXGm9J4nQ28U4K2LqhcFpWkE8nEfHMiHL3LKqGS5NHXrDwvud9p8+W1XUSF21jUebkij2qibMsWyIiduB+4INANbBbRJ4yxpT5nPYesN4Y0ysiXwDuAT4F1AEXGmMGRCQJOOC5dubuDDIN6jpHBwu7TViZl3rG9NnK1tCXJp+Nrl3r3gp1uhPb0yU/PZ5qb8+ipZf5ybFhTZBOVkZCDFE2GdWzODsnxfJ6SOo0K3sW5wPlxpjjxphB4DHgGt8TjDHbjTHe6Ro7gXzP8UFjjPc2ItbidkaM+o4+cvwsYivJT6WstnN4tkiVBftYzEZx0XZuvKAoqBk5kSg/3b3WwhgT0TcQNpswPzl2uGdhjOFQfSfFmq+YVlb+lSwAfMttVnuOjeVm4FnvAxEpEJFSz8/4/lzvVTicLpq6BoYX5PlalZ/KgMPF0YYuugcctPQMBr0Tmpq9CjIS6Bl0L+R0r7GI3PdEdurphXl1Hf209w7pTKhpNiP6pCKyFVgPXOY9Zow5BZSISB7wJxF50hjTMOK6W4BbAAoLQ1P8baZq6h7AZdx/NCOtzncnZ/f75C0i9S5ShU6BZ4JDeWM39Z39Ef2eyE6OGy5pHsqV2yp4VvYsagDfweB8z7EziMhG4BvA1T5DT8M8PYoDwPv8PPczY8x6Y8z6rKyskDV8Jjq9IG90sCial0BKXBT7qjuG11hEYiJThZZ3+uyOimYgsm8gslNih3sWZXWdiMDZORosppOVwWI3sExEFolIDHA98JTvCSKyFngAd6Bo9DmeLyLxnu/TgUsA/3szzhEN3jUWKaOnw4oIJflp7K9pj+gpkiq0hoNFuXvLzUi+gZifEkdnv4O+QSdltZ0UZSScMYVcWc+yYGGMcQC3Ac8Dh4DHjTEHReRuEbnac9oPgCTgCc80WW8wWQHsEpF9wKvAD40x+61qayTwtyDP16r8VA7XdXGsoZuMxJiQ7eesIldSbBTpCdG8W9UGRPaK/pyU02stDtVbU+ZDjc/S0GyMeQZ4ZsSxu3y+3zjGdS8CJVa2LdLUd/YTE2UjPcF/EFidn4rDZdh2uIGFEfyhoEIrPz2B/TUdJHsCR6TylvyoaOqmsqWXT6zLD3OL5p7ZOWdwFqrv6Cc3NW7MxWMlniR3e++QDkGpYd6CgoXzEiJ64aG35MerR5sATW6HgwaLCFHf0T/uRkG5qXFkJrkL4Wm1WeXlLSgY6TcQ3lmALx92pzaLcye3laqaPA0WEaKus8/vTCgvb5IboFCHoZRHvufGIZLXWAAkx0YRH22nuq2P9ITo4Z6Gmj4aLCKAMYaGjgG/q7d9rVrgvtuK9LtIFTretRaR/p4QkeEAUZyXEtFDapFKg0UEaO0ZZNDpGnMmlNdHSnK5ZGmmrmxVw9YUpHHRknmWbhc7XeZ7bpb0/R0eOlE5Aoy3IM/XWdnJPPy3F0xHk1SESEuI4dH/uyHczQgJb89ak9vhoT2LCDBy722l5iLvMNQK7VmEhfYsIsDIvbeVmosuXDKPd6vaWZKle1iEgwaLCFDf0Y/dJmQl6wwQNXddsTybK5Znh7sZc5YOQ0WA+s5+spJidaMXpVTYaLCIAP62U1VKqemkwSIC1HWMvyBPKaWspsEiAjR0DuhMKKVUWGmwmOG6+of
oHnBoz0IpFVYaLGa4+gD7WCil1HTQYDHD1Xd6d8jTYKGUCh8NFjOcLshTSs0EGixmOO8w1HwtyayUCiMNFjNcfWc/GYkxxEXbw90UpdQcpsFihqvv6Nd8hVIq7DRYzHB1nr23lVIqnDRYzHANnf3D+w8rpVS4aLCYwfqHnLT2DJKrw1BKqTDTYDGDeTc90gV5SqlwszRYiMgmETkiIuUicqef5+8QkTIRKRWRbSJS5Dm+RkTeEpGDnuc+ZWU7Zypdva2UmiksCxYiYgfuB64CioEbRKR4xGnvAeuNMSXAk8A9nuO9wN8YY84BNgH/ISJpVrV1pvKu3tYEt1Iq3KzsWZwPlBtjjhtjBoHHgGt8TzDGbDfG9Hoe7gTyPcePGmOOeb6vBRqBLAvbOiOd7lno6m2lVHhZGSwWAKd8Hld7jo3lZuDZkQdF5HwgBqgIaesiQF1HP0mxUSTF6u63SqnwmhGfQiKyFVgPXDbieC7wW+CzxhiXn+tuAW4BKCwsnIaWTi/dIU8pNVNY2bOoAQp8Hud7jp1BRDYC3wCuNsYM+BxPAf4KfMMYs9PfLzDG/MwYs94Ysz4ra/aNUtV36oI8pdTMYGWw2A0sE5FFIhIDXA885XuCiKwFHsAdKBp9jscAfwR+Y4x50sI2zmj1Hf26Q55SakYIGCxExC4iX5noDzbGOIDbgOeBQ8DjxpiDInK3iFztOe0HQBLwhIjsFRFvMPkkcClwk+f4XhFZM9E2RLLuAQeNXVoXSik1MwTMWRhjnCJyA/Djif5wY8wzwDMjjt3l8/3GMa57GHh4or9vNvnuX8sA2FicHeaWKKVU8AnuN0Xkp8DvgR7vQWPMu5a0ao57+XADv3v7FJ+/bAlrCubc8hKl1AwUbLDwDgHd7XPMAFeEtjmqrWeQf/rf/SzPSeYrH1wW7uYopRQQZLAwxlxudUOU2zf/fID23kF+9bnziI3SDY+UUjNDULOhRCRVRO4VkT2erx+JSKrVjZtr/rKvlqdL67j9A8s4J0//eZVSM0ewU2cfArpwz1L6JNAJ/NKqRs1FDZ39fPPPB1hTkMbnL1sS7uYopdQZgs1ZLDHGfNzn8bdFZK8VDZqLjDH80/+W0j/k5N5PribKrpXjlVIzS7CfSn0icon3gYhcDPRZ06S557Hdp3jlSBN3blrO4qykcDdHKaVGCbZn8XngNz55ijbgs9Y0aW6pbuvlX58u46Il8/ibCxeGuzlKKeVXwGAhIjbgbGPMak+9JowxnZa3bI54/mADPYNOvnftKmw2CXdzlFLKr4DDUJ5qr1/zfN+pgSK0TjR3kxIXRdG8hHA3RSmlxhRszuIlEfmqiBSISIb3y9KWzREnm3tZlJmIiPYqlFIzV7A5C+8e2F/yOWaAxaFtztxzormH8xamh7sZSik1rmBzFluNMW9OQ3vmlP4hJ7UdfSzMzA93U5RSalzB5ix+Og1tmXMqW3oxBhZlJoa7KUopNa5gcxbbROTjogPrIXWi2V3AV4OFUmqmCzZY3Ao8DgyISKeIdImIzoqaIm+wWKjBQik1wwWb4E4FbgQWGWPuFpFCINe6Zs0NJ5t7yEyKISUuOtxNUUqpcQXbs7gf2ADc4HncheYxpuxEc48OQSmlIkKwweICY8yXgH4AY0wbEGNZq+aIEy09LJynwUIpNfMFGyyGRMSOe20FIpIFuCxr1RzQ1T9EU9cAi7I0WCilZr5gg8V9wB+B+SLyXeAN4HuWtWoOqGzpBWCR9iyUUhEg2G1VHxGRd4APAAJ81BhzyNKWzXLHvdNmtWehlIoAwc6GwhhzGDhsYVvmlJOeYFGUocFCKTXz6ZZsYXKiuYe81DjiY+zhbopSSgVkabAQkU0ickREykXkTj/P3yEiZSJSKiLbRKTI57nnRKRdRJ62so3hcqK5R4eglFIRw7Jg4Zk9dT9wFVAM3CAixSNOew9Yb4wpAZ4E7vF57gfAZ6xqX7idaNZps0qpyGFlz+J8oNwYc9wYMwg
8Blzje4IxZrsxptfzcCeQ7/PcNtyL/2adtp5BOvqGdEGeUipiWBksFgCnfB5Xe46N5WbgWQvbM2Mc1wKCSqkIE/RsKCuJyFZgPXDZBK+7BbgFoLCw0IKWWUOrzSqlIo2VPYsaoMDncb7n2BlEZCPwDeBqY8zARH6BMeZnxpj1xpj1WVlZU2rsdDrZ3IPdJhRk6L7bSqnIYGWw2A0sE5FFIhIDXA885XuCiKwFHsAdKBotbMuMcqK5h4L0eKLtOnNZKRUZLPu0MsY4gNuA54FDwOPGmIMicreIXO057QdAEvCEiOwVkeFgIiKvA08AHxCRahH5kFVtnW4nmnt0DwulVESxNGdhjHkGeGbEsbt8vt84zrXvs7BpYWOM4WRLDxcszgh3U5RSKmg6DjLNGrsG6B10anJbKRVRNFhMs+NNOhNKKRV5NFhMs5Mtnn23dfW2UiqCaLCYZieae4iJspGXFh/upiilVNA0WEyzE809FGUkYLdJuJuilFJB02AxCadae/nuX8twusyErz3R3KP5CqVUxNFgMQlP7DnFz18/QVVrb+CTfThdhqqWXg0WSqmIo8FiEkprOgBo6ppQdRJq2/sYdLo0WCilIo4GiwkyxlBa7Q4Wzd0TCxZaQFApFak0WExQTXsfrT2DwMR7FhoslFKRSoPFBHl7FTC5YJEYYycrOTbUzVJKKUvNiP0sIsm+6nai7UJSbNSkgsXCzEREdNqsUiqyaLCYoP3VHazITcHpMjRNMGdxsqWHVQtSLWqZUkpZR4ehJsDlMuyv7qAkP5Ws5NgJ9SwGHS5Oteq0WaVUZNJgMQEnW3roGnBQsiCNrKSJBYuq1l5cRpPbSqnIpMFiArzJ7ZKCVDKTY2nuHsAV5Cruk56ZULrpkVIqEmmwmIDS6g7io+0szUoiKykWh8vQ3jcU1LXeabOLNVgopSKQBosJKK1u55y8FKLstuHpr8EuzDvR0kNaQjRpCTFWNlEppSyhwSJIDqeLA7UdlOSnAQwHi2DzFpUtPRTpHhZKqQilwSJI5U3d9A+5KMl3T32daLCoa+8nX/ewUEpFKA0WQSo95UluTyJYGGOoae8jNzXOugYqpZSFNFgEqbSmneTYqOHtUJNjo4iNsgW1MK+td4gBh4tc7VkopSKUBosglVZ3sCo/FZtnhzsRCXphXm17HwAL0rRnoZSKTBosgjDgcHKorpNV+WeW6gg2WNR19AOQm6o9C6VUZLI0WIjIJhE5IiLlInKnn+fvEJEyESkVkW0iUuTz3GdF5Jjn67NWtjOQI/VdDDkNqz0zobyCXcVd1+HuWeRqz0IpFaEsCxYiYgfuB64CioEbRKR4xGnvAeuNMSXAk8A9nmszgG8BFwDnA98SkXSr2hrIvuozk9temcmxQeUsatr7iLYLmYlamlwpFZms7FmcD5QbY44bYwaBx4BrfE8wxmw3xng3st4J5Hu+/xDwojGm1RjTBrwIbLKwrePaX91ORmIMC0YkqLOSYmnrHWTI6Rr3+rr2fnJS44bzHUopFWmsDBYLgFM+j6s9x8ZyM/DsJK+1VKmn0uzIfSiykmMxhuGd88ZS19FHnuYrlFIRbEYkuEVkK7Ae+MEEr7tFRPaIyJ6mpiZL2tY36ORoQxclfvahCHatRW17P3k6bVYpFcGsDBY1QIHP43zPsTOIyEbgG8DVxpiBiVxrjPmZMWa9MWZ9VlZWyBru62BtBy7DcJkPX8EEC6fL0NDZrwvylFIRzcpgsRtYJiKLRCQGuB54yvcEEVkLPIA7UDT6PPU8cKWIpHsS21d6jk270jGS2+DOWcD4waKpawCHy+iCPKVURLNsW1VjjENEbsP9IW8HHjLGHBSRu4E9xpincA87JQFPePIBVcaYq40xrSLyHdwBB+BuY0yrVW0dT2l1OzkpccxPGd0zGO5ZjDMjqtYzbTZPexZKqQhm6R7cxphngGdGHLvL5/uN41z7EPCQda0Ljje57U9ctJ3kuKhxexZ17e4FeZq
zUEpFshmR4J6pOvuHON7cM2awgMCruOuGexYaLJRSkUuDxTgODOcrRie3vQKt4q5t7ychxk5KvKWdOKWUspQGi3GU1riDxSo/02a9vHtxj6XWU5p85BoNpZSKJHP+drejd4hPP7jT73O17X0UZiSQnjj2VqhZSbG8FmAYSvMVSqlIN+eDhdgYcw1EbmocV56TM+71WcmxdA046Bt0Eh9jH/V8bUc/y3NSQtJWpZQKlzkfLFLionnws+dN+nrv9Nnm7gEKMhLOeG7Q4aK5e0CrzSqlIp7mLKbIGywa/QxFNXT2Y4zOhFJKRT4NFlM03irumnbdx0IpNTtosJii+eOs4h7e9Eh7FkqpCKfBYooyEmMQ8d+zqB1eva09C6VUZNNgMUVRdhvzEmP8rrWo6+gjLSGahJg5P49AKRXhNFiEQOYYq7hr2/t1CEopNStosAiBsepD1bb3abVZpdSsoMEiBMaqD1XX0a8zoZRSs4IGixDISo6lqXsAY8zwsd5BBx19Q1rqQyk1K2iwCIGs5FgGHS46+x3Dx4ZnQmnOQik1C2iwCAF/e3HXehfkac5CKTULaLAIAX+ruIc3PdJhKKXULKDBIgT87cVd296PCGT72btbKaUijQaLEBiuPDuiZ5GVFEtMlP4TK6Uin36ShUBqfDTRdjmjZ+GeNqtDUEqp2UGDRQiIyKi1FjW6IE8pNYtosAiRTJ9V3MYY6rTUh1JqFtFgESK+PYuOviH6hpxabVYpNWtYGixEZJOIHBGRchG508/zl4rIuyLiEJHrRjz3fRE54Pn6lJXtDAXvKm7wLU2uPQul1OxgWbAQETtwP3AVUAzcICLFI06rAm4CHh1x7UeAc4E1wAXAV0Ukxaq2hkJWciwt3QM4XcZn0yPtWSilZgcrexbnA+XGmOPGmEHgMeAa3xOMMSeNMaWAa8S1xcBrxhiHMaYHKAU2WdjWKctKjsVloLVncHj1tvYslFKzhZXBYgFwyudxtedYMPYBm0QkQUQygcuBghC3L6R8V3HXdvQTZRMyPceUUirSzcgt3IwxL4jIecAOoAl4C3COPE9EbgFuASgsLJzWNo40vDCve4C69j6yU+Kw2ySsbVJKqVCxsmdRw5m9gXzPsaAYY75rjFljjPkgIMBRP+f8zBiz3hizPisra8oNngrfYoK1Hf0s0CEopdQsYmWw2A0sE5FFIhIDXA88FcyFImIXkXme70uAEuAFy1oaAt4hp6buAeo6+nTTI6XUrGLZMJQxxiEitwHPA3bgIWPMQRG5G9hjjHnKM9T0RyAd2CIi3zbGnANEA6+LCEAnsNUY4/D/m2aGxNgoEmPsNHT2U9+hC/KUUrOLpTkLY8wzwDMjjt3l8/1u3MNTI6/rxz0jKqJkJsdyqK6TIafRBXlKqVlFV3CHUFZSLAdqOgG0Z6GUmlU0WIRQVnIs3QPu0TLtWSilZhMNFiHknREFuve2Ump20WARQt6FeXHRNtISosPcGqWUCh0NFiHk7VnkpcbjmcmllFKzggaLEPIGC11joZSabTRYhJBvz0IppWYTDRYhdLpnocFCKTW7aLAIoezkOLZuKOSqlTnhbopSSoXUjKw6G6lsNuFfP7oq3M1QSqmQ056FUkqpgDRYKKWUCkiDhVJKqYA0WCillApIg4VSSqmANFgopZQKSIOFUkqpgDRYKKWUCkiMMeFuQ0iISBNQGeC0TKB5GpozU83l1z+XXzvM7devr318RcaYrEA/aNYEi2CIyB5jzPpwtyNc5vLrn8uvHeb269fXHprXrsNQSimlAtJgoZRSKqC5Fix+Fu4GhNlcfv1z+bXD3H79+tpDYE7lLJRSSk3OXOtZKKWUmoQ5EyxEZJOIHBGRchG5M9ztsZqIPCQijSJywOdYhoi8KCLHPP9ND2cbrSIiBSKyXUTKROSgiNzuOT7rX7+IxInI2yKyz/Pav+05vkhEdnne/78XkZhwt9UqImIXkfdE5GnP47n02k+KyH4R2Ssiezz
HQvK+nxPBQkTswP3AVUAxcIOIFIe3VZb7FbBpxLE7gW3GmGXANs/j2cgB/IMxphjYAHzJ8/97Lrz+AeAKY8xqYA2wSUQ2AN8HfmyMWQq0ATeHsY1Wux045PN4Lr12gMuNMWt8psyG5H0/J4IFcD5Qbow5bowZBB4DrglzmyxljHkNaB1x+Brg157vfw18dFobNU2MMXXGmHc933fh/uBYwBx4/cat2/Mw2vNlgCuAJz3HZ+VrBxCRfOAjwIOex8Icee3jCMn7fq4EiwXAKZ/H1Z5jc022MabO8309kB3OxkwHEVkIrAV2MUdev2cYZi/QCLwIVADtxhiH55TZ/P7/D+BrgMvzeB5z57WD+8bgBRF5R0Ru8RwLyfte9+Ceo4wxRkRm9VQ4EUkC/hf4e2NMp/sm0202v35jjBNYIyJpwB+B5WFu0rQQkc1AozHmHRF5f7jbEyaXGGNqRGQ+8KKIHPZ9cirv+7nSs6gBCnwe53uOzTUNIpIL4PlvY5jbYxkRicYdKB4xxvzBc3jOvH4AY0w7sB24EEgTEe/N4Wx9/18MXC0iJ3EPNV8B/IS58doBMMbUeP7biPtG4XxC9L6fK8FiN7DMMysiBrgeeCrMbQqHp4DPer7/LPDnMLbFMp5x6l8Ah4wx9/o8Netfv4hkeXoUiEg88EHcOZvtwHWe02blazfGfN0Yk2+MWYj7b/xlY8yNzIHXDiAiiSKS7P0euBI4QIje93NmUZ6IfBj3eKYdeMgY890wN8lSIvI74P24q042AN8C/gQ8DhTirtD7SWPMyCR4xBORS4DXgf2cHrv+f7jzFrP69YtICe4kph33zeDjxpi7RWQx7rvtDOA9YKsxZiB8LbWWZxjqq8aYzXPltXte5x89D6OAR40x3xWReYTgfT9ngoVSSqnJmyvDUEoppaZAg4VSSqmANFgopZQKSIOFUkqpgDRYKKWUCkiDhVJKqYA0WCg1zTxlpDMnee1NIpIXip+l1ERosFAqstwE5AU6SalQ02Ch5iwRWSgih0XkVyJyVEQeEZGNIvKmZ6OY8z1fb3k209khImd7rv2KiDzk+X6ViBwQkYQxfs88EXnBsxnRg4D4PLfVs1nRXhF5wLP3CiLSLSI/9lyzzVPG4zpgPfCI5/x4z4/5soi869n0Zk4UDVTTT4OFmuuWAj/CXZl1OfBp4BLgq7hLhBwG3meMWQvcBXzPc91PgKUici3wS+BWY0zvGL/jW8AbxphzcJdjKAQQkRXAp4CLjTFrACdwo+eaRGCP55pXgW8ZY54E9gA3eja36fOc22yMORf4b0+7lQo5LVGu5roTxpj9ACJyEPeOYkZE9gMLgVTg1yKyDPdeAdEAxhiXiNwElAIPGGPeHOd3XAp8zHPdX0WkzXP8A8A6YLenfHo8pyuCuoDfe75/GPgDY/M+94739ygVahos1FznW1DO5fPYhfvv4zvAdmPMtZ6NlF7xOX8Z0M3kcwgC/NoY8/Ugzh2viJu3zU70b1pZRIehlBpfKqf3P7jJe1BEUoH7cPca5nnyCWN5DffwFiJyFZDuOb4NuM6zUQ0ikiEiRZ7nbJwuq/1p4A3P911A8hRej1KTosFCqfHdA/ybiLzHmXftPwbuN8YcBW4G/t37oe/Ht4FLPcNcHwOqAIwxZcA/494GsxT3Fqi5nmt6gPNF5ADuTXzu9hz/FfA/IxLcSllOS5QrNQOJSLcxJinc7VDKS3sWSimlAtKehVIhIiKfA24fcfhNY8yXwtEepUJJg4VSSqmAdBhKKaVUQBoslFJKBaTBQimlVEAaLJRSSgWkwUIppVRA/x/LM3kv95ak7gAAAABJRU5ErkJggg==\n", "text/plain": [ "<Figure size 432x288 with 1 Axes>" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "line_df = 
pd.DataFrame({'max_depth' : rnge, 'error' : errors})\n", "sns.lineplot(x='max_depth', y='error', data=line_df)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(i) Produce a pruned tree corresponding to the optimal tree size obtained using cross-validation. If cross-validation does not lead to selection of a pruned tree, then create a pruned tree with five terminal nodes. \n", "(j) Compare the training error rates between the pruned and unpruned trees. Which is higher? \n", "(k) Compare the test error rates between the pruned and unpruned trees. Which is higher?\n", "\n", "Note: parts (i) to (k) are skipped because tree pruning was not readily available in scikit-learn when this was written. scikit-learn 0.22 and later support minimal cost-complexity pruning through the `ccp_alpha` parameter of the decision tree estimators." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "10. We now use boosting to predict Salary in the Hitters data set." ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>AtBat</th>\n", " <th>Hits</th>\n", " <th>HmRun</th>\n", " <th>Runs</th>\n", " <th>RBI</th>\n", " <th>Walks</th>\n", " <th>Years</th>\n", " <th>CAtBat</th>\n", " <th>CHits</th>\n", " <th>CHmRun</th>\n", " <th>CRuns</th>\n", " <th>CRBI</th>\n", " <th>CWalks</th>\n", " <th>League</th>\n", " <th>Division</th>\n", " <th>PutOuts</th>\n", " <th>Assists</th>\n", " <th>Errors</th>\n", " <th>Salary</th>\n", " <th>NewLeague</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>1</th>\n", " <td>315</td>\n", " <td>81</td>\n", " <td>7</td>\n", " <td>24</td>\n", " <td>38</td>\n", " <td>39</td>\n", " <td>14</td>\n", " <td>3449</td>\n", " <td>835</td>\n", " <td>69</td>\n", " <td>321</td>\n", " <td>414</td>\n", " <td>375</td>\n", " 
<td>1</td>\n", " <td>1</td>\n", " <td>632</td>\n", " <td>43</td>\n", " <td>10</td>\n", " <td>6.163315</td>\n", " <td>1</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>479</td>\n", " <td>130</td>\n", " <td>18</td>\n", " <td>66</td>\n", " <td>72</td>\n", " <td>76</td>\n", " <td>3</td>\n", " <td>1624</td>\n", " <td>457</td>\n", " <td>63</td>\n", " <td>224</td>\n", " <td>266</td>\n", " <td>263</td>\n", " <td>0</td>\n", " <td>1</td>\n", " <td>880</td>\n", " <td>82</td>\n", " <td>14</td>\n", " <td>6.173786</td>\n", " <td>0</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>496</td>\n", " <td>141</td>\n", " <td>20</td>\n", " <td>65</td>\n", " <td>78</td>\n", " <td>37</td>\n", " <td>11</td>\n", " <td>5628</td>\n", " <td>1575</td>\n", " <td>225</td>\n", " <td>828</td>\n", " <td>838</td>\n", " <td>354</td>\n", " <td>1</td>\n", " <td>0</td>\n", " <td>200</td>\n", " <td>11</td>\n", " <td>3</td>\n", " <td>6.214608</td>\n", " <td>1</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>321</td>\n", " <td>87</td>\n", " <td>10</td>\n", " <td>39</td>\n", " <td>42</td>\n", " <td>30</td>\n", " <td>2</td>\n", " <td>396</td>\n", " <td>101</td>\n", " <td>12</td>\n", " <td>48</td>\n", " <td>46</td>\n", " <td>33</td>\n", " <td>1</td>\n", " <td>0</td>\n", " <td>805</td>\n", " <td>40</td>\n", " <td>4</td>\n", " <td>4.516339</td>\n", " <td>1</td>\n", " </tr>\n", " <tr>\n", " <th>5</th>\n", " <td>594</td>\n", " <td>169</td>\n", " <td>4</td>\n", " <td>74</td>\n", " <td>51</td>\n", " <td>35</td>\n", " <td>11</td>\n", " <td>4408</td>\n", " <td>1133</td>\n", " <td>19</td>\n", " <td>501</td>\n", " <td>336</td>\n", " <td>194</td>\n", " <td>0</td>\n", " <td>1</td>\n", " <td>282</td>\n", " <td>421</td>\n", " <td>25</td>\n", " <td>6.620073</td>\n", " <td>0</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " AtBat Hits HmRun Runs RBI Walks Years CAtBat CHits CHmRun CRuns \\\n", "1 315 81 7 24 38 39 14 3449 835 69 321 \n", "2 479 130 18 66 72 76 3 1624 457 63 224 
\n", "3 496 141 20 65 78 37 11 5628 1575 225 828 \n", "4 321 87 10 39 42 30 2 396 101 12 48 \n", "5 594 169 4 74 51 35 11 4408 1133 19 501 \n", "\n", " CRBI CWalks League Division PutOuts Assists Errors Salary \\\n", "1 414 375 1 1 632 43 10 6.163315 \n", "2 266 263 0 1 880 82 14 6.173786 \n", "3 838 354 1 0 200 11 3 6.214608 \n", "4 46 33 1 0 805 40 4 4.516339 \n", "5 336 194 0 1 282 421 25 6.620073 \n", "\n", " NewLeague \n", "1 1 \n", "2 0 \n", "3 1 \n", "4 1 \n", "5 0 " ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hit_df = pd.read_csv('hitters.csv')\n", "hit_df = hit_df.drop(hit_df.columns[0], axis=1)\n", "hit_df = hit_df.dropna()\n", "hit_df.Salary = np.log(hit_df.Salary)\n", "hit_df.League = hit_df.League.map({'N':1, 'A':0})\n", "hit_df.Division = hit_df.Division.map({'W':1, 'E':0})\n", "hit_df.NewLeague = hit_df.NewLeague.map({'N':1, 'A':0})\n", "hit_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(b) Create a training set consisting of the first 200 observations, and a test set consisting of the remaining observations." ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [], "source": [ "# As the exercise asks: the first 200 rows form the training set, the remaining rows the test set\n", "train, test = hit_df.iloc[:200], hit_df.iloc[200:]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(c) Perform boosting on the training set with 1,000 trees for a range of values of the shrinkage parameter λ. Produce a plot with different shrinkage values on the x-axis and the corresponding training set MSE on the y-axis." 
] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [], "source": [ "def hitters_boosting_test_error(lr):\n", "    # Fit 1,000 boosted trees on the training set with shrinkage (learning rate) lr\n", "    # and return the test-set RMSE, i.e. the square root of the test MSE.\n", "    train_x = train.drop('Salary', axis=1)\n", "    train_y = train.Salary\n", "    reg = ensemble.GradientBoostingRegressor(n_estimators=1000, learning_rate=lr).fit(train_x, train_y)\n", "    test_x = test.drop('Salary', axis=1)\n", "    test_y = test.Salary\n", "    preds = reg.predict(test_x)\n", "    return np.sqrt(metrics.mean_squared_error(test_y, preds))\n", "\n", "lambdas = np.linspace(0.01, 1, 100)\n", "test_errors = [hitters_boosting_test_error(lr) for lr in lambdas]" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [], "source": [ "def hitters_boosting_training_error(lr):\n", "    # Same model as above, but evaluated on the data it was fitted to:\n", "    # returns the training-set RMSE for shrinkage lr.\n", "    train_x = train.drop('Salary', axis=1)\n", "    train_y = train.Salary\n", "    reg = ensemble.GradientBoostingRegressor(n_estimators=1000, learning_rate=lr).fit(train_x, train_y)\n", "    preds = reg.predict(train_x)\n", "    return np.sqrt(metrics.mean_squared_error(train_y, preds))\n", "\n", "lambdas = np.linspace(0.01, 1, 100)\n", "train_errors = [hitters_boosting_training_error(lr) for lr in lambdas]" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "<matplotlib.axes._subplots.AxesSubplot at 0x1167bed68>" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAYYAAAEKCAYAAAAW8vJGAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAIABJREFUeJzt3Xl8lOW5//HPlZ1ACJCFPYQlbIICBhRQ3C0uR2pdCtaqrS2ntrZW6zm1P1vr0r3Hc7qotaK2tVVxqVVUlLohm+x7WJMASVgDWSFkv39/zCTOZCFDyGQC+b5fL17MPHPP81xPCHPNvZtzDhERkTphoQ5AREQ6FiUGERHxo8QgIiJ+lBhERMSPEoOIiPhRYhARET9KDCIi4keJQURE/CgxiIiIn4hQB9AaiYmJLjU1NdRhiIicVtasWXPYOZfUUrnTMjGkpqayevXqUIchInJaMbM9gZRTU5KIiPhRYhARET9KDCIi4keJQURE/CgxiIiIHyUGERHxo8QgIiJ+lBhERNpJzpEyPtxyMNRhtEiJQUSkncxZnM3sv6/mUGl5qEM5ISUGEZF2sr+4nFoHb2/YH+pQTkiJQUSkneR7awpvrd8b4khOTIlBRKSdHCypICo8jI15xWTlHw11OM1SYhARaQc1tY78oxXMGNcPM3hrXcetNSgxiIi0g4JjldTUOsYOiGfq0ETeXL8P51yow2qSEoOISDs4WOLpX0iOi2HGuH7kFJSxNqcoxFE1TYlBRKQd5JdWAJDcPZrpY/oQHRHWYTuhlRhERNpBXY2hd/cY4mIiuXx0b97esI/qmtoQR9aYEoOISDs4WOKpMSR1iwbgC2f1obCsis37SvzKFZVV8uQnmdTUhq7/QYlBRKQdHCotp1fXKKIiPB+7k4ckALAs67BfuRdX5PDbBdvZmBe6/gclBhGRdnCwpILkuOj650lx0YzoHcdnWUf8yi3cfgiAnIKydo3PlxKDiEg7yC8tJ7l7jN+xyUMTWLW7gIrqGsDTjLRmTyEAe46cwYnBzKab2XYzyzSzB5opc7OZbTGzDDN7KdgxiYi0t4MlFfT2qTEATB2WSHlVLeu8w1YX7TxMrYMwC21iiAjmyc0sHHgSuALIA1aZ2Tzn3BafMmnAj4CpzrlCM0sOZkwiIu2tbtZzcnf/xDBpcC/CDJZlHub8IQl8su0QvbpGMSSxK7lncFPSJCDTOZftnKsE5gIzGpT5JvCkc64QwDl3KMgxiYi0q7pZz70bNCXFd4lk7IAeLMs6Qk2tY+H2Q1w8PInUxK7sKTgWomiDnxj6A7k+z/O8x3wNB4ab2VIzW25m05s6kZnNNrPVZrY6Pz8/SOGKiLQ931nPDU0ZmsD63CI+yzpCYVkVF49MZlCvWA6WVFBeVdPeoQIdo/M5AkgDLgZmAXPMrEfDQs65Z5xz6c659KSkpHYOUUSk9XxnPTc0ZWgC1bWO3yzYRniYcVFaEikJsUDoRiYFOzHsBQb6PB/gPeYrD5jnnKtyzu0CduBJFCIiZwTfWc8NpQ/qVb8U97kpPYmPjSSllzcxhKgDOtiJYRWQZmaDzSwKmAnMa1DmTTy1BcwsEU/TUnaQ4xKRDuqRtzM67BpCgXrwX5v43svr6p83nPXsq0tUOONTPI0kF4/0tIYMSugKwJ4zscbgnKsG7gYWAFuBV51zGWb2qJld5y22ADhiZluAT4D/cs4dafqMInImKy6r4q/LdvP4v3e0y5LUtbWOdza27XpFB0vKmbsql/c27+doRTXQeNZzQxcMSwTg0pGeQZk9YyOJi44g50hoOqCD3sfgnJvvnBvunBvqnPu599hDzrl53sfOOXefc260c26sc25usGMSkY5pbU4hznna1lftLgz69T7edoi7X1rH/M0H2uycL63IoabWUVXjWJHt+Y7bcNZzQ3dMTWXObemM7NMdADNjYK/YM7PGICJyMlbtLiAizIiNCuefa/ICek/BsUo
yD5W26npLMj3rFC3LPNxCycBU1dTy8socpg5LICYyjMU7Pedtatazr7iYSK4Y3dvv2KCE2DO281lEJGCrdxdyVv94rh7bl3c37aessvqE5fMKy5jx5BJmzVnRquvVrVO0NKttEsMHWw5yqLSCOy8YzPlDEli00zO0vqlZzy1JSYglr+B4SFZZVWIQkQ6horqG9XlFTBzUkxvPHcDRimoWZDTfxJNXWMbMZ5aTW3Cc/NIKjhytOKnr5ZdWsP1gKQN7dSG34HibzDR+4bPdDOjZhYuGJ3NhWhLZ+cfILShrctZzSwb16kplTS0HvCOa2pMSg4h0CJv3FlNZXUt6ai8mpfZiYK8u/HNN06OT6pJCyfEq/usLIwDYeejoSV3vM2/7//cvGw7A0lNsTtp5sJTl2QV85bxBhIcZ09I8Hcpvrtvb5KznltQNWd0Tgg5oJQaRDsQ51yF39GoPdZ3N6ak9CQszbpgwgKVZh9lbdLxR2R+9sYni41X84xvncf14z2IKJ5sYlmUeJi4mghnj+pEcF82yrFMbDPmP5XuICg/j5vQBAAxL7kaf7jH8c62nr6SpWc8nMsg7yS0UayYpMYh0IK+vyWPSLz7iWMWJ29bPRKt3FzAksSuJ3rH+N0wYgHPwRoNO6D1HjrF452FmXziEswf0oG98DF2jwsk8eHId0MuyjnD+kAQiwsOYMjSBZVlHWhwiW1ZZzcPzMliy07924ZxjQcZBLh+dTII3fjPjwrREdnsnqZ1sU1Lf+Bgiwiwkq6wqMYh0IAt35FNwrJLVe4I/VLMjqa11rN5TSHpqz/pjA3vFcsGwRF5YvsdvzaBXVuUSZnBTumdRBTNjWO+4k6ox5BaUkVNQxtShnl3UpgxL5PDRCnYcbP4cBccqmTVnBX9dtps/L8pqcL7jHCgpZ/LQRL/jFw7/fPmek21KiggPo3/PLiEZsqrEINIGfrtgGx9vO3jK51nvXZe/4XaPZ7qs/KMUlVWRntrL7/h3Lx1GfmkFL63IATzDQV9bk8elI5PpE//5B21acreTSgx1o5GmeieWTfEmiLp+hgPF5cx+YTX3v7aBt9bvZfPeYm58ehnb9pcwPqWH3+Y6AMt3ec533mD/+C8YloiZ53FTs55bktIrNiTLYigxiJyirftLePKTLP6xPOeUznOotLy+PX35KbZ3n27q+hcmNkgM5w1JYMrQBP70aRblVTV8vO0Q+aUVfHliil+5tORu5JdWUFxWFdD1lmYdJikummHJ3QAY0DOWQQmxLMs6wqHScm55djlLMg/zwZaD3DN3Pdf+cQmHSyv4+53n8e2Lh/ltrgOwclcBPWMjGZbUze86vbpGMbZ//AlnPZ9IqOYyBHWjHpHO4O/L9wCwZV9JwO/5dEc+mYeOcucFg+uP1dUWLhiWyLKsw5SUV9E9JrJVMTnn+MGrG/jShAFckJbY8htCbPXuAhK7RZHq7XD1dc9laXz5meW8uCKHJTvz6d09mktG+K+wnNbb84GcmV/KuYN6NTqHL+ccy7KOMGVoAlb3dR6YMjSRdzbs49ZnV7C/qJwX7pzEhJSebNpbzNo9hVw0IomhSd0oPl7l2VzH20cBnsQwMbUXYWHW6HrfvTSN3YdbN7JoUK+uFB+vorisivjY1v0utIZqDCKnoKS8ijfX7SU2KpwDJeUBjaU/WlHND15dzy/mb6XwWGX98fW5RUSEGd+4cDC1DlZmF7Q6rrzC47yxbi8PvbU55KOc9jUxqshXdU0tK3YVkD6ol98HdZ26WsMTH+/k0x353HTuQCLC/T+60pLjANh5gj6COhvziskvrahvPqozdVgCpRXV7DlSxnO3pzMxtRfhYca4gT34+gWDGeqtDdRvruNtdtpffJycgjLOG5LQ6FoAV4zuzTenDWkxrqbULb+9o5Uzu1tLiUHkFLyxJo+yyhruucyzUvyW/S3XGp5emMXho54dvT7e9vmGhetyihjVtzvnD0kgOiLslIZPZnhrL9mHj/G
vdaFbqfTllTlM+dXHvLNxX7NlfjF/G3uLjnPduH7Nlvn+5cMpLKui1sGXJw5s9Hr/Hl2IiQxrsZ9h4fZDfPW5FfSIjeSSkf67CE8bnsTlo3rz7O3pTBl24lpW3eY6xyqqWbnLk8Ab9i+0hfMHJxAXHcGcRe274LQSg0grOef4+/I9nDOwBzd7R8i01Jy0r+g4cxZnM2NcP/rGx9TP7K2pdWzMK2LcwB7ERIaTntrzlDqgt+wvIcxgZJ84fv/RTiqr26bWsHlvMf/wNp21ZFNeMT99KwOAuStzmyzz2upcnl+6i69NTeXqsX2bPdekwb24fFRvrhrTh4G9Gjc3hYUZw07QAe2c46mFmXztr6vo16ML875zQaN5Bd1jInn29nQuTGt5I7CpQxOprnWs3F3Ail0FdIuOYFTf7i2+72TFx0byjQuH8O8tB9mYV9TyG9qIEoNIK32WdYSs/GPcdv4genaNol98TP039eb8dsF2AP57+kiuHN2bRTvzOV5ZQ+ahoxyrrKlfl3/ykAS2HSilwKep6WRs2VfC0KRu/PCqkeQVHufV1U1/MJ+MNXsKmfnMcn785uZGi9btKzrOj97YyNLMwzjnKCqr5K4X15DYLYrbJw9iadbhRk1K63IKefBfm5kyNIEHrx7V4vWfvT2dP916brOvpyXHNTuX4c+LsvnN+9u5Zmxf3vj2lPommtZKT+1JVHgYyzIPs3JXAempPQlvon+hLXz9glR6xEby+L93BOX8TVFiEGmlFz7bQ8/YSK452/NNd3S/+BM2JW3ILeJf6/byjQsH079HF648qw/lVbUs3pnPuhzPqJxxA72JwTsefnl265qTtu4vYVTf7lw8PIlzB/XkiY8zT2n/4DV7Crj9+ZX07OrpAH1vk/8aRs8t2cXLK3P5yrMrmPHkUma/sIaDJeU8deu53HnBEJzDr0mruKyKb/1jDb3jo3nylgmN+gxaY1hyN/YVl1Na7j8yaefBUv733zuYflYf/jhrPLFRpz7mJiYynAmDevDe5gNkHjrKeYOb7l9oC3ExkXzroqF8uiOfVbtb3+90MpQYRE5SVU0tj7ydwfsZB7jlvBRiIsMBGN2vO9n5Rzle2fQH8P99uIPEblHcdfEwwNM80j0mgn9vOcj63CLiu0QyONGzc9fZA+KJjQqvH29/MorKKtlbdJzR/bpjZvzgyuEcKCnn5ZWtG067LqeQ255bSVJcNK/95xTOHdTTb/+C6ppa5m3Yx6Ujk/nF9WMpOV7Fyt0F/OTa0Ywb2IOUhFjOG9yL19fk1c8s/uV7Wzl8tJI/feVcenaNalVcDaV5h55m5X8+Aqi6ppb7X9tA1+hwfnb9mCY7t1tr6tBE8go9taBJQehf8HX75FQSu0XzPwu2t8sGRkoMIifhUGk5X5mzgr8s3c3Xpw7m+5cPr3/trH7dqXWw7UDjWsPeouN8uiOfWyal0C3a8401MjyMy0b15qOtB1mzp5BzBvao/+CKDA9j0uBerepnqKu1jPa2eU8Zmkj6oJ78ZeluaptZwrmyupaF2w+xKa/Y77hzjp/OyyC+SyRzZ59Pn/gYrhrTh637S+qHYC7NOkJ+aQU3pw/glvNS+OgHF/PhfdO4bXJq/XluOHcAuw4fY21OISuyjzB3VS7fuGAwY/rHn/T9NSetd93IpM+bk55ZnM2GvGIenTGmfqmNtlLXQR0TGcbYNryPpnSJCufuS4ayYlfBKa/pFAglBpEAlVfVcP2Ty9i4t4jfzxzHQ/8xmkifJpC6D+KmmpNeX52Hc58v41DnytG9KSyrYueho4z3NiPVmTI0gaz8Y/UbyQdq637PB6NvZ+jtU1LJKShj4Y5DfmU37y3m/tc2kP6zD7jjL6u49bkVfkNol2YeYWNeMd+9LK1+SYfpY/oA8J631vDmur10j4moH+UTHmYM8w4frXP12L50iQznpRW5/OhfmxjQswv3XJ52UvfVkoE9uxAVHkZmvqcDel1OIb/7YCdXjen
DtWc337HdWmcPiKdrVDgTUnq2avLayZp1Xgq3np9C/x5dgn4tJQaRAK3PLWJv0XEev2kcM8b1b/T6gJ5d6B4T0agDurbW8dqaXKYOS2g0omba8KT6D5VxKf6JoW65hkU78k8qzi37SkiOiybJZ2OY6WP6kBwXzV+XfT6i6FBpObfMWc6CjANcMboPv/zSWErLq/jdh593cj75SSa9u0fzpQmf3++AnrGcMyCe9zfv51hFNe9vPsA1Z/cjOiK82Zi6RUdw1dg+/HNtHtn5x/jZF8e0SVu/r4jwMIYkdWXLvhL+Z8F2bnz6M3p1jeKxL7ZtE1KdyPAwfjdzPD+6quWO87YQHRHOz744llRvc2MwKTGIBGh59hHMaHYmsZkxul/3RkNWl2UdIa/weP2QVl9doyO40JsAxg3wTwyj+3YnKS6aRTtPrjlpi7fj2VdkeBhfOW8Qi3bkk+39Rv3IvC2UV9fy1nem8vjN5zBrUgq3nJfCP1bksONgKetyCvks+wjfvHBIow/96WP6siGvmL8u283xqpr6pa9P5MZzPctR/8c5/bh4RHILpVtnWHI3Fu88zBOfZPLFcf15//sXtnkTkq8rRvdm7IDgNiOFghKDSIBWZBcwum934rs0vzTB6L7xbDtQ4rcd4yurc4nvEskXzurT5HvuuTyNB68e1agTtm7Z5iU78xtt79jcCKPK6loyD5Uyul/jMfWzzhtIZLjxwmd7+HDLQd7dtJ/vXTqMIT7r+9x7+XBio8J57J0tPLUwix6xkcyalNLoXFd5m5N+9+EOBvTsQvqgno3KNDR5SAK/nzmOn31xTItlW+vyUb0Z0TuOv3xtIo/ffA49YtumY7uzCXpiMLPpZrbdzDLN7IEmXr/DzPLNbL33zzeCHZM01h4jHTqaHQdLeeGz3VQFsGRERXUNa3MKWxyWeFa/7pRX1bLrsOdbeVFZJQsyDvDFcf3qRy81dPaAHs0umXDR8CQKy6rYvPfzTuE1ewoY+/ACVjQxlHXnoVKqalx9f4ev5LgYrhnbl9fX5PGTtzYzonccs6cN9SuT0C2aey5LY/FOzwJyd0xJpWt04yaf1MSujOrbnaoax/Xj+ze5RlBDZsaMcf1PmFhP1RfH92fBvdO4JEg1ks4iqInBzMKBJ4GrgNHALDMb3UTRV5xz47x/ng1mTNJYVU0ts+Ys5+F5Ge1yvbLKan7z/jbyCtt/1UjwjPH/9otr+MLvFvHQWxk89NbmFhPjhtxiKqprOX/IiYcl1n1Tz9hXgnOOV1blUlldy81NLOMQiLplmz/16Wf486fZVNU4nluyq1H5umaspmoMALdNSeVoRTUHSsr51Q1jm+w0vW1yKkMSuxIbFc4dU1Kbje3as/ti5vkwljNLsFdXnQRkOueyAcxsLjAD2BLk655RXlyxh5+/u5WYyHB6dIlkaHI3Hr/5nFavvNnQX5buYnm2Z2r/TekDOKtfcNtMn1u8i6cWZrE8+wiv/ufkgCY3Zecf5VhFDcP7dGu2k/OJj3eSlX+MX90wtsky1TW1/Pr9bcxZvIu46Ai+c/EwjlfV8NySXQxJ7HbChc5WePsXWhqvPiy5G1HhYfzfBzt47J2tHD5awbiBPVr9M03oFs3Y/vEs2pHP9y5LI7egjA+2HqRX1yg+3HqQfUXH6eczSmXr/lK6RIaTmtB0B+X4gT24akwf0pK7MT6l6eafqIgwnr09ncKyyhM2xXzjwsFcNDypfnE5OXMEOzH0B3zn4ucB5zVR7gYzmwbsAO51zp36/P0zxNqcQn76VgbjU3owsk93Co5V8u6m/Ty/ZJffGPrWyiss4/8+2MmFaYlszCvm1+9v54WvT2qDyJtWXFbFM4uzSekVy9qcIp5amMX3LjvxsMXisiq++ORSSsqriQw3RvSJ44YJA7hjSmr9aJOXV+bwP94lAyqra/nDrPF+SxQUl1Vx98trWbzzMF89fxD3XzmC+NhIamsd+4uP84v3ttKvRxfCDP65No+
lmUeYc1t6fUfz8l1HGNmne4tt1pHhYVw6MplNe4u5MC2Riam96od3tta0tCT+9GkWJeVV/G3ZbsLMeOar53LTnz/j5ZU5/ODKEfVlt+wvZmTfuGaXZzCzEy4rUWdIAB/20RHhbToPQTqOjrAfw9vAy865CjP7T+BvwKUNC5nZbGA2QEpK486wUCivquGz7CN8ut2z3k1khBEZHsbE1F5cMbq33xj3Okcrqvnnmjx6xEY2OeTRV+GxSu5+cS19e8Tw7G0T69djr3phNc8t2cXXpg6ub691zrF1fyldo8MZ0DM2oHVbnHP89K0MzOBXN5zN/I37+fn8rSzNPFw/VPJkHT5awfxN++keE0mf+BgGJcTSN/7zb7R/XpTF0YpqXvvWZJ5emMXvP9rJtOFJ9UtBNOXpRVmUVlTz2Iyz2FtUzopdR3jk7S2syyni1zeczbqcQn7y5mYuGp7ElKEJ/PK9bfSIjeRnXxxDZU0tyzKP8Og7W8grLONXXxrLTJ/O1LAw4/GbxrG38DO+89JaAJLjoukWE8Gj72Qw/3sXUuu86wRNDOz37umvtvzBezKmDU/iiU8y+SDjIK+szuXqsX1JT+3FZSOTeXllLt+9NI2oiDCcc2zZV8K15zS/SqlIIIKdGPYCvo2rA7zH6jnnfHvQngV+09SJnHPPAM8ApKenh7SntLyqhh+9sYkFGQcoq6yhS2Q48V0iqaqppayyhr8s3U1SXDQzJw5kwiDPYlthZny87SBzV+ZS6t3o/fDRSr+NWnzV1jrue3U9h49W8vpdk/026bjn8jT+/YeDPL9kF/de4ak1PLUwq36BtuiIMIb3juPBa0bVbyTSkHOOdzft56Nth/jxNaPo36MLX508iL8u282v3tvGW9+ZGlCHoq/cgjJufW5Fo83Lv3fpMO65fDgFxyr5y9LdXHt2P0b26c4jM8awanch35+7jne/d2GTnZyHSsv5y9JdXHdOP77qnUnrnONPn3rud+eho+wrOs7gxK788ZbxdI+JpLCsiqc/zWLHwVK27i/laEU1id2ieemb5zfaIQw8s0rn3J7Os4t3MWVoAhcMS+SDLQe568W1vLo6jxF9ulFeVdvszzLYxqf0oFt0BD97dwul5dV8bWoqALeeP4gPt67i/YwDXD2mDw+/nUFJeTXnNtNEJBKoYCeGVUCamQ3GkxBmArf4FjCzvs65/d6n1wFbgxXMsYpqSsur/faKbaiuqSa/pJzDxypxDr5/eZrfRt6/eX87/1q3l1mTUvjCWb05f0hC/YiTmlrHpzsO8eLyHJ74JBPfPs3wMOOasX25fUoqzy7O5rF3tlBTW8vsaUM5WlHNkp2HWZtTSHb+UTIPHWX3kTIem3EWZzcY335Wv3i+cFZvnl+6i69fMJglOw/z2wWelSOnDU9k58GjfLD1IF/7yyqev2Mik70bkmw7UMJDb2aQmX+U4uNV1NR6Rq/UdTDGRIZz3xXD+cFrG3hxxR6+PDEl4BmdmYdKufXZlZRVVvPyN88nKS6agyXlvLF2L3/4OJP1ecX0joumorqGe70zXuO7RHrGz89Zzh8+3tnkRKEnPs6kusZxr0+zmZnx7YuHMapPd743dx2R4WE8f8fE+j6XH04fQWl5FQsyDnLN2L5MH9OHKcMSTjgBKzkuhv/ns8Ln9DF9SB/Uk//9YAc3p3vG3wd7PZzmRIaHMWVoAv/ecpBxA3swwfvBPy0tiUEJsTy3ZBevrc5l8c7DfOuioQHNKRA5EQv2MEUzuxr4HRAOPO+c+7mZPQqsds7NM7Nf4kkI1UABcJdzbtuJzpmenu5Wr1590rE89NZm3lq/j59fP4Zrz/avbpdVVvP8kl08/Wk2RyuqMYNesVGUVlQzNKkbr31rMt2iI1iefYRZc5bz1fMH8eiME4/HPlBczv7i41TVOCqraxmW3K0+KVXV1HLvK+t5Z+N+zhnYgy37iqmqcURFhDE4oStDkrpy/pAEbps8qMlZm1v2lXD1HxZ
z9dg+fLT1EGP6x/PiN86rT1D5pRXcMmc5eYXHee6OdLbsK+E372+ne5dIvnBWb3rGRtEjNpL/OKefX9KrqXXMeHIJm/eW0MW7L0ByXAz7i4+zv7ic+C6R3HNZGhePSMLMqK6pZUHGQX785iYiwsP4+52TGNnn8xExzjleXpnLw/MyqKyp5aZzB/Dbm87xu5f7XvX8HD7+wUUM6Pn5zODcgjIufXwhN6UP5BfXj23yZ3yopJwa5/yaq9rKupxCrn9qGeFhRlpyN97//rQ2v0agXlyxhwf/tZnfz/SfdT1nUTY/n7+ViDDj59ePabQXsogvM1vjnEtvsdzpOH69tYkhO/8o9766gQ25RVw/vj8/nD6SrftLWLzzMG9v3Ed+aQVXju7NfVcOJy3Z04G3cPsh7vzbaqYOS+SPM8dz7ROLCTPjvXsuPOUp/dU1tTzy9hZW7S5g2vAkLh2ZTPqgngEvQfytv6/h/YwDDOzVhTe/PZWEBjM865JD3eYlV4zuza++NLZRuYbqai/Ls4+wPPsIRWVV9O0RQ7/4LmzaW0xOQRmTBvdi6tBEXlmVw77icoYkdeX52yc2O11/Q24Rzy7Zxf+7emSjD/F9Rce55H8WctWYPvxu5vj64/e+sp53N+1n0X9dcsJaXjB99+V1vL1hH7dPHsQjLXwRCKbyqhre33yA/zinn3+n+vEqfvLmZmZOHNjirmMiSgzNqK6p5YlPMvnjx5n1s0mjIjxV9e9eOqzJjcTnrszhgTc20bt7NIdKK3jtPyeT3kRbdXvLyj/KY+9s4cGrR9WvLNlQfmkFP3lzM5eMTOLm9IGnvGZMZXUtc1fl8IePdnL4aCVThiZwx5RULhvV+5Q2KvnN+9t4amEW8+6eyph+8fx8/laeW7KLuy4eyg+njzylmE9FbkEZs+Ys57c3nlPfJCdyulJiaMGG3CI+3ZHPhJSepKf2bHZWap3H/72dP36cyexpQ/zaojursspqisqq/MbQn4rS8iou/u1ChiZ3I6lbNO9u2s8dU1L5ybWjg7Yzlkhno8TQxpxzrM0p5JwBPdpktylp7IXPdvOQd4/gH18zijsvGByUVTFFOqtAE0NHmMdwWjCzJpuZpO3MmpRIkjPKAAASyklEQVTC1v2lTEtL5KoTbAwvIsGlxCAdRmR4GL/8UtOjj0Sk/ahNRERE/CgxiIiIHyUGERHxo8QgIiJ+lBhERMSPEoOIiPhRYhARET9KDCIi4keJQURE/CgxiIiIHyUGERHxo8QgIiJ+lBhERMSPEoOIiPhRYhARET9KDCIi4ifoicHMppvZdjPLNLMHTlDuBjNzZtbitnMiIhI8QU0MZhYOPAlcBYwGZpnZ6CbKxQH3ACuCGY+IiLSsxcRgZuFm9kkrzz8JyHTOZTvnKoG5wIwmyj0G/Boob+V1RESkjbSYGJxzNUCtmcW34vz9gVyf53neY/XMbAIw0Dn3bivOLyIibSwiwHJHgU1m9gFwrO6gc+57p3JxMwsD/he4I4Cys4HZACkpKadyWREROYFAE8Mb3j8nay8w0Of5AO+xOnHAGGChmQH0AeaZ2XXOudW+J3LOPQM8A5Cenu5aEYuIiAQgoMTgnPubmUUBw72HtjvnqgJ46yogzcwG40kIM4FbfM5bDCTWPTezhcD9DZOCiIi0n4ASg5ldDPwN2A0YMNDMbnfOLTrR+5xz1WZ2N7AACAeed85lmNmjwGrn3LxTCV5ERNpeoE1JjwNXOue2A5jZcOBl4NyW3uicmw/Mb3DsoWbKXhxgPCIiEiSBzmOIrEsKAM65HUBkcEISEZFQCrTGsNrMngX+4X3+FUD9ACIiZ6BAE8NdwHeAuuGpi4GnghKRiIiEVIuJwbusxfPOua/gmXMgIiJnsEBnPg/yDlcVEZEzXKBNSdnAUjObh//MZ9UgRETOMIEmhizvnzA8s5VFROQMFWgfQ5xz7v52iEdEREIs0D6Gqe0Qi4iIdACBNiWt9/YvvIZ/H0NrFtY
TEZEOLNDEEAMcAS71OeZo3YqrIiLSgQW6uurXgh2IiIh0DAGtlWRmw83sIzPb7H1+tpn9OLihiYhIKAS6iN4c4EdAFYBzbiOevRVEROQME2hiiHXOrWxwrLqtgxERkdALNDEcNrOheDqcMbMbgf1Bi0pEREIm0FFJ38Gz3/JIM9sL7MKz9LaIiJxhAh2VlA1cbmZdgTDnXKnv695tPv8WjABFRKR9BdqUBIBz7ljDpOB1TxvFIyIiIXZSieEErI3OIyIiIdZWicG10XlERCTEVGMQERE/bZUYljb3gplNN7PtZpZpZg808fq3zGyTma03syVmNrqNYhIRkVYIaFSSmUUDNwCpvu9xzj3q/fvuZt4XDjwJXAHkAavMbJ5zbotPsZecc097y1+HZ1/p6Sd9JyIi0iYCncfwFlAMrAEqTuL8k4BM73BXzGwuMAOoTwzOuRKf8l1Rf4WISEgFmhgGOOda8y2+P5Dr8zwPOK9hITP7DnAfEIX/0t6+ZWYDswFSUlJaEYqIiAQi0D6GZWY2NlhBOOeedM4NBX4INLlqq3PuGedcunMuPSkpKVihiIh0eoHWGC4A7jCzXXiakgxwzrmzW3jfXmCgz/MB3mPNmQv8KcCYREQkCAJNDFe18vyrgDQzG4wnIcwEbvEtYGZpzrmd3qfXADsREZGQOWFiMLPu3s7hppbBaJFzrtrM7gYWAOHA8865DDN7FFjtnJsH3G1ml+PZ66EQuL011xIRkbbRUo3hJeBaPKORHP4T2RwwpKULOOfmA/MbHHvI57HWWRIR6UBOmBicc9d6/x7cPuGIiEioBdrHgJn1BNKAmLpjzrlFwQhKRERCJ9CZz9/As7T2AGA9cD7wGc3MORARkdNXoPMY7gEmAnucc5cA44GioEUlIiIhE2hiKHfOlYNn3STn3DZgRPDCEhGRUAm0jyHPzHoAbwIfmFkhsCd4YYmISKgEuufz9d6HD5vZJ0A88H7QohIRkZBpMTF4l87OcM6NBHDOfRr0qEREJGRa7GNwztUA281MS5qKiHQCgfYx9AQyzGwlcKzuoHPuuqBEJSIiIRNoYojBszRGHQN+3fbhiIhIqAWaGCIa9i2YWZcgxCMiIiHW0uqqdwHfBoaY2Uafl+KApcEMTEREQiOQ1VXfA34JPOBzvNQ5VxC0qEREJGRaWl21GCgGZrVPOCIiEmqBLokhIiKdhBKDiIj4UWIQERE/SgwiIuJHiUFERPwoMYiIiJ+gJwYzm25m280s08weaOL1+8xsi5ltNLOPzGxQsGMSEZHmBTUxeJfsfhK4ChgNzDKz0Q2KrQPSnXNnA68DvwlmTCIicmLBrjFMAjKdc9nOuUpgLjDDt4Bz7hPnXJn36XJgQJBjEhGREwh2YugP5Po8z/Mea86deJbgEBGREAl0ddWgM7NbgXTgomZenw3MBkhJ0Z5BIiLBEuwaw15goM/zAd5jfszscuBB4DrnXEVTJ3LOPeOcS3fOpSclJQUlWBERCX5iWAWkmdlgM4sCZgLzfAuY2Xjgz3iSwqEgxyMiIi0IamJwzlUDdwMLgK3Aq865DDN71MzqtgX9LdANeM3M1pvZvGZOJyIi7SDofQzOufnA/AbHHvJ5fHmwYxARkcBp5rOIiPhRYhARET9KDCIi4keJQURE/CgxiIiIHyUGERHxo8QgIiJ+lBhERMSPEoOIiPhRYhARET9KDCIi4keJQURE/CgxiIiIHyUGERHxo8QgIiJ+lBhERMSPEoOIiPhRYhARET9KDCIi4keJQURE/CgxiIiIn6AnBjObbmbbzSzTzB5o4vVpZrbWzKrN7MZgxyMiIicW1MRgZuHAk8BVwGhglpmNblAsB7gDeCmYsYiISGAignz+SUCmcy4bwMzmAjOALXUFnHO7va/VBjkWEREJQLCbkvoDuT7P87zHRESkgzptOp/NbLaZrTaz1fn5+aEOR0TkjBXsxLAXGOjzfID32Elzzj3jnEt3zqUnJSW1KpjjmzZ
R9PrrrXqviEhnEezEsApIM7PBZhYFzATmBfmazSp5/30OPPIorqoqVCGIiHR4QU0Mzrlq4G5gAbAVeNU5l2Fmj5rZdQBmNtHM8oCbgD+bWUaw4okZOQpXVUVF9q5gXUJE5LQX7FFJOOfmA/MbHHvI5/EqPE1MQRczehQAFdu2EjNieHtcUkTktHPadD63hajUVCwmhvKt20IdiohIh9WpEoOFhxM9fDjl25QYRESa06kSA0DMyJFUbN2Kcy7UoYiIdEidLzGMGklNcTHV+/eHOhQRkQ6p0yWG6JEjAdScJCLSjE6XGGKGDwczyrduDXUoIiIdUqdLDGFduxI1aBAVqjGIiDSp0yUGgOhRIzVkVUSkGZ0yMcSMHEVVXh41JSWhDkVEpMPpnImhbgb09u0hjkREpOPpnImhbmSSmpNERBrplIkhIimJ8MREDVkVEWlCp0wM4Kk1aMiqiEhjnTcxjBpJRWYmNaWloQ5FRKRD6bSJofvVV0NVFUWvvBLqUEREOpROmxhiRo2i65QpFPztBWorK0MdjohIh9FpEwNArzu/TnV+PiVvvx3qUEREOoxOnRi6TplC9KhRHHnueVxtbajDERHpEDp1YjAzEu68k8rsbI4uXBjqcEREOoROnRgAuk//ApH9+nHk2ee0eY+ICEoMWEQEve78OsfXruXAw4/gqqtDHZKISEgFPTGY2XQz225mmWb2QBOvR5vZK97XV5hZarBjaqjnrFkkfPObFL3yCnl3f5fasrL2DkFEpMMIamIws3DgSeAqYDQwy8xGNyh2J1DonBsG/B/w62DG1BQLCyP5B/fR56cPcXTRIvbc+lWK33lXk99EpFOKCPL5JwGZzrlsADObC8wAtviUmQE87H38OvCEmZkLQYN/z1mziOjdm/0//Sn77r8fIiPpOmkSMWPHEJ2WRvSwYUQkJBDevTsWFdXe4YmItItgJ4b+QK7P8zzgvObKOOeqzawYSAAOBzm2JsVdeindLrqI4xs2UPrBhxxbspgjc5ZDTY1fOYuJISw6GouO9iSJ8DAMg7AwMAtF6B1XUz+Pts77ddc40wcQnMzvVjB/Fi3F0ZbXbu5aJ7rGyb7nZH9HT+X/eKA/m2au0efHD9J18uTWXz8AwU4MbcbMZgOzAVJSUoJ7rfBwYidMIHbCBPjhf1NbWUnlrl1UZmVRXVREbUkpNSUluIoKXGUFrrISV1MLtbU4dxrNh3BAW+Ww5s51ov8DzZU/2ZgaXuNkzhvI9dry53Qq523NZ+2p/Ixb82/a0rXb+lon8x7nPB+2Dd/Tmt/RUxXI71wzwrp2bYMATizYiWEvMNDn+QDvsabK5JlZBBAPHGl4IufcM8AzAOnp6e36tTAsKoqYESOIGTGiPS8rIhISwR6VtApIM7PBZhYFzATmNSgzD7jd+/hG4ONQ9C+IiIhHUGsM3j6Du4EFQDjwvHMuw8weBVY75+YBzwF/N7NMoABP8hARkRAJeh+Dc24+ML/BsYd8HpcDNwU7DhERCUynn/ksIiL+lBhERMSPEoOIiPhRYhARET9KDCIi4sdOxykDZpYP7DmJtyQSoiU2Qqwz3ndnvGfonPfdGe8ZTu2+BznnkloqdFomhpNlZqudc+mhjqO9dcb77oz3DJ3zvjvjPUP73LeakkRExI8Sg4iI+OksieGZUAcQIp3xvjvjPUPnvO/OeM/QDvfdKfoYREQkcJ2lxiAiIgE6oxKDmU03s+1mlmlmDzTxerSZveJ9fYWZpbZ/lG0rgHu+z8y2mNlGM/vIzAaFIs621tJ9+5S7wcycmZ32o1cCuWczu9n7751hZi+1d4zBEMDveIqZfWJm67y/51eHIs62ZGbPm9khM9vczOtmZn/w/kw2mtmENg3AOXdG/MGzrHcWMASIAjYAoxuU+TbwtPfxTOCVUMfdDvd8CRDrfXzX6X7Pgd63t1wcsAhYDqSHOu52+LdOA9YBPb3Pk0Mddzvd9zPAXd7Ho4HdoY67De57GjA
B2NzM61cD7+HZC+58YEVbXv9MqjFMAjKdc9nOuUpgLjCjQZkZwN+8j18HLjM7rTdobvGenXOfOOfKvE+X49lF73QXyL81wGPAr4Hy9gwuSAK5528CTzrnCgGcc4faOcZgCOS+HdDd+zge2NeO8QWFc24Rnv1pmjMDeMF5LAd6mFnftrr+mZQY+gO5Ps/zvMeaLOOcqwaKgYR2iS44ArlnX3fi+ZZxumvxvr1V64HOuXfbM7AgCuTfejgw3MyWmtlyM5vebtEFTyD3/TBwq5nl4dn75bvtE1pInez//ZMS9I16pGMws1uBdOCiUMcSbGYWBvwvcEeIQ2lvEXiaky7GUzNcZGZjnXNFIY0q+GYBf3XOPW5mk/HsCDnGOVcb6sBOV2dSjWEvMNDn+QDvsSbLmFkEnmrnkXaJLjgCuWfM7HLgQeA651xFO8UWTC3ddxwwBlhoZrvxtMHOO807oAP5t84D5jnnqpxzu4AdeBLF6SyQ+74TeBXAOfcZEINnPaEzWUD/91vrTEoMq4A0MxtsZlF4OpfnNSgzD7jd+/hG4GPn7ck5TbV4z2Y2HvgznqRwJrQ5Qwv37Zwrds4lOudSnXOpePpWrnPOrQ5NuG0ikN/vN/HUFjCzRDxNS9ntGWQQBHLfOcBlAGY2Ck9iyG/XKNvfPOA27+ik84Fi59z+tjr5GdOU5JyrNrO7gQV4RjI875zLMLNHgdXOuXnAc3iqmZl4OnZmhi7iUxfgPf8W6Aa85u1nz3HOXReyoNtAgPd9RgnwnhcAV5rZFqAG+C/n3OlcIw70vn8AzDGze/F0RN9xmn/hw8xexpPkE719Jz8FIgGcc0/j6Uu5GsgEyoCvten1T/Ofn4iItLEzqSlJRETagBKDiIj4UWIQERE/SgwiIuJHiUFERPwoMYh4mdnRNjrPw2Z2fwDl/mpmN7bFNUXakhKDiIj4UWIQacDMunn3rlhrZpvMbIb3eKqZbfN+099hZi+a2eXeRet2mtkkn9OcY2afeY9/0/t+M7MnvHsLfAgk+1zzITNbZWabzeyZ03zVXznNKTGINFYOXO+cm4BnP4vHfT6ohwGPAyO9f24BLgDuB/6fzznOBi4FJgMPmVk/4HpgBJ49A24DpviUf8I5N9E5NwboAlwbpHsTadEZsySGSBsy4BdmNg2oxbOccW/va7ucc5sAzCwD+Mg558xsE5Dqc463nHPHgeNm9gmefQWmAS8752qAfWb2sU/5S8zsv4FYoBeQAbwdtDsUOQElBpHGvgIkAec656q8K7TGeF/zXZ221ud5Lf7/nxquNdPs2jNmFgM8hWeXuVwze9jneiLtTk1JIo3FA4e8SeESoDX7ZM8wsxgzS8CzGNoqPNuMftnMwr27bV3iLVuXBA6bWTc8K/+KhIxqDCKNvQi87W0eWg1sa8U5NgKf4NkX4DHn3D4z+xeefocteJaK/gzAOVdkZnOAzcABPElEJGS0uqqIiPhRU5KIiPhRYhARET9KDCIi4keJQURE/CgxiIiIHyUGERHxo8QgIiJ+lBhERMTP/wc4Ac9AtlyamQAAAABJRU5ErkJggg==\n", "text/plain": [ "<Figure size 432x288 with 1 Axes>" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "line_df = pd.DataFrame({'test_error': test_errors,\n", " 'train_error': train_errors,\n", " 'lambda' : lambdas})\n", "sns.lineplot(x='lambda', y='test_error', data=line_df, color='tab:blue')\n", "sns.lineplot(x='lambda', y='train_error', data=line_df, color='tab:red')" ] }, { "cell_type": 
"markdown", "metadata": {}, "source": [ "(e) Compare the test MSE of boosting to the test MSE that results from applying two of the regression approaches seen in Chapters 3 and 6." ] }, { "cell_type": "code", "execution_count": 80, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.5973063683221087" ] }, "execution_count": 80, "metadata": {}, "output_type": "execute_result" } ], "source": [ "train_x = train.drop('Salary', axis=1)\n", "train_y = train.Salary\n", "reg = linear_model.Ridge(alpha=10000).fit(train_x, train_y)\n", "test_x = test.drop('Salary', axis=1)\n", "test_y = test.Salary\n", "preds = reg.predict(test_x)\n", "np.sqrt(metrics.mean_squared_error(test_y, preds))" ] }, { "cell_type": "code", "execution_count": 92, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.5953299537143887" ] }, "execution_count": 92, "metadata": {}, "output_type": "execute_result" } ], "source": [ "train_x = train.drop('Salary', axis=1)\n", "train_y = train.Salary\n", "reg = linear_model.Lasso(alpha=0.5).fit(train_x, train_y)\n", "test_x = test.drop('Salary', axis=1)\n", "test_y = test.Salary\n", "preds = reg.predict(test_x)\n", "np.sqrt(metrics.mean_squared_error(test_y, preds))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(f) Which variables appear to be the most important predictors in the boosted model?" 
] }, { "cell_type": "code", "execution_count": 97, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "<matplotlib.axes._subplots.AxesSubplot at 0x1173d5518>" ] }, "execution_count": 97, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAA4gAAAHICAYAAAAMd8SLAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAIABJREFUeJzt3Xm4JHV97/H3xxlBNIgRxg3QQUANSyQy4hJM3FC8LqBCHCSKBkWjuHvvxXsTXKJJyNUYd4OCIi4gKDoqigu4oSIDAsOgmIGgDEEdliCogOD3/lG/M9PTnD6nZ+bUOQzzfj3PeU539a+qv1VdVV2f2jpVhSRJkiRJd5rrAiRJkiRJtw8GREmSJEkSYECUJEmSJDUGREmSJEkSYECUJEmSJDUGREmSJEkSYECUJEmSJDUGREmSJEkSYECUJEmSJDXz57qA2bDNNtvUwoUL57oMSZIkSZoT55xzzlVVtWC6dptEQFy4cCFLly6d6zIkSZIkaU4k+dk47TzFVJIkSZIEGBAlSZIkSY0BUZIkSZIEGBAlSZIkSY0BUZIkSZIEGBAlSZIkSY0BUZIkSZIEGBAlSZIkSY0BUZIkSZIEGBAlSZIkSY0BUZIkSZIEGBAlSZIkSY0BUZIkSZIEGBAlSZIkSY0BUZIkSZIEGBAlSZIkSY0BUZIkSZIEGBAlSZIkSc38uS5gLqz6wMfnuoTVFvztX891CZIkSZIEeARRkiRJktQYECVJkiRJgAFRkiRJktQYECVJkiRJgAFRkiRJktQYECVJkiRJgAFRkiRJktQYECVJkiRJgAFRkiRJktQYECVJkiRJgAFRkiRJktQYECVJkiRJgAFRkiRJktQYECVJkiRJgAFRkiRJktT0GhCT7Jvk4iQrkhwxyeubJzmxvX5WkoWt+15Jzmt/5yd55kA/lyVZ1l5b2mf9kiRJkrQpmd/XgJPMA94H7AOsBM5OsqSqLhpodihwbVXtlGQxcBTwHOBCYFFV3ZLkvsD5Sb5QVbe0/h5XVVf1VbskSZIkbYr6PIK4F7Ciqi6tqpuBE4D9htrsBxzXHp8MPCFJquq3A2HwLkD1WKckSZIkiX4D4rbA5QPPV7Zuk7ZpgfA6YGuAJI9IshxYBrx0IDAW8NUk5yQ5rMf6JUmSJGmT0tspphuqqs4Cdk3yJ8BxSb5cVTcCe1fVFUnuBXwtyU+q6tvD/bfweBjA/e9//1mtXZIkSZI2Rn0eQbwC2H7g+Xat26RtkswHtgKuHmxQVT8GbgB2a8+vaP9/BZxCdyrrbVTV0VW1qKoWLViwYINHRpIkSZLu6PoMiGcDOyfZIclmwGJgyVCbJcAh7fEBwOlVVa2f+QBJHgA8BLgsyd2SbNm63w14Et0NbSRJkiRJG6i3U0zbHUgPB04D5gHHVtXyJG8BllbVEuAY4PgkK4Br6EIkwN7AEUl+D/wBeFlVXZXkgcApSSZq/2RVfaWvcZAkSZKkTUmv1yBW1anAqUPdjhx4fCNw4CT9HQ8cP0n3S4GHznylkiRJkqQ+TzGVJEmSJG1EDIiSJEmSJMCAKEmSJElqDIiSJEmSJMCAKEmSJElqDIiSJEmSJMCAKEmSJElqDIiSJEmSJMCAKEmSJElqDIiSJEmSJMCAKEmSJElqDIiSJEmSJMCAKEmSJElqDIiSJEmSJMCAKEmSJElqDIiSJEmSJMCAKEmSJElqDIiSJEmSJM
CAKEmSJElqDIiSJEmSJMCAKEmSJElqDIiSJEmSJMCAKEmSJElqDIiSJEmSJMCAKEmSJElqDIiSJEmSJMCAKEmSJElqDIiSJEmSJMCAKEmSJElqDIiSJEmSJMCAKEmSJElqDIiSJEmSJMCAKEmSJElqDIiSJEmSJMCAKEmSJElq5s91AZreLz7w5rkuYbX7/O0b57oESZIkST3xCKIkSZIkCTAgSpIkSZIaA6IkSZIkCTAgSpIkSZIaA6IkSZIkCTAgSpIkSZIaA6IkSZIkCfB3ENWD5e9/xlyXsNquL1sy1yVIkiRJG41ejyAm2TfJxUlWJDliktc3T3Jie/2sJAtb972SnNf+zk/yzHGHKUmSJElaP70FxCTzgPcBTwF2AQ5KsstQs0OBa6tqJ+CdwFGt+4XAoqraA9gX+Pck88ccpiRJkiRpPfR5BHEvYEVVXVpVNwMnAPsNtdkPOK49Phl4QpJU1W+r6pbW/S5ArcMwJUmSJEnroc+AuC1w+cDzla3bpG1aILwO2BogySOSLAeWAS9tr48zTFr/hyVZmmTpqlWrZmB0JEmSJOmO7XZ7F9OqOquqdgUeDrwhyV3Wsf+jq2pRVS1asGBBP0VKkiRJ0h1InwHxCmD7gefbtW6TtkkyH9gKuHqwQVX9GLgB2G3MYUqSJEmS1kOfAfFsYOckOyTZDFgMDP/mwBLgkPb4AOD0qqrWz3yAJA8AHgJcNuYwJUmSJEnrobffQayqW5IcDpwGzAOOrarlSd4CLK2qJcAxwPFJVgDX0AU+gL2BI5L8HvgD8LKqugpgsmH2NQ6SJEmStCnpLSACVNWpwKlD3Y4ceHwjcOAk/R0PHD/uMCVJkiRJG+52e5MaSZIkSdLsMiBKkiRJkgADoiRJkiSpMSBKkiRJkgADoiRJkiSpMSBKkiRJkgADoiRJkiSpMSBKkiRJkgADoiRJkiSpMSBKkiRJkgADoiRJkiSpMSBKkiRJkgADoiRJkiSpMSBKkiRJkgADoiRJkiSpMSBKkiRJkgADoiRJkiSpMSBKkiRJkgADoiRJkiSpMSBKkiRJkgADoiRJkiSpMSBKkiRJkgADoiRJkiSpMSBKkiRJkgADoiRJkiSpMSBKkiRJkgADoiRJkiSpMSBKkiRJkgADoiRJkiSpMSBKkiRJkgCYP9cFSHPtGx9+6lyXsJYnvOhLc12CJEmSNlEeQZQkSZIkAQZESZIkSVJjQJQkSZIkAQZESZIkSVLjTWqkjcwJH3nyXJewlsUvPG2uS5AkSdIM8QiiJEmSJAnwCKKknr3nE7evI56vONgjnpIkSaN4BFGSJEmSBBgQJUmSJEmNAVGSJEmSBBgQJUmSJEmNAVGSJEmSBPQcEJPsm+TiJCuSHDHJ65snObG9flaSha37PknOSbKs/X/8QD/fbMM8r/3dq89xkCRJkqRNRW8/c5FkHvA+YB9gJXB2kiVVddFAs0OBa6tqpySLgaOA5wBXAU+vqv9KshtwGrDtQH8HV9XSvmqXJEmSpE1Rn7+DuBewoqouBUhyArAfMBgQ9wPe1B6fDLw3SarqRwNtlgNbJNm8qm7qsV5J4vUn7zvXJazl7Qd8Za5LkCRJm5A+TzHdFrh84PlK1j4KuFabqroFuA7YeqjNs4Fzh8LhR9rppX+fJDNbtiRJkiRtmm7XN6lJsivdaacvGeh8cFXtDjym/T1vRL+HJVmaZOmqVav6L1aSJEmSNnJ9BsQrgO0Hnm/Xuk3aJsl8YCvg6vZ8O+AU4PlVdclED1V1Rft/PfBJulNZb6Oqjq6qRVW1aMGCBTMyQpIkSZJ0R9ZnQDwb2DnJDkk2AxYDS4baLAEOaY8PAE6vqkpyD+BLwBFVdeZE4yTzk2zTHt8ZeBpwYY/jIEmSJEmbjN4CYrum8HC6O5D+GPh0VS1P8pYkz2jNjgG2TrICeC0w8VMYhwM7AUcO/ZzF5sBpSS4AzqM7AvmhvsZBkiRJkjYlfd7FlKo6FTh1qNuRA49vBA
6cpL+3Am8dMdg9Z7JGSdqYPeVzr5jrEtby5f3fM9clSJKkDXC7vkmNJEmSJGn2GBAlSZIkSYABUZIkSZLUGBAlSZIkSYABUZIkSZLUGBAlSZIkSYABUZIkSZLUGBAlSZIkSYABUZIkSZLUzJ/rAiRJm5anfvYdc13Cal961uvmugRJkm5XPIIoSZIkSQIMiJIkSZKkxoAoSZIkSQIMiJIkSZKkxoAoSZIkSQIMiJIkSZKkxoAoSZIkSQIMiJIkSZKkxoAoSZIkSQIMiJIkSZKkxoAoSZIkSQIMiJIkSZKkxoAoSZIkSQIMiJIkSZKkZv5cFyBJ0u3Z0z5z7FyXsNoXn/03c12CJOkOziOIkiRJkiTAI4iSJN2hPO2kk+e6hNW+eOABc12CJGkdeQRRkiRJkgSsQ0BM8oAkT2yPt0iyZX9lSZIkSZJm21gBMcmLgZOBf2+dtgM+11dRkiRJkqTZN+4RxJcDfw78GqCq/gO4V19FSZIkSZJm37gB8aaqunniSZL5QPVTkiRJkiRpLowbEL+V5P8AWyTZBzgJ+EJ/ZUmSJEmSZtu4AfEIYBWwDHgJcCrwd30VJUmSJEmafeP+DuIWwLFV9SGAJPNat9/2VZgkSZIkaXaNewTxG3SBcMIWwNdnvhxJkiRJ0lwZNyDepapumHjSHt+1n5IkSZIkSXNh3ID4myQPm3iSZE/gd/2UJEmSJEmaC+Neg/hq4KQk/wUEuA/wnN6qkiRJkiTNurECYlWdneQhwINbp4ur6vf9lSVJkiRJmm3jHkEEeDiwsPXzsCRU1cd6qUqSJEmSNOvGCohJjgd2BM4Dbm2dCzAgSpIkSdIdxLhHEBcBu1RV9VmMJEmSJGnujHsX0wvpbkwjSZIkSbqDGjcgbgNclOS0JEsm/qbrKcm+SS5OsiLJEZO8vnmSE9vrZyVZ2Lrvk+ScJMva/8cP9LNn674iybuTZMxxkCRJkiRNYdxTTN+0rgNOMg94H7APsBI4O8mSqrpooNmhwLVVtVOSxcBRdD+fcRXw9Kr6ryS7AacB27Z+PgC8GDgLOBXYF/jyutYnSZIkSVrbuD9z8a31GPZewIqquhQgyQnAfsBgQNyPNeHzZOC9SVJVPxposxzYIsnmwD2Bu1fVD9owPwbsjwFRkiRJkjbYWKeYJnlkkrOT3JDk5iS3Jvn1NL1tC1w+8Hwla44C3qZNVd0CXAdsPdTm2cC5VXVTa79ymmFO1HxYkqVJlq5atWqaUiVJkiRJ455i+l5gMXAS3R1Nnw88qK+iJiTZle600yeta79VdTRwNMCiRYu8+6okSbdD+5/89bkuYbXPHfDEuS5BkubcuDepoapWAPOq6taq+gjdtX9TuQLYfuD5dq3bpG2SzAe2Aq5uz7cDTgGeX1WXDLTfbpphSpIkSZLWw7hHEH+bZDPgvCT/AlzJ9OHybGDnJDvQhbjFwHOH2iwBDgG+DxwAnF5VleQewJeAI6rqzInGVXVlkl8neSTdTWqeD7xnzHGQJEnaIAd+5oK5LmG1k579p3NdgqQ7oHGPID6vtT0c+A3dUb9nTdVDu6bwcLo7kP4Y+HRVLU/yliTPaM2OAbZOsgJ4LTDxUxiHAzsBRyY5r/3dq732MuDDwArgErxBjSRJkiTNiHGPIO5fVe8CbgTeDJDkVcC7puqpqk6l+ymKwW5HDjy+EThwkv7eCrx1xDCXAruNWbckSZIkaUzjHkE8ZJJuL5jBOiRJkiRJc2zKI4hJDqK7bvCBSZYMvLQlcE2fhUmSJEmSZtd0p5h+j+6GNNsA7xjofj1w+7lKW5IkSZK0waYMiFX1syQrgRur6luzVJMkSZIkaQ5Me5Oaqro1yR+SbFVV181GUZIkSdpwbzvlyrkuYbX/+8z7znUJksYw7l1MbwCWJfka3c9cAFBVr+ylKkmSJEnSrBs3IH62/UmSJEmS7qDGCohVdVySzYAHtU4XV9Xv+ytLkiRJkjTbxgqISR4LHAdcBgTYPskhVf
Xt/kqTJEmSJM2mcU8xfQfwpKq6GCDJg4BPAXv2VZgkSZIkaXbdacx2d54IhwBV9VPgzv2UJEmSJEmaC+MeQVya5MPAx9vzg4Gl/ZQkSZIkSZoL4wbEvwVeDkz8rMV3gPf3UpEkSZIkaU6MexfTm5K8F/gG8Ae6u5je3GtlkiRJkqRZNe5dTJ8KfBC4hO4upjskeUlVfbnP4iRJkiRJs2dd7mL6uKpaAZBkR+BLgAFRkiRJku4gxr2L6fUT4bC5FLi+h3okSZIkSXNkXe5ieirwaaCAA4GzkzwLoKo+21N9kiRJkqRZMm5AvAvwS+Av2/NVwBbA0+kCowFRkiRJkjZy497F9IV9FyJJkiRJmlvj3sV0B+AVwMLBfqrqGf2UJUmSJEmabeOeYvo54BjgC3S/gyhJkiRJuoMZNyDeWFXv7rUSSZIkSdKcGjcgvivJG4GvAjdNdKyqc3upSpIkSZI068YNiLsDzwMez5pTTKs9lyRJkiTdAYwbEA8EHlhVN/dZjCRJkiRp7txpzHYXAvfosxBJkiRJ0twa9wjiPYCfJDmbta9B9GcuJEmSJOkOYtyA+MZeq5AkSZIkzbmxAmJVfavvQiRJkiRJc2vKgJjku1W1d5Lr6e5auvoloKrq7r1WJ0mSJEmaNVMGxKrau/3fcnbKkSRJkiTNlXHvYipJkiRJuoMzIEqSJEmSAAOiJEmSJKkxIEqSJEmSgPF/B1GSJEnq1edOumquS1ht/wO3mesSpDnhEURJkiRJEmBAlCRJkiQ1BkRJkiRJEmBAlCRJkiQ1BkRJkiRJEmBAlCRJkiQ1BkRJkiRJEtBzQEyyb5KLk6xIcsQkr2+e5MT2+llJFrbuWyc5I8kNSd471M832zDPa3/36nMcJEmSJGlTMb+vASeZB7wP2AdYCZydZElVXTTQ7FDg2qraKcli4CjgOcCNwN8Du7W/YQdX1dK+apckSZKkTVGfRxD3AlZU1aVVdTNwArDfUJv9gOPa45OBJyRJVf2mqr5LFxQlSZIkSbOgz4C4LXD5wPOVrdukbarqFuA6YOsxhv2Rdnrp3yfJZA2SHJZkaZKlq1atWvfqJUmSJGkTszHepObgqtodeEz7e95kjarq6KpaVFWLFixYMKsFSpIkSdLGqM+AeAWw/cDz7Vq3SdskmQ9sBVw91UCr6or2/3rgk3SnskqSJEmSNlCfAfFsYOckOyTZDFgMLBlqswQ4pD0+ADi9qmrUAJPMT7JNe3xn4GnAhTNeuSRJkiRtgnq7i2lV3ZLkcOA0YB5wbFUtT/IWYGlVLQGOAY5PsgK4hi5EApDkMuDuwGZJ9geeBPwMOK2Fw3nA14EP9TUOkiRJkrQp6S0gAlTVqcCpQ92OHHh8I3DgiH4XjhjsnjNVnyRJkiRpjY3xJjWSJEmSpB4YECVJkiRJgAFRkiRJktQYECVJkiRJgAFRkiRJktQYECVJkiRJgAFRkiRJktQYECVJkiRJgAFRkiRJktQYECVJkiRJgAFRkiRJktQYECVJkiRJgAFRkiRJktQYECVJkiRJgAFRkiRJktQYECVJkiRJgAFRkiRJktQYECVJkiRJgAFRkiRJktQYECVJkiRJgAFRkiRJktTMn+sCJEmSpI3RDz/yq7kuYbW9XnivuS5BdxAeQZQkSZIkAQZESZIkSVJjQJQkSZIkAQZESZIkSVJjQJQkSZIkAQZESZIkSVJjQJQkSZIkAQZESZIkSVJjQJQkSZIkAQZESZIkSVJjQJQkSZIkAQZESZIkSVJjQJQkSZIkAQZESZIkSVJjQJQkSZIkAQZESZIkSVJjQJQkSZIkAQZESZIkSVJjQJQkSZIkAQZESZIkSVJjQJQkSZIkAT0HxCT7Jrk4yYokR0zy+uZJTmyvn5VkYeu+dZIzktyQ5L1D/eyZZFnr591J0uc4SJIkSdKmoreAmGQe8D7gKcAuwEFJdhlqdihwbVXtBLwTOKp1vxH4e+D1kwz6A8CLgZ3b374zX70kSZ
IkbXr6PIK4F7Ciqi6tqpuBE4D9htrsBxzXHp8MPCFJquo3VfVduqC4WpL7Anevqh9UVQEfA/bvcRwkSZIkaZPRZ0DcFrh84PnK1m3SNlV1C3AdsPU0w1w5zTAlSZIkSevhDnuTmiSHJVmaZOmqVavmuhxJkiRJut3rMyBeAWw/8Hy71m3SNknmA1sBV08zzO2mGSYAVXV0VS2qqkULFixYx9IlSZIkadPTZ0A8G9g5yQ5JNgMWA0uG2iwBDmmPDwBOb9cWTqqqrgR+neSR7e6lzwc+P/OlS5IkSdKmZ35fA66qW5IcDpwGzAOOrarlSd4CLK2qJcAxwPFJVgDX0IVIAJJcBtwd2CzJ/sCTquoi4GXAR4EtgC+3P0mSJEnSBuotIAJU1anAqUPdjhx4fCNw4Ih+F47ovhTYbeaqlCRJkiTBHfgmNZIkSZKkdWNAlCRJkiQBBkRJkiRJUmNAlCRJkiQBBkRJkiRJUmNAlCRJkiQBBkRJkiRJUmNAlCRJkiQBBkRJkiRJUmNAlCRJkiQBBkRJkiRJUmNAlCRJkiQBBkRJkiRJUmNAlCRJkiQBBkRJkiRJUmNAlCRJkiQBBkRJkiRJUmNAlCRJkiQBBkRJkiRJUmNAlCRJkiQBBkRJkiRJUmNAlCRJkiQBBkRJkiRJUmNAlCRJkiQBBkRJkiRJUmNAlCRJkiQBBkRJkiRJUmNAlCRJkiQBBkRJkiRJUmNAlCRJkiQBBkRJkiRJUmNAlCRJkiQBBkRJkiRJUmNAlCRJkiQBBkRJkiRJUmNAlCRJkiQBBkRJkiRJUmNAlCRJkiQBBkRJkiRJUmNAlCRJkiQBBkRJkiRJUmNAlCRJkiQBBkRJkiRJUmNAlCRJkiQBPQfEJPsmuTjJiiRHTPL65klObK+flWThwGtvaN0vTvLkge6XJVmW5LwkS/usX5IkSZI2JfP7GnCSecD7gH2AlcDZSZZU1UUDzQ4Frq2qnZIsBo4CnpNkF2AxsCtwP+DrSR5UVbe2/h5XVVf1VbskSZIkbYr6PIK4F7Ciqi6tqpuBE4D9htrsBxzXHp8MPCFJWvcTquqmqvpPYEUbniRJkiSpJ30GxG2Byweer2zdJm1TVbcA1wFbT9NvAV9Nck6Sw0a9eZLDkixNsnTVqlUbNCKSJEmStCnYGG9Ss3dVPQx4CvDyJH8xWaOqOrqqFlXVogULFsxuhZIkSZK0EeozIF4BbD/wfLvWbdI2SeYDWwFXT9VvVU38/xVwCp56KkmSJEkzos+AeDawc5IdkmxGd9OZJUNtlgCHtMcHAKdXVbXui9tdTncAdgZ+mORuSbYESHI34EnAhT2OgyRJkiRtMnq7i2lV3ZLkcOA0YB5wbFUtT/IWYGlVLQGOAY5PsgK4hi5E0tp9GrgIuAV4eVXdmuTewCndfWyYD3yyqr7S1zhIkiRJ0qakt4AIUFWnAqcOdTty4PGNwIEj+n0b8LahbpcCD535SiVJkiRJG+NNaiRJkiRJPTAgSpIkSZIAA6IkSZIkqTEgSpIkSZIAA6IkSZIkqTEgSpIkSZIAA6IkSZIkqTEgSpIkSZIAA6IkSZIkqTEgSpIkSZIAA6IkSZIkqTEgSpIkSZIAA6IkSZIkqTEgSpIkSZIAA6IkSZIkqTEgSpIkSZIAA6IkSZIkqTEgSpIkSZIAA6IkSZIkqTEgSpIkSZIAA6IkSZIkqTEgSpIkSZIAA6IkSZIkqTEgSpIkSZIAA6IkSZIkqTEgSpIkSZIAA6IkSZIkqTEgSpIkSZIAA6IkSZIkqTEgSpIkSZIAA6IkSZIkqTEgSpIkSZIAmD/XBUiSJEnq38q3/2KuS1htu9ffZ9o2v3znebNQyXju/Zo95rqEWeMRREmSJEkS4BFESZIkSdpgv3rP1+a6hNXu9Yp91rtfjyBKkiRJkgADoiRJkiSpMSBKkiRJkgADoiRJkiSpMSBKkiRJkgADoiRJkiSpMSBKkiRJkgADoiRJkiSp6TUgJtk3ycVJVi
Q5YpLXN09yYnv9rCQLB157Q+t+cZInjztMSZIkSdL66S0gJpkHvA94CrALcFCSXYaaHQpcW1U7Ae8Ejmr97gIsBnYF9gXen2TemMOUJEmSJK2HPo8g7gWsqKpLq+pm4ARgv6E2+wHHtccnA09Iktb9hKq6qar+E1jRhjfOMCVJkiRJ66HPgLgtcPnA85Wt26RtquoW4Dpg6yn6HWeYkiRJkqT1kKrqZ8DJAcC+VfWi9vx5wCOq6vCBNhe2Nivb80uARwBvAn5QVR9v3Y8Bvtx6m3KYA8M+DDisPX0wcPGMjyRsA1zVw3D7Yr39st5+WW+/rLdf1tsv6+2X9fZvY6vZevvVV70PqKoF0zWa38MbT7gC2H7g+Xat22RtViaZD2wFXD1Nv9MNE4CqOho4en2LH0eSpVW1qM/3mEnW2y/r7Zf19st6+2W9/bLefllv/za2mq23X3Ndb5+nmJ4N7JxkhySb0d10ZslQmyXAIe3xAcDp1R3SXAIsbnc53QHYGfjhmMOUJEmSJK2H3o4gVtUtSQ4HTgPmAcdW1fIkbwGWVtUS4Bjg+CQrgGvoAh+t3aeBi4BbgJdX1a0Akw2zr3GQJEmSpE1Jn6eYUlWnAqcOdTty4PGNwIEj+n0b8LZxhjmHej2FtQfW2y/r7Zf19st6+2W9/bLefllv/za2mq23X3Nab283qZEkSZIkbVz6vAZRkiRJkrQRMSAOSLJ/kkrykPZ8YZLnDrz+2CTXJTkvyQVJvp7kXtMMc48k/6Pv2tt73dpquzDJF5Lco3VfmOR37bXzk3wvyYMHxumLPda01jRdx37fkuSJ0wx7lw2rcK3h3SfJCUkuSXJOklOTPKi99uokNybZaqD9Wp9tkhckWdWm8/IkJye56zTv+dgkj16HGgc/45PGGP4Lktxv4PlmSf4tyYok/5Hk80m2G+N91xrOuhoxbU9Jsv9Am4uT/N3A888kedYUw1zYfipnor73rm9962LUfDKwjF2U5GNJ7tzaj1xv9F33qHXC7V2SG4aer55OSV6a5PkD3dd7vlzP2t6Z5NUDz09L8uGB5+9I8top+r+h/e913TvivUfNuxcOtXtTkte3x6vXw209OOU6p8caJ12+5sJU8+c6DOOyJMvaOuFbSR4wwzWu82e9DsOe1fXKwPtN/B3R5/ttqHS+m+QpA90OTPKVHt+zkrxj4Pnrk7xpA4c56+uoEXXcMH2r24eBeXV5uu3t1yW5U3ttUZJ3T9P/6u+3Ea8/Y7bmfwPi2g4Cvtv+AywEnjvU5jtVtUdV/SndXVVfPs0w9wBmJSACv2u17UZ305/B2i5prz0UOA74P7NU0/A0HVtVHVlVX5+iyf7AjATEJAFOAb5ZVTtW1Z7AG4B7tyYH0X3eg4FzoLatAAAS/0lEQVRlss/2xDaddwVuBp4zzVs/Fhg7ILL2Z3wz8NJp2r8AGNyA/kdgS+DBVbUz8Dngs23812U4Y5ti2p5LG/ckWwO/AR410OujgO+tz3v2ZZr55JKq2gPYne4neP5qoNd1XW/MlKnWCRulqvpgVX2sPX0B6zlfboAzWTPf3onut6p2HXj90dzO5lsYax03qaH18KuB3gLiBixfG6vHtXXCN4G/m6bt2Nb3s14Hs71emXi/ib9/Hm6QZN7Q87HusTFuu3XR7sb/UuBfk9wlyR/Rffdu0HSaptabgGcl2WZD3kMbbGJe3RXYB3gK8EaAqlpaVa+cqueh77fJXl8y2fzfBwNi0xbgvYFDaXdTBf4ZeEzbG/Caofah29C+tj3fK8n3k/wo7Qhdup/ieAvwnDaM6cLCTPo+sO2I1+5Oq7tPk03TJPdN8u2BvY+PSTIvyUfb82UT07p1O6A9/ue25/iCJG9Pd9TtGcD/a8PaMckrB9qcsI7lPg74fVV9cKJDVZ1fVd9JsiPwR3Rf4Ae1eqb8bNuK/G6smT+enuSsNn98Pcm9kyyk+xJ5TRvGY9ax5u8AO2XgSFp7r9e3vcIHAIuAT7
Th3w14IfCaibsCV9VH6L5YHr8Ow9li+POYps5Jpy3wDdaE40cDXwAWtL2vO9CtaH/R6vpOknPb35SBOslT27K4Tdtre2Hbk/ftMabpdEaNy+UDz2+l+1me2yx/w+uNWbZ6nTC8ZzjJe5O8oD2+LMmb27ReljVnVPxl1uzB/1GSLedgHFYf8ZiB+XJ9fY81OzJ2BS4Erk/yx0k2B/4EuCjJNwam4X7TjNPD2zTdscfpPO28O6K2jyY5IMkr6cL4GUnOGLXe7rvG4eUrQ0fvknwxyWPb4xuSvK0t/z9Icu/WfabXC6u1afKB9n6XtmXt2CQ/TvLREb0NLpuTrofb428mOSrJD5P8dIrvjPX9rL+Z7gj50lbvw5N8Nt3ZJm8do/Z1Xq9siDbMo5KcCxzY6v+3JEuBV7VpeXpbH3wjyf1bfx9N8sEkZwH/0scyV1UX0n2n/W/gSOBjVXVJkkPa53dekvdnzdGlo9t0X55k9c0ck6xs67UfAc9M8pqBddzHB97yFrqbmtxmOUyyIN0ZOWe3vz9v3ZcluUc6V2fNmRkfS7LPqHFLsme6o97npDuD4r6t+4vb8M9v73fX1n3HtjwsS/LWjDiLYmh+Wf0ewF0m3mOMcbrNtnjrftckn27T7pR022OL2ms3DAz3gInldNR7jKuqfgUcBhzepvFj062f7tTm3dVH3tsydu+sffbGbbZps/bZNFPN3+9u439p2nb0uur1LqYbmf2Ar1TVT9uCsidwBPD6qnoadDMzLTACE0c8Jo7E/QR4TPt5jycC/1hVz24L+qKqOny2RiTdnrQn0P2MyIQdW91b0u0BfsQslDLZNH0scFpVva3VeVe6I3Hbtr2RZOh0lXRHl54JPKSqKsk9quq/kywBvlhVJ7d2RwA7VNVNw8MYw27AOSNeWwycQBfIHpzk3lX1y+HPtq3YnpNkb+C+wE/pviCgO4r6yFb/i4D/VVWvS/JB4IaqWqeN2XQB9CnAyFNWqurkdD8L8/qqWprkT4GfV9Wvh5oupdvQvWTM4dzm85im3FHT9hxgt3Rh+9HAt4AH0m1g/xlrjsL8Ctinqm5MsjPwKbpgcBtJngm8FvgfVXVt+4yeXFVXrMc8sS7jMljDXeiWr1cNdB613pgVI9YJU7mqqh6W5GXA64EXtf8vr6oz0+38ubGfagHYok2vCfdk6DdvZ2C+XC9V9V9Jbmlfxo9mzQbyo4DrgGXAb4FnVtWv0+3R/0GSJTXJXeHS7fB4D7BfVf08yb/Rz3Seat7dcWh63wdYa51UVe9Od+rs46rqqrY+H7ne7qFG2vtMtnyNcjfgB1X1f5P8C/Bi4K10G+wbsl6Ybv78Y7r54Rmt+5/TLUNnJ9mjqgb7BdiX7myOccyvqr3SXd7wRmCyyzA25LO+uaoWJXkV8HlgT7qjhJckeWdVXT3RcIbWK+MYnt7/VFUntsdXV9XDWj0vBTar9uPiSb4AHFdVxyX5G+DddGceQXcU+tFVdWtr18cy92a6M2VuBhYl2Y1uHfXotq14NN32xSeBI6rqmvbdfkaSk6vqojacX1XVn7VxuhJ4QFXdPMm8+z7ggjavD3oX8M6q+m5bb51G9z17Jt28+TPgUuAxwMfo5t2/BR4+PELpTu2eWF+tSrdz/G3A3wCfraoPtXZvpTs48J72/u+qqk+1z2hKk7zHjQPvMd043WZbHHg28DLg2qrapX0Ow8vgZEa9x9iq6tK2nNxroNsfknyebl74SJJHAD9r25WDvU+3TfseRs/f96U7QPMQunXQyetSN3gEcdBBdCGA9n/UKZETp4ptD3wEmFgQtwJOSrfn752sfcrRbJlYif6C7lSSrw28NnGK6Y50pwnNxu1zJ5umZwMvTLdHdPequp5uxfTAJO9Jsi8wHGCuo1thH5PumrTfjni/C+iOJvw13d60GR2PqvoD8BlG/DRLc2I7Deo+dBuK/7N13w44LclEt/WdPyY+46XAzxn/i3kmjft5TKmqbgKWAw
8DHgmcRbex/ej2d2ZremfgQ23ancTo04ofT7e39qlVNXGE7kzgo0leTPfbqX2a2PD6JXBlVV0w8Nqo9UbfplonTOWz7f85dKfaQzct/zXdkaR7VNVMLmPD1jqljG6DfjozMl+O6XusmU+/z23n2wD/mOQC4Ot0AXKy0/v+hG5d/PSq+nnrNpvTecIlQ9P7g9P2Mf16e6ZNtXyNcjMwcYRieF7ekPXCdPPnF9rOgGXAL6tqWfv+WD5QA3RB4Aq6nX2fGvO9J1s218V0n/VE0F0GLK+qK9u6+lJg+/baTK5XxjF8iumJA6+dONR28Pmj6MIXwPF0G8wTTmpHo6GnZa6qftPqOb5NwyfSha6lbfr9JbBja35QuiOh59KtFwa/5wbHaTnw8SQHA78fer9f0wW84dMYnwi8t73nEuDuLQh/B/iL9vcBYPck29IFqd+MGK0H0+2A+Fob3t/Rbd9At8P3O+27+mDWbOc8iu67G9Z8HlMZfo87D7zHdOM0alt8b9r2aDu6O876Y9R7zIQTWXMJ0mJuOx/D9Nu0U83fn6uqP7SdDOt1arkBEUhyT7qNyw8nuYxuA/6v6L7kp7KEbsEC+AfgjLY39enAXfqpdkq/ayv8B9DVPup898G6ezHFNJ1YIV1B9wX9/LYx/1C66zBeCnx4cFhtZb0X3R6QpzH6qNlT6fagPYxuT+26HCFfTrendHg8dgd2pltRXUa3IE97PWXbOPgCa6bze4D3VtXuwEtY//lj8IvyFVV1M92KY3BZHjXsS4D757anz+xJN/5jDWcdPo8Jk07b5ky6abRlmw9+wJoN7YkjiK+h2yh8KN2Rw81GDOsSuiPkDxqo9aV0X2DbA+e0o0wbYqpxmbhGakdgzyTPGNGu9+VvwKh1wnSf9U3t/620M02qu+7hRcAWwJmZgVPEZtJ6zJcbYuI6xN3pTjH9Ad2X9cR8ezCwANizTf9fMvnydCVdqP2ziQ49Tuep5t11Nt16ez2tz/I11bz8+4GjtoPz8kyvF4ZNLD9/GHg88Xzwe+lxdMvmeXRHm2A9ls1JbMhnPU7tM7ZemQHDQWZUsBnZX8/rtj+0P+im1bED3+EPrqp/aGfGvAp4fHXXpH6Ftafd4Dg9mS7UPxz4YYauvQT+je7I3d0Gut2J7gymiffdtqpuAL5Nd9TwMXTL8SrgALrttFFCt+NgYli7V9WT2msfBQ5v2zlvZvrtnFHzy1rvQTe/PWmo31HjtD7b4oNndgy2H/UeY0vyQLr5/VdDL32f7hKhBXRH/T473C8btk07uOxOl2UmZUDsHEC3h+cBVbWw7eX/T7qFeqpz0fdmzWl5W9GFHuhunDDh+mmGMeOq6rd0e5BeN2KGGqy7L6Om6V/Q7VH9EN0GxcPaKVh3qqrP0H1pP2xwQBN7harqVLqw8ND20uppm+48/u2r6gy6o0hb0V03OK7Tgc2THDbwvn9Kd8j+TW0cFlbV/YD7pbvj3HSf7aj545CBNjMxf/wSuFeSrdNdA/W0yYbf9ggeR7endB5AumsO7ko3/mMNZ4rPY5RJp22662e+RxeYz28vXUB3NPH+dBve0E27K9se+Ocxeo//z+hOJflYkl3b++xYVWdV1ZF0X37bj+h3XKPmk9XDraqr6E4NecOIYczG8reWSdYJPwN2SbJ5O3XlCdMNo03LZVV1FN2ZALeHgLgh8+WG+B7d8nFNVd1aVdcA92DNjZW2ojst7PdJJkLAZP6bbiPgn7Lmurm+pvO08+4YBqf3lOvtvmqcZPm6DNgj3XU929PtJJhSD+uF9dZ2bLwaeH7bsTrVenhcM/FZT2sm1is9+x5r7ilxMCOCzyyu274O/FVbdmif8f3p7gtxPfDrdNfaPXlEnfOA7arqdOB/0d0ga62bRrV10afpQuKErwKvGBjOHq3t5W0YO1fVpXSXwryeLjiOcjHdvQIe1YZ154nvW7p1w5XpThE9eKCfH9B9N8OazwNGzy9rvUd7n+GzriYdJ0
Zvi59Ju7FVurvf7z7w2i+T/EnbjnzmGO8xlhb+Pkh3cGCtywva81OAfwV+XAOnbrd+x9mmHWv+Xl8GxM5BdB/UoM/QTfhb011wO3Hh78RNa86n21h9Xev+L3Rf8j9i7T1jZ9AtALN6k5qq+hHdxvbE0a4dB+r+R8Y/9399jZqmHwXOb9PpOXTneG8LfLMdxv84t92w3hL4Yjtd67t015hBd7rA/2zD2pnutItlwI+Ad1fVf49bbFtYnwk8Md1twZcD/0R3zeTweJxCN29M9tlO3LTmArqjAv/Qur+J7rSHc4CrBob1BboLz9fnJjUTtf+e7oY5P6Q71ecnAy9/FPhgG/4WdNP2RuCnSf6D7nTZZ1ZnrOEw+vMYVd+oafsLuhXcA+n2pk1sLP0KWNoCIcD7gUPavPsQpthDXFU/oVtRnpTu5kL/L92F8Re29zp/VL/jmGZcBn0OuOvAZzpqvTFrBtcJbcPg03Qh/NN0y8x0Xp3uxh4X0J3a9OXeih3fR1nP+XIDLaPbsPrBULfrWoD5BN01R8uA57P2srSWqvolXQh4X7prUXqZzusw707laOArSc5g+vV2nzUOLl9n0u18vIhuh965Y7zVjK4XNlRVXUl3iunLp1kPjzu8mfisx32vDV2vjGOLrP0zF+PexfEVdJe0XEC33h113eqsrNuqahndkbWvt/f6Kt3pf+fSzb8/oTtF9MwRg5gPfLL1ey7w9uou0xn2Drr104RX0q2PLkhyEWvf/fwsuvslQBcwtqVbf054Qrob5axMspLuyPQBwFHt++w81txs7u/b8M5k7fn21cBrW9070V0OwKj5pZ0ZNfgedwO+O1DHa6cYp1Hb4u+nC50X0V2HvHyiDrodTl+kWxdcOeZ0G2ViXl1Ot0Pgq6w5O2DYicBfM/nppfOYfpt23Pl7vaRue828JEmSJG2QdHcz/V1VVZLFdDsTpryzcw81zAPuXN3N7nakC28PbmFUk/AuppIkSZL6sCfdzV5Cd1r930zTvg93pbsp1J3prsl7meFwah5BlCRJkiQBXoMoSZIkSWoMiJIkSZIkwIAoSZIkSWoMiJIk9STJDe3//ZKcPE3bV7c7/kmSNGe8SY0kSesgybyqunXMtjdU1fAPHI9qexmwqP2W4ozXIknSODyCKElSk2Rhkp8k+USSHyc5Ocldk1yW5Kgk5wIHJtkxyVeSnJPkO0ke0vrfIcn32w+xv3VouBe2x/OSvH3ix7mTvCLJK4H70d2K/YzW7qCJH3RPctTAsG5I8o72I9KPms3pI0m64/N3ECVJWtuDgUOr6swkxwIva92vrqqHAST5BvDSqvqPJI8A3g88HngX8IGq+liSl48Y/mHAQmCPqrolyT2r6pokrwUeV1VXJbkfcBTdb4hdC3w1yf5V9TngbsBZVfW6XsZekrRJ8wiiJElru7yqzmyPPw7s3R6fCJDkj4BHAyclOQ/4d+C+rc2fA59qj48fMfwnAv9eVbcAVNU1k7R5OPDNqlrV2n0C+Iv22q3AZ9ZnxCRJmo5HECVJWtvwxfkTz3/T/t8J+O+q2mPM/mfajV53KEnqi0cQJUla2/2TTFzb91zgu4MvVtWvgf9MciBAOg9tL58JLG6PDx4x/K8BL0kyv/V/z9b9emDL9viHwF8m2SbJPOAg4FsbNlqSJE3PgChJ0touBl6e5MfAHwMfmKTNwcCh7UYxy4H9WvdXtX6XAduOGP6HgZ8DF7T+n9u6Hw18JckZVXUlcARwBnA+cE5VfX7DR02SpKn5MxeSJDVJFgJfrKrd5rgUSZLmhEcQJUmSJEmARxAlSZIkSY1HECVJkiRJgAFRkiRJktQYECVJkiRJgAFRkiRJktQYECVJkiRJgAFRkiRJktT8f9DZR9rxhOFFAAAAAElFTkSuQmCC\n", "text/plain": [ "<Figure size 1080x540 with 1 Axes>" ] }, "metadata": {}, 
"output_type": "display_data" } ], "source": [ "train_x = train.drop('Salary', axis=1)\n", "train_y = train.Salary\n", "reg = ensemble.GradientBoostingRegressor(n_estimators=1000, learning_rate=0.2).fit(train_x, train_y)\n", "\n", "# plot\n", "_,_ = plt.subplots(figsize=(15, 7.5))\n", "bar_df = pd.DataFrame({'predictor': train_x.columns, 'importance' : reg.feature_importances_})\n", "sns.barplot(x='predictor', y='importance', data=bar_df.sort_values('importance', ascending=False))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(g) Now apply bagging to the training set. What is the test set MSE for this approach?" ] }, { "cell_type": "code", "execution_count": 100, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.4969384424303783" ] }, "execution_count": 100, "metadata": {}, "output_type": "execute_result" } ], "source": [ "train_x = train.drop('Salary', axis=1)\n", "train_y = train.Salary\n", "reg = ensemble.RandomForestRegressor(n_estimators=100, max_features=None).fit(train_x,train_y)\n", "test_x = test.drop('Salary', axis=1)\n", "test_y = test.Salary\n", "preds = reg.predict(test_x)\n", "np.sqrt(metrics.mean_squared_error(test_y, preds))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "11. This question uses the Caravan data set." 
] }, { "cell_type": "code", "execution_count": 107, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>MOSTYPE</th>\n", " <th>MAANTHUI</th>\n", " <th>MGEMOMV</th>\n", " <th>MGEMLEEF</th>\n", " <th>MOSHOOFD</th>\n", " <th>MGODRK</th>\n", " <th>MGODPR</th>\n", " <th>MGODOV</th>\n", " <th>MGODGE</th>\n", " <th>MRELGE</th>\n", " <th>...</th>\n", " <th>APERSONG</th>\n", " <th>AGEZONG</th>\n", " <th>AWAOREG</th>\n", " <th>ABRAND</th>\n", " <th>AZEILPL</th>\n", " <th>APLEZIER</th>\n", " <th>AFIETS</th>\n", " <th>AINBOED</th>\n", " <th>ABYSTAND</th>\n", " <th>Purchase</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>33</td>\n", " <td>1</td>\n", " <td>3</td>\n", " <td>2</td>\n", " <td>8</td>\n", " <td>0</td>\n", " <td>5</td>\n", " <td>1</td>\n", " <td>3</td>\n", " <td>7</td>\n", " <td>...</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>1</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>37</td>\n", " <td>1</td>\n", " <td>2</td>\n", " <td>2</td>\n", " <td>8</td>\n", " <td>1</td>\n", " <td>4</td>\n", " <td>1</td>\n", " <td>4</td>\n", " <td>6</td>\n", " <td>...</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>1</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>37</td>\n", " <td>1</td>\n", " <td>2</td>\n", " <td>2</td>\n", " <td>8</td>\n", " <td>0</td>\n", " <td>4</td>\n", " <td>2</td>\n", " <td>4</td>\n", " 
<td>3</td>\n", " <td>...</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>1</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>9</td>\n", " <td>1</td>\n", " <td>3</td>\n", " <td>3</td>\n", " <td>3</td>\n", " <td>2</td>\n", " <td>3</td>\n", " <td>2</td>\n", " <td>4</td>\n", " <td>5</td>\n", " <td>...</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>1</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>40</td>\n", " <td>1</td>\n", " <td>4</td>\n", " <td>2</td>\n", " <td>10</td>\n", " <td>1</td>\n", " <td>4</td>\n", " <td>1</td>\n", " <td>4</td>\n", " <td>7</td>\n", " <td>...</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>1</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "<p>5 rows × 86 columns</p>\n", "</div>" ], "text/plain": [ " MOSTYPE MAANTHUI MGEMOMV MGEMLEEF MOSHOOFD MGODRK MGODPR MGODOV \\\n", "0 33 1 3 2 8 0 5 1 \n", "1 37 1 2 2 8 1 4 1 \n", "2 37 1 2 2 8 0 4 2 \n", "3 9 1 3 3 3 2 3 2 \n", "4 40 1 4 2 10 1 4 1 \n", "\n", " MGODGE MRELGE ... APERSONG AGEZONG AWAOREG ABRAND AZEILPL \\\n", "0 3 7 ... 0 0 0 1 0 \n", "1 4 6 ... 0 0 0 1 0 \n", "2 4 3 ... 0 0 0 1 0 \n", "3 4 5 ... 0 0 0 1 0 \n", "4 4 7 ... 
0 0 0 1 0 \n", "\n", " APLEZIER AFIETS AINBOED ABYSTAND Purchase \n", "0 0 0 0 0 0 \n", "1 0 0 0 0 0 \n", "2 0 0 0 0 0 \n", "3 0 0 0 0 0 \n", "4 0 0 0 0 0 \n", "\n", "[5 rows x 86 columns]" ] }, "execution_count": 107, "metadata": {}, "output_type": "execute_result" } ], "source": [ "van_df = pd.read_csv('caravan.csv')\n", "# Drop the first column (presumably a row index saved in the CSV)\n", "van_df = van_df.drop(van_df.columns[0], axis=1)\n", "# Encode the response as 1 (Yes) / 0 (No)\n", "van_df.Purchase = van_df.Purchase.map({'Yes': 1, 'No': 0})\n", "van_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(a) Create a training set consisting of the first 1,000 observations, and a test set consisting of the remaining observations." ] }, { "cell_type": "code", "execution_count": 113, "metadata": {}, "outputs": [], "source": [ "# Random 1,000-row training set; van_df.iloc[:1000] / van_df.iloc[1000:] would give the literal first-1,000 split\n", "train, test = model_selection.train_test_split(van_df, test_size=van_df.shape[0] - 1000)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(b) Fit a boosting model to the training set with Purchase as the response and the other variables as predictors. Use 1,000 trees, and a shrinkage value of 0.01. Which predictors appear to be the most important?" 
] }, { "cell_type": "code", "execution_count": 146, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "classification error: 0.06574035669846534\n" ] }, { "data": { "text/plain": [ "<matplotlib.axes._subplots.AxesSubplot at 0x11b7680f0>" ] }, "execution_count": 146, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAA4IAAAHICAYAAAAbVVRCAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAIABJREFUeJzt3Xu8LmVdN/7PV3YopWkhViq4UbFCTVNSy0PmKXwqsYIEzcd6KPSXZGVWlD5GZCWZ8VTYwZQi1CCpfLZFUb88ZoZsFARUdKMW2ImDoaiI4Pf5Y2bJzWLtvdeGNXutzbzfr9d+7Zlrrpn7uq81M/d87jnc1d0BAABgPu6w3g0AAABg9xIEAQAAZkYQBAAAmBlBEAAAYGYEQQAAgJkRBAEAAGZGEAQAAJgZQRAAAGBmBEEAAICZ2bTeDVgrd7/73Xvz5s3r3QwAAIB1cd55513Z3futpu7tJghu3rw5W7duXe9mAAAArIuq+pfV1nVpKAAAwMwIggAAADMjCAIAAMyMIAgAADAzgiAAAMDMCIIAAAAzIwgCAADMjCAIAAAwM4IgAADAzAiCAAAAMyMIAgAAzIwgCAAAMDOCIAAAwMwIggAAADMjCAIAAMyMIAgAADAzgiAAAMDMCIIAAAAzs2m9G7A7Pfxn/mS9m7BhnPeK/7neTQAAANaJM4IAAAAzIwgCAADMjCAIAAAwM4IgAADAzAiCAAAAMyMIAgAAzIwgCAAAMDOCIAAAwMwIggAAADMjCAIAAMyMIAgAADAzgiAAAMDMCIIAAAAzIwgCAADMjCAIAAAwM4IgAADAzAiCAAAAMyMIAgAAzIwgCAAAMDOCIAAAwMwIggAAADMjCAIAAMyMIAgAADAzgiAAAMDMTBoEq+rQqrqkqrZV1XErTL9jVZ0xTj+nqjYvTPumqnp3VV1cVRdW1Z2mbCsAAMBcTBYEq2qvJK9K8tQkByc5qqoOXlbt6CSf7O77JzkpyYnjvJuSvC7J87r7gUken+QLU7UVAABgTqY8I/iIJNu6+6PdfX2S05MctqzOYUlOHYfPTPLEqqokT0ny/u6+IEm6+6ruvnHCtgIAAMzGlEHwXkkuWxi/fCxbsU5335DkmiT7JnlAkq6qs6vqvVX1syu9QFUdU1Vbq2rrFVdcseZvAAAA4PZooz4sZlOSxyR51vj/91bVE5dX6u5Xd/ch3X3Ifvvtt7vbCAAAsEeaMgh+Isn+C+P3HstWrDPeF3jXJFdlOHv4ju6+srs/m+SsJA+bsK0AAACzMWUQPDfJQVV1YFXtneTIJFuW1dmS5Dnj8OFJ3tLdneTsJA+uqi8fA+K3J/nAhG0FAACYjU1TLbi7b6iqYzOEur2SnNLdF1fVCUm2dveWJK9NclpVbUtydYawmO7+ZFX9ZoYw2UnO6u6/nqqtAAAAczJZEEyS7j4rw2Wdi2UvXRi+LskR25n3dRl+QgIAAIA1tFEfFgMAAMBEBEEAAICZEQQBAABmRhAEAACYGUEQAABgZgRBAACAmREEAQAAZkYQBAAAmBlBEAAAYGYEQQAAgJkRBAEAAGZGEAQAAJgZQRAAAGBmBEEAAICZEQQBAABmRhAEAACYGU
EQAABgZgRBAACAmREEAQAAZkYQBAAAmBlBEAAAYGYEQQAAgJkRBAEAAGZGEAQAAJgZQRAAAGBmBEEAAICZEQQBAABmRhAEAACYGUEQAABgZgRBAACAmREEAQAAZkYQBAAAmBlBEAAAYGYEQQAAgJkRBAEAAGZGEAQAAJgZQRAAAGBmBEEAAICZEQQBAABmRhAEAACYGUEQAABgZjatdwPYM/3rCQ9e7yZsGAe89ML1bgIAAOwSZwQBAABmRhAEAACYGUEQAABgZgRBAACAmREEAQAAZkYQBAAAmBlBEAAAYGYEQQAAgJnxg/KwATz6dx693k3YMN714+9a7yYAANzuTXpGsKoOrapLqmpbVR23wvQ7VtUZ4/RzqmrzWL65qj5XVeeP/35/ynYCAADMyWRnBKtqrySvSvLkJJcnObeqtnT3BxaqHZ3kk919/6o6MsmJSZ4xTru0ux86VfsAAADmasozgo9Isq27P9rd1yc5Pclhy+ocluTUcfjMJE+sqpqwTQAAALM3ZRC8V5LLFsYvH8tWrNPdNyS5Jsm+47QDq+p9VfX2qnrsSi9QVcdU1daq2nrFFVesbesBAABupzbqU0P/PckB3f3NSV6Y5A1V9ZXLK3X3q7v7kO4+ZL/99tvtjQQAANgTTRkEP5Fk/4Xxe49lK9apqk1J7prkqu7+fHdflSTdfV6SS5M8YMK2AgAAzMaUQfDcJAdV1YFVtXeSI5NsWVZnS5LnjMOHJ3lLd3dV7Tc+bCZVdd8kByX56IRtBQAAmI3Jnhra3TdU1bFJzk6yV5JTuvviqjohydbu3pLktUlOq6ptSa7OEBaT5HFJTqiqLyT5YpLndffVU7UVAABgTib9QfnuPivJWcvKXrowfF2SI1aY78+T/PmUbQMAAJirjfqwGAAAACYiCAIAAMyMIAgAADAzgiAAAMDMCIIAAAAzIwgCAADMjCAIAAAwM4IgAADAzAiCAAAAMyMIAgAAzIwgCAAAMDOCIAAAwMwIggAAADMjCAIAAMyMIAgAADAzgiAAAMDMCIIAAAAzIwgCAADMjCAIAAAwM4IgAADAzAiCAAAAMyMIAgAAzIwgCAAAMDOCIAAAwMwIggAAADMjCAIAAMyMIAgAADAzgiAAAMDMbFrvBgCstbc/7tvXuwkbxre/4+3r3QQAYANyRhAAAGBmBEEAAICZEQQBAABmRhAEAACYGUEQAABgZgRBAACAmREEAQAAZkYQBAAAmBlBEAAAYGYEQQAAgJkRBAEAAGZGEAQAAJgZQRAAAGBmBEEAAICZEQQBAABmRhAEAACYGUEQAABgZgRBAACAmREEAQAAZkYQBAAAmBlBEAAAYGYEQQAAgJmZNAhW1aFVdUlVbauq41aYfseqOmOcfk5VbV42/YCquraqXjRlOwEAAOZksiBYVXsleVWSpyY5OMlRVXXwsmpHJ/lkd98/yUlJTlw2/TeT/M1UbQQAAJijKc8IPiLJtu7+aHdfn+T0JIctq3NYklPH4TOTPLGqKkmq6ulJPpbk4gnbCAAAMDtTBsF7JblsYfzysWzFOt19Q5JrkuxbVXdO8nNJfmlHL1BVx1TV1qraesUVV6xZwwEAAG7PNq13A7bj+CQndfe14wnCFXX3q5O8OkkOOeSQ3j1NA5iPk3/6zevdhA3j2Fd+z3o3AQDWzJRB8BNJ9l8Yv/dYtlKdy6tqU5K7JrkqySOTHF5Vv57kbkm+WFXXdffJE7YXACb1Kz94+Ho3YcN48evOXO8mAMzalEHw3CQHVdWBGQLfkUmeuazOliTPSfLuJIcneUt3d5LHLlWoquOTXCsEAgAArI3JgmB331BVxyY5O8leSU7p7our6oQkW7t7S5LXJjmtqrYluTpDWAQAAGBCk94j2N1nJTlrWdlLF4avS3LETpZx/CSNAwD2aB/8lbesdxM2jG988RPWuwnAHmbSH5QHAABg4xEEAQAAZkYQBAAAmBlBEAAAYGZWHQSr6j5V9aRxeJ+qus
t0zQIAAGAqqwqCVfWjSc5M8gdj0b2TvGmqRgEAADCd1Z4RfH6SRyf5VJJ090eS3GOqRgEAADCd1QbBz3f39UsjVbUpSU/TJAAAAKa02iD49qr6hST7VNWTk7wxyZunaxYAAABTWW0QPC7JFUkuTPLcJGcleclUjQIAAGA6m1ZZb58kp3T3HyZJVe01ln12qoYBAAAwjdWeEfyHDMFvyT5J/v+1bw4AAABTW20QvFN3X7s0Mg5/+TRNAgAAYEqrDYKfqaqHLY1U1cOTfG6aJgEAADCl1d4j+JNJ3lhV/5akknxtkmdM1ioAAAAms6og2N3nVtU3JPn6seiS7v7CdM0CAABgKqs9I5gk35Jk8zjPw6oq3f0nk7QKAACAyawqCFbVaUnul+T8JDeOxZ1EEAQAANjDrPaM4CFJDu7unrIxAAAATG+1Tw29KMMDYgAAANjDrfaM4N2TfKCq3pPk80uF3f20SVoFAADAZFYbBI+fshEAAADsPqv9+Yi3T90QAAAAdo9V3SNYVY+qqnOr6tqqur6qbqyqT03dOAAAANbeah8Wc3KSo5J8JMk+SX4kyaumahQAAADTWW0QTHdvS7JXd9/Y3X+U5NDpmgUAAMBUVvuwmM9W1d5Jzq+qX0/y79mFEAkAAMDGsdow9+yx7rFJPpNk/yTfN1WjAAAAmM5qg+DTu/u67v5Ud/9Sd78wyXdP2TAAAACmsdog+JwVyn5oDdsBAADAbrLDewSr6qgkz0xy36rasjDpLkmunrJhAAAATGNnD4v5pwwPhrl7klculH86yfunahQAAADT2WEQ7O5/qarLk1zX3W/fTW0CAGA3O/7449e7CRuGvmAOdvrzEd19Y1V9saru2t3X7I5GAQDAnurP3viI9W7ChvEDR7xnvZvAdqz2dwSvTXJhVf19hp+PSJJ09wsmaRUAAACTWW0Q/IvxHwAAAHu4VQXB7j61qvZO8oCx6JLu/sJ0zQIAAGAqqwqCVfX4JKcm+XiSSrJ/VT2nu98xXdMAAACYwmovDX1lkqd09yVJUlUPSPKnSR4+VcMAAACYxh1WWe/LlkJgknT3h5N82TRNAgAAYEqrPSO4tapek+R14/izkmydpkkAAABMabVB8P9L8vwkSz8X8c4kvztJiwAAAJjUap8a+vmqOjnJPyT5Yoanhl4/acsAAACYxGqfGvpdSX4/yaUZnhp6YFU9t7v/ZsrGAQAAsPZ25amh39Hd25Kkqu6X5K+TCIIAAAB7mNU+NfTTSyFw9NEkn56gPQAAAExsV54aelaSP0vSSY5Icm5VfV+SdPdfTNQ+AAAA1thqg+Cdkvxnkm8fx69Isk+S78kQDAVBAACAPcRqnxr6w1M3BAAAgN1jtU8NPTDJjyfZvDhPdz9tmmYBAAAwldVeGvqmJK9N8uYMvyMIAADAHmq1Tw29rrt/u7vf2t1vX/q3s5mq6tCquqSqtlXVcStMv2NVnTFOP6eqNo/lj6iq88d/F1TV9+7SuwIAAGC7VntG8Leq6heT/F2Szy8Vdvd7tzdDVe2V5FVJnpzk8gxPGd3S3R9YqHZ0kk929/2r6sgkJyZ5RpKLkhzS3TdU1dcluaCq3tzdN+zKmwMAAOCWVhsEH5zk2UmekJsuDe1xfHsekWRbd380Sarq9CSHJVkMgoclOX4cPjPJyVVV3f3ZhTp3Gl8LAACANbDaIHhEkvt29/W7sOx7JblsYfzyJI/cXp3x7N81SfZNcmVVPTLJKUnuk+TZK50NrKpjkhyTJAcccMAuNA0AAGC+VnuP4EVJ7jZlQ5br7nO6+4FJviXJz1fVnVao8+ruPqS7D9lvv/12Z/MAAAD2WKs9I3i3JB+qqnNz83sEd/TzEZ9Isv/C+L3HspXqXF5Vm5LcNclVixW6+4NVdW2SByXZusr2AgAAsB2rDYK/eCuWfW6Sg8bfIPxEkiOTPHNZnS1JnpPk3UkOT/KW7u5xnsvGy0Xvk+Qbknz8VrQBAACAZVYVBFfzUx
ErzHNDVR2b5OwkeyU5pbsvrqoTkmzt7i0ZfpvwtKraluTqDGExSR6T5Liq+kKGh9P8WHdfuattAAAA4JZ2GASr6h+7+zFV9enc/MmdlaS7+yt3NH93n5XkrGVlL10Yvi7Dg2iWz3daktN23nwAAAB21Q6DYHc/Zvz/LrunOQAAAExttU8NBQAA4HZCEAQAAJgZQRAAAGBmBEEAAICZEQQBAABmRhAEAACYGUEQAABgZgRBAACAmREEAQAAZkYQBAAAmBlBEAAAYGYEQQAAgJkRBAEAAGZGEAQAAJgZQRAAAGBmBEEAAICZEQQBAABmRhAEAACYGUEQAABgZgRBAACAmREEAQAAZkYQBAAAmJlN690AAACA7XnImWevdxM2jAsO/841W5YzggAAADMjCAIAAMyMIAgAADAzgiAAAMDMCIIAAAAzIwgCAADMjCAIAAAwM4IgAADAzAiCAAAAMyMIAgAAzIwgCAAAMDOCIAAAwMwIggAAADMjCAIAAMyMIAgAADAzgiAAAMDMCIIAAAAzIwgCAADMjCAIAAAwM4IgAADAzAiCAAAAMyMIAgAAzIwgCAAAMDOCIAAAwMwIggAAADMjCAIAAMzMpEGwqg6tqkuqaltVHbfC9DtW1Rnj9HOqavNY/uSqOq+qLhz/f8KU7QQAAJiTyYJgVe2V5FVJnprk4CRHVdXBy6odneST3X3/JCclOXEsvzLJ93T3g5M8J8lpU7UTAABgbqY8I/iIJNu6+6PdfX2S05MctqzOYUlOHYfPTPLEqqrufl93/9tYfnGSfarqjhO2FQAAYDamDIL3SnLZwvjlY9mKdbr7hiTXJNl3WZ3vT/Le7v78RO0EAACYlU3r3YAdqaoHZrhc9CnbmX5MkmOS5IADDtiNLQMAANhzTXlG8BNJ9l8Yv/dYtmKdqtqU5K5JrhrH753kL5P8z+6+dKUX6O5Xd/ch3X3Ifvvtt8bNBwAAuH2aMgiem+SgqjqwqvZOcmSSLcvqbMnwMJgkOTzJW7q7q+puSf46yXHd/a4J2wgAADA7kwXB8Z6/Y5OcneSDSf6suy+uqhOq6mljtdcm2beqtiV5YZKln5g4Nsn9k7y0qs4f/91jqrYCAADMyaT3CHb3WUnOWlb20oXh65IcscJ8L0vysinbBgAAMFeT/qA8AAAAG48gCAAAMDOCIAAAwMwIggAAADMjCAIAAMyMIAgAADAzgiAAAMDMCIIAAAAzIwgCAADMjCAIAAAwM4IgAADAzAiCAAAAMyMIAgAAzIwgCAAAMDOCIAAAwMwIggAAADMjCAIAAMyMIAgAADAzgiAAAMDMCIIAAAAzIwgCAADMjCAIAAAwM4IgAADAzAiCAAAAMyMIAgAAzIwgCAAAMDOCIAAAwMwIggAAADMjCAIAAMyMIAgAADAzgiAAAMDMCIIAAAAzIwgCAADMjCAIAAAwM4IgAADAzAiCAAAAMyMIAgAAzIwgCAAAMDOCIAAAwMwIggAAADMjCAIAAMyMIAgAADAzgiAAAMDMCIIAAAAzIwgCAADMjCAIAAAwM4IgAADAzAiCAAAAMyMIAgAAzIwgCAAAMDOCIAAAwMxMGgSr6tCquqSqtlXVcStMv2NVnTFOP6eqNo/l+1bVW6vq2qo6eco2AgAAzM1kQbCq9kryqiRPTXJwkqOq6uBl1Y5O8snuvn+Sk5KcOJZfl+R/J3nRVO0DAACYqynPCD4iybbu/mh3X5/k9CSHLatzWJJTx+Ezkzyxqqq7P9Pd/5ghEAIAALCGpgyC90py2cL45WPZinW6+4Yk1yTZd8I2AQAAzN4e/bCYqjqmqrZW1dYrrrhivZsDAACwR5gyCH4iyf4L4/cey1asU1Wbktw1yVWrfYHufnV3H9Ldh+y33363sbkAAADzMGUQPDfJQVV1YFXtneTIJFuW1dmS5Dnj8OFJ3tLdPWGbAAAAZm/TVAvu7huq6tgkZyfZK8kp3X1xVZ2QZGt3b0ny2iSnVdW2JFdnCI
tJkqr6eJKvTLJ3VT09yVO6+wNTtRcAAGAuJguCSdLdZyU5a1nZSxeGr0tyxHbm3Txl2wAAAOZqj35YDAAAALtOEAQAAJgZQRAAAGBmBEEAAICZEQQBAABmRhAEAACYGUEQAABgZgRBAACAmREEAQAAZkYQBAAAmBlBEAAAYGYEQQAAgJkRBAEAAGZGEAQAAJgZQRAAAGBmBEEAAICZEQQBAABmRhAEAACYGUEQAABgZgRBAACAmREEAQAAZkYQBAAAmBlBEAAAYGYEQQAAgJkRBAEAAGZGEAQAAJgZQRAAAGBmBEEAAICZEQQBAABmRhAEAACYGUEQAABgZgRBAACAmREEAQAAZkYQBAAAmBlBEAAAYGYEQQAAgJkRBAEAAGZGEAQAAJgZQRAAAGBmBEEAAICZEQQBAABmRhAEAACYGUEQAABgZgRBAACAmREEAQAAZkYQBAAAmBlBEAAAYGYEQQAAgJkRBAEAAGZGEAQAAJgZQRAAAGBmJg2CVXVoVV1SVduq6rgVpt+xqs4Yp59TVZsXpv38WH5JVX3nlO0EAACYk8mCYFXtleRVSZ6a5OAkR1XVwcuqHZ3kk919/yQnJTlxnPfgJEcmeWCSQ5P87rg8AAAAbqMpzwg+Ism27v5od1+f5PQkhy2rc1iSU8fhM5M8sapqLD+9uz/f3R9Lsm1cHgAAALfRlEHwXkkuWxi/fCxbsU5335DkmiT7rnJeAAAAboXq7mkWXHV4kkO7+0fG8WcneWR3H7tQ56KxzuXj+KVJHpnk+CT/3N2vG8tfm+RvuvvMZa9xTJJjxtGvT3LJJG9mbd09yZXr3YjbEf25tvTn2tGXa0t/ri39uXb05drSn2tLf66dPaUv79Pd+62m4qYJG/GJJPsvjN97LFupzuVVtSnJXZNctcp5092vTvLqNWzz5Kpqa3cfst7tuL3Qn2tLf64dfbm29Ofa0p9rR1+uLf25tvTn2rk99uWUl4aem+SgqjqwqvbO8PCXLcvqbEnynHH48CRv6eEU5ZYkR45PFT0wyUFJ3jNhWwEAAGZjsjOC3X1DVR2b5OwkeyU5pbsvrqoTkmzt7i1JXpvktKraluTqDGExY70/S/KBJDckeX533zhVWwEAAOZkyktD091nJTlrWdlLF4avS3LEdub9lSS/MmX71skedSnrHkB/ri39uXb05drSn2tLf64dfbm29Ofa0p9r53bXl5M9LAYAAICNacp7BAEAANiABMEFVXVjVZ1fVRdV1Rur6stXWb7077ix/G1VdUlVXVBV51bVQxde439V1YVV9f5xeYctTNtUVVdU1cuXtevjVXX3hfHHV9VfVdUPL7z29eNyz18+/3qrqq6q1y2ML73Pv1ooe/rYJx8c38fTF6Y9qqrOGd/bB6vq+B289zdW1Yerap+F+f+6qo6qqh8aX/f8qvpAVf3oOH2xfOnfwburf3bFSutiVZ1UVT+5UOfsqnrNwvgrq+qFC+M/WVXXVdVdF8oeX1XXLPTxLy573f9TVZ+oqjsslC3224eq6qfG8hcv9OPiNvKCqfplV63Qj/daaOd/jO91aXzvZfXfXFV3W7a8W/TpWP6IqnrHuD94X1W9pqqev6Ptdifbwh9X1cfG+hdU1RN3T4/d0s6262Xrx8VVdWbdtO88flkfn19Vd1u2Hn6oqn5jhdd9U1X987KyxeV9oKqOWpi22GcfWly3xzb/alV9ZKEdL56iv6ayir/D19TweXHB2DdnjeWba/gJp6X5frSqzquqr1roswtq2J/+SVXde/e/u7WzyvX15HH4+Kr6bFXdY6H+tdsZ/h9jH91nHP/Bcfu9eOy/1yztL+qmY4Olde3Mhdfrqrr/wnJ/cizbcE8o3F19OZZ9/2I/jOvt5xb68PfH8rss259cWVX/Z6E9i5/xPzJd7+zYzvpuLFvtZ8B7q+pbF8oPX/ZaN9vGx7Ljq+pF43BV1UvG/d+Hq+qtVfXAhbp3Hbf9bVV16Ti8eNxw0LhvuX
Tcd7y1qh63lv11W6yyrw+tqvfU8NlwflWdUVUHjNN21j8fH/8+F9awb31ZVd1pYfp2+2eFdXL9jju727/xX5JrF4Zfn+SFqy1ftpy3JTlkHP7hJH8/Dt87yaVJ7jqO3znJgQvzPTXJu8Y6tVD+8SR3Xxh/fJK/WvaaN6uzkf4luTbJ+Un2WXif5y+9hyQPSbJtqS+SHDiOf9M4fkmSh4zDeyU5eEfvPcnLk7xsHH56kr8bh38oycnj8D2SXJHkaxbLN/q/ldbFDE/c/bOx7A5Jzkvy7oV6707yqIXxc5K8M8kPr7ROJfmKJB9J8rCFZf5Lkn9O8h0L8yz2574Zfltn/+21dyP92942PY4fn+RFO6h/apIXL5u+Up9+zdhv37pQdniSr9nBuruzbeGPkxw+Dn9Hko+sZx/uZLu+2XaV5A1L/bNSH6+wHu6T5ENJHr0w/W5JLkvywST3XelvluEp059K8mUr9Nmdknx0oX9fPk6/0zh+lyTHr/f6ucZ/hz9I8hML9ZfWpc1JLhqHn53k/Uvr4rI+qyQ/leTDSfZe7/e7O9bXcX361yQnLs6/fDjJE8ft837j+KEZ9r/3Gsf3SvK/knz9OP62jMcGy9p2/Nj/L1koe1eSi1aqv97/dkdfjmV3SfKODJ89S8dUX1pvd9LG85I8bnl71vvfKvpuVz4DnpLk/cvLF17rFn2Vm+8rj83wHI8vX1jepblpf3hmFvaHSX4pyRvH4Ttl2Cc8bWH6g5L80Hr38S709YMyHOt848I8T1tYb3bWPx/PTfvMO2f4jDt1Nf2zkdZJZwS3751J7r8L5dvz7iT3GofvkeTTGVbOdPe13f2xhbpHJfmtDDvNb93VBm9wZyX5rnH4qCR/ujDtRUl+dakvxv9/LcnPjNPvkeTfx2k3dvcHdvJaJyQ5ooYzsS9P8vzlFbr7vzJs0PdZPm0PsrQu/lNuWl8emOHg4dM1fLN/xyTfmOS9SVJV98uww3pJhr/DLXT3ZzJ8iC6t549PcnGS39vBPFdl+LD6utv6ptbBbdmmd9Snz8/wofDupYLuPrO7/3MHy97ZtrDddqyTHW3XX1LD78R+RZJPrnbB3f25DB/ai+/x+5K8OcnpGZ8yvcJ8H0ny2SRftcLkpW9rP1PD2ckfTfLjPTy4LN396e4+frVt3EB29Hf4uiSXL4109/sXZ6yqH0hyXJKndPctfii5Bycl+Y8MB1J7slWtr6NTkjyjqr56pYnjN/t/mOS7u/vSsfjFGQ6yP5F86fPqlO6+ZBVte1OSw8Zl3y/JNdnYP1w9dV8myS8nOTHJdbvSsKp6QIbjhnfuyny70W05Hlr0juzaZ9dyP5fk2O7+7Phaf5fheOJZNZydfniGv8GSE5IcMq6fz8rwhfOXfhauuy/q7j++De2Zwo76+ucy9PUHlwq6e0t3v2Nh+or9s/xFuvvaJM9L8vRxPd9T+kcQXMl40PLUJBfupHyfZad1n7HC4g7NsINPkgtt2Y1IAAAMuUlEQVSS/GeSj1XVH1XV9yws+05JnpThIOdPs50D7j3Y6Rl+G/JOSb4pwxmUJQ/MEDwWbR3Lk+SkJJdU1V9W1XMXT72vZNxoX5RhJ3n6eGB4M1V13yT3zRBekuFDavFvuc/yeTaSxXWxu/8tyQ3j5QzfliEgnJMhHB4y1rl+nPXIDH+Ldyb5+qr6mhWWvW+SR2UIf8lNO8+/TPJdVfVlK8xzQIaD7Pcvn7aRbW9b30H9vTJ8c734m6jb69MH5Zbr9c7sbFtYtLhvWS872q6TcbtK8okkX51h/7bkpxa2t7cuX3BVfVWGs3vvWCheWhe3u4+sqodlOFP6XwvFrxjbcXmGfcJ/ZTiA+tfu/vTq3+6GtaO/w6uSvHa8LOnFVXXPhWn3SXJyhhD4Hzt5jfcm+Ya1bPQ62Nn6uujaDAHmJ1aYdscM297Tu/tDC+UPzPil2w68fmG9f8VC+aeSXFZVD8qwTzljJ8tZb5P25bgd79/df7
3CPAfWcKn926vqsStMPzLJGT2eehl9fw2XW55ZVfvv+K1N7rYcDy36nuz8s+t+i8c2GcJKquork3xFd390O691cJLze+Gn28bh88fpq1nXN4Kd9fWK72EV/XML3f2pJB/L8Lm1mv7ZEMedguDN7TNuKFsznJV77U7KP9fdD134t7jjfn1VfSzDN4SvSr60ER2a4fKwDyc5qaqOH+t/d5K3jt+C/3mGbxX2Gqct7syyg7INa/wWenOGg7ezdlz7FvOekCHQ/F2SZyb521XM8+Yk/53kd5dNWjow/dMkz+3uq8fyM5b9LT+3K23cjba3Lv5ThhC4FATfvTD+roX5j8pwIPzFDOvZ4s+3PLaq3pehn1/ew+957p3kfyR507iTOyfJdy7M84yqen+GQP27S2dW9gDb68ed1f+PDJd8/v3CtB316RReUVUfznAZyokTv9YOrWK7PqO7H5rkazMcsCx+q33Swvb2HQvlj62qCzKEx7OXAsoYsA9K8o/d/eEkXxgPmpf8VFVdnGEdXf7TQz+z0I4nVtW3LW9o3XTf8WUb4EBxl+zo79DdZ2f40usPMwS591XVfuPkKzKs/z+wipeptWrverkVn0O/neQ5VXWXZeVfyLDPPXp7M1bVg8f16dJlXxI/a2G9X36WZ+lM99MzfPG2YU3ZlzXci/6bSX56heX8e5IDuvubM9wa8YbxoH3Rkbn5mZ83J9nc3d+UYd996iraO5nbcjw0Wvpi65jsYB0cXbp4bJPk92/F6+3U+EX9RVX1F1Ms/9ZabV9X1b7j9vrhGu+hvJVW3E9up382xHGnIHhzi8HuxxfOomyvfEeeleHD99Qkv7NUOF5m857u/rUMO6vvHycdleRJVfXxDN8G7ZvkCeO0q3Lzy5y+Ohv7kpHt2ZLkN3LLS0g+kOEShEUPz01npNLdl3b372U4G/OQ8azVznxx/LdoacN7ZHdv6A/a7djeuviuDKHvwRkuDf3nDGcEvy3Dh2yq6sEZDqT/flzPjszNz6q8s7u/ubsf3t1LHxbfmeG+rAvHeR6zbJ4zxg/Xb0vy8qr62jV/x9PY1W36c+OH6H0y7Oifn+y0Ty/OLdfrndnptpAh1Dwgw2Urp+zi8qewve36S8Zv5t+cZDUPEnhndz8kwzeqR9dND9v6gQz7wY+Nfb05N18XT+ruB2bYp752pSsHxst33pZhPd6W5IClA9Pu/qPxb3xNhnu79jTb/Tt099Xd/YbufnaSc3PT3+GzGb7oeV5V3eJyp2W+OcO9mXu6na6vS7r7vzN84bL89oIvZlgfH1FVv7BQfnGSh43zXjiuT3+T4X7X1firDPdr/uv4xdtGN1Vf3iXDFRVvG7f1RyXZUlWHdPfne7gVId19XoZbPB6wtLCqekiSTeO0pde+qrs/P46+Jru+X57CrT4eyvjFVnc/ubsvyq0wrl+fGa+OWum1PpDkoXXzB8TdIclDx2lfWtfH5X1vhvveVrz8d51tr68Xt9erxu311UnuvIr+uYXxs2RzhhM9e0z/CIITGg9+/neSR1XVN1TVPcfLHZY8NMm/jN9mPTbDt1ybu3tzhp3l0kHO2zJ8OCxdmvaDSW5xKdUe4JQkv9Tdyy9l+I0kP19Vm5PhSVdJfiHJK8fx76qqpW9ZDkpyY4azfdzknzKcVb66h/tSrs4Q4L51nJYM69PxS+tYd98zyT1r4QltKzgqyY8srJcHJnlyjU9/XNLdW5OclpUv/bnd6OGy4xck+enxstId9enJGb4Bf+TS/FX1fbXC5bgLdrgtLHNykjtU1XeuMG132t52vdxjMhy0rUoP98a8PEPgTYa+PnRhXXx4VrhPsId7MrYmec7yaePf7JEZviX/bIYzwScvhcZx/7r3atu4waz4d6iqJ9RNT2u9S5L7ZTgLmORL90sfmuRXV1qXavCCDPca7vRqjD3AatfXJb+Z5LlJNi0WjuvPd2W4n2rprMyvJfmNuvkTVld9ude4zJ/LLc9ob1ST9G
V3X9Pdd1/Y1v85w0M3tlbVfktXS40H6QdleADUklvcr1hVi/euPy0b4wuNW3U8tMZekeS3ly5JrKonZdhPv6G7tyV5X4Z735e8JMl7x2lvSPLoqnrawvSbHRdsINvr619P8uKq+saFssX3sN3+Wf4CVXXnDFegvam7P5k9qH827bwKO7B0udiSv+3u4xYrdPfnquqVGS6JOiHDh8Q9M9z8fEWG67W/N8lbFr6xSpL/m+TXa3jYxy8n+b3xcqnK8GH8uuxhuvvyDJeHLC8/v6p+Lsmba7j/7AtJfra7l/r22Rkuo/1skhsyXFpz4/Ll3EbPqKrHLIz/WHf/03ZrbzwXJrl7br6DujDDN1tLZ4+PzPDt/6K/HMtvcX/HePB4aMZ7CpLhQTJV9Y8Z7k1Y7sQk762qX+3bx31XK+ru942Xwx6VHfRpd59YVUdm2ObvkeGb73dkBwfTq9gWFut2Vb0syc8mOXst3tutsb3terS0Xd0hw/15P7Qw7aeq6gcXxp+eW/r9JC8aD4buk+GAcOl1P1bDT008coX5TshwydgfjuOvqKqXZAh5/5Bk6fKcF2fYv15UVZ9O8rkMV3H823bez4a1g7/DwzOE3Rsy/B1e093nLh1ojvN+bDxgOauqvncsfkVV/e8MBy9LTwxezdUwG9pO1teV6l9ZVX+Z4cmpy6ddXVWHJnlHVV3R3VvGy27/Zgwr/53hCo3F7fP1VbV0CdiV3f2kZcs8fRff0rqZui+3s5jHJTmhqr6QYZ/6vL7pFo9kOLu4fJ/8gnH9viHJ1bn5fmhd3IbjoR35gxp/MiPD05V39qyJ38lwlcWFVXVjhlsfDuubLlE8OsnvVNXSF3jvHsuWjm2/O8lvjq/5nxkehviyVbRzt9pBX19YVT+R5E/GEzJXZviS7BfHKjvrnyR563ii4g4ZPvt/eVz2avpnQxx3VvcedasZAAAAt5FLQwEAAGZGEAQAAJgZQRAAAGBmBEEAAICZEQQBAABmRhAEgNuoqq4d/79nVZ25k7o/ufy3OAFgd/PzEQCwgqraa7W/WVpV13b3nVdZ9+NJDln4jc81bQsArIYzggDMTlVtrqoPVdXrq+qDVXVmVX15VX28qk6sqvcmOaKq7ldVf1tV51XVO6vqG8b5D6yqd1fVhVX1smXLvWgc3quqfqOqLqqq91fVj1fVC5LcM8MPEb91rHfUuJyLqurEhWVdW1WvrKoLknzr7uwfAG7/Nq13AwBgnXx9kqO7+11VdUqSHxvLr+ruhyVJVf1Dkud190eq6pFJfjfJE5L8VpLf6+4/qarnb2f5xyTZnOSh3X1DVX11d19dVS9M8h3dfWVV3TPJiUkenuSTSf6uqp7e3W9K8hVJzunun57k3QMwa84IAjBXl3X3u8bh1yV5zDh8RpJU1Z2TfFuSN1bV+Un+IMnXjXUeneRPx+HTtrP8JyX5g+6+IUm6++oV6nxLkrd19xVjvdcnedw47cYkf35r3hgA7IwzggDM1fKb5JfGPzP+f4ck/93dD13l/GvtOvcFAjAVZwQBmKsDqmrp3rtnJvnHxYnd/akkH6uqI5KkBg8ZJ78ryZHj8LO2s/y/T/Lcqto0zv/VY/mnk9xlHH5Pkm+vqrtX1V5Jjkry9tv2tgBg5wRBAObqkiTPr6oPJvmqJL+3Qp1nJTl6fGDLxUkOG8t/Ypz3wiT32s7yX5PkX5O8f5z/mWP5q5P8bVW9tbv/PclxSd6a5IIk53X3/73tbw0AdszPRwAwO1W1OclfdfeD1rkpALAunBEEAACYGWcEAQAAZsYZQQAAgJkRBAEAAGZGEAQAAJgZQRAAAGBmBEEAAICZEQQBAABm5v8BsRQnQdJPymQAAAAASUVORK5CYII=\n", "text/plain": [ "<Figure size 1080x540 with 1 Axes>" ] }, "metadata": {}, "output_type": "display_data" } ], 
"source": [ "train_x = train.drop('Purchase', axis=1)\n", "train_y = train.Purchase\n", "clr = ensemble.GradientBoostingClassifier(n_estimators=1000, learning_rate=0.01).fit(train_x, train_y)\n", "test_x = test.drop('Purchase', axis=1)\n", "test_y = test.Purchase\n", "preds = clr.predict(test_x)\n", "error = 1 - ((preds == test_y).sum() / preds.shape[0])\n", "\n", "print('classification error: {}'.format(error))\n", "\n", "\n", "# plot\n", "_,_ = plt.subplots(figsize=(15, 7.5))\n", "bar_df = pd.DataFrame({'predictor': train_x.columns, 'importance' : clr.feature_importances_})\n", "sns.barplot(x='predictor', y='importance', data=bar_df.sort_values('importance', ascending=False).iloc[0:10])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(c) Use the boosting model to predict the response on the test data. Predict that a person will make a purchase if the estimated probability of purchase is greater than 20%. Form a confusion matrix. What fraction of the people predicted to make a purchase do in fact make one? How does this compare with the results obtained from applying KNN or logistic regression to this data set?" 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "preds = (clr.predict_proba(test_x)[:,1] > 0.2)\n", "# confusion_matrix rows are true classes, columns are predicted classes\n", "conf_mtrx = pd.DataFrame(metrics.confusion_matrix(test_y, preds))\n", "\n", "# everyone predicted to purchase: column 1 = false positives + true positives\n", "predicted_true = conf_mtrx[1][0] + conf_mtrx[1][1]\n", "print('Positive Predictive Value: {}'.format(conf_mtrx[1][1] / predicted_true))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "logit = linear_model.LogisticRegression().fit(train_x, train_y)\n", "preds = (logit.predict_proba(test_x)[:,1] > 0.2)\n", "conf_mtrx = pd.DataFrame(metrics.confusion_matrix(test_y, preds))\n", "# column 1 holds the predicted purchasers (false positives + true positives)\n", "predicted_true = conf_mtrx[1][0] + conf_mtrx[1][1]\n", "print('Positive Predictive Value: {}'.format(conf_mtrx[1][1] / predicted_true))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "knn = neighbors.KNeighborsClassifier(n_neighbors=5).fit(train_x, train_y)\n", "preds = (knn.predict_proba(test_x)[:,1] > 0.2)\n", "conf_mtrx = pd.DataFrame(metrics.confusion_matrix(test_y, preds))\n", "# column 1 holds the predicted purchasers (false positives + true positives)\n", "predicted_true = conf_mtrx[1][0] + conf_mtrx[1][1]\n", "print('Positive Predictive Value: {}'.format(conf_mtrx[1][1] / predicted_true))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "12. Apply boosting, bagging, and random forests to a data set of your choice. Be sure to fit the models on a training set and to evaluate their performance on a test set. How accurate are the results compared to simple methods like linear or logistic regression? Which of these approaches yields the best performance?" 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# TODO: Come back for this one with a Kaggle dataset and take XGBoost out for a spin" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.1" } }, "nbformat": 4, "nbformat_minor": 2 }