{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "**Chapter 6 – Decision Trees**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "_This notebook contains all the sample code and solutions to the exercises in chapter 6._" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Warning**: this is the code for the 1st edition of the book. Please visit https://github.com/ageron/handson-ml2 for the 2nd edition code, with up-to-date notebooks using the latest library versions." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Setup" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, let's make sure this notebook works well in both Python 2 and 3, import a few common modules, ensure Matplotlib plots figures inline, and prepare a function to save the figures:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# To support both Python 2 and Python 3\n", "from __future__ import division, print_function, unicode_literals\n", "\n", "# Common imports\n", "import numpy as np\n", "import os\n", "\n", "# To make this notebook's output stable across runs\n", "np.random.seed(42)\n", "\n", "# To plot pretty figures\n", "%matplotlib inline\n", "import matplotlib as mpl\n", "import matplotlib.pyplot as plt\n", "mpl.rc('axes', labelsize=14)\n", "mpl.rc('xtick', labelsize=12)\n", "mpl.rc('ytick', labelsize=12)\n", "\n", "# Where to save the figures\n", "PROJECT_ROOT_DIR = \".\"\n", "CHAPTER_ID = \"decision_trees\"\n", "IMAGES_PATH = os.path.join(PROJECT_ROOT_DIR, \"images\", CHAPTER_ID)\n", "os.makedirs(IMAGES_PATH, exist_ok=True)\n", "\n", "def save_fig(fig_id, tight_layout=True, fig_extension=\"png\", resolution=300):\n", "    path = os.path.join(IMAGES_PATH, fig_id + \".\" + fig_extension)\n", "    print(\"Saving figure\", fig_id)\n", "    if tight_layout:\n", "        plt.tight_layout()\n", "    plt.savefig(path, format=fig_extension, dpi=resolution)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Training and visualizing" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=2,\n", " 
max_features=None, max_leaf_nodes=None,\n", " min_impurity_decrease=0.0, min_impurity_split=None,\n", " min_samples_leaf=1, min_samples_split=2,\n", " min_weight_fraction_leaf=0.0, presort=False, random_state=42,\n", " splitter='best')" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.datasets import load_iris\n", "from sklearn.tree import DecisionTreeClassifier\n", "\n", "iris = load_iris()\n", "X = iris.data[:, 2:]  # petal length and width\n", "y = iris.target\n", "\n", "tree_clf = DecisionTreeClassifier(max_depth=2, random_state=42)\n", "tree_clf.fit(X, y)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "from sklearn.tree import export_graphviz\n", "\n", "def image_path(fig_id):\n", "    return os.path.join(IMAGES_PATH, fig_id)\n", "\n", "export_graphviz(\n", "    tree_clf,\n", "    out_file=image_path(\"iris_tree.dot\"),\n", "    feature_names=iris.feature_names[2:],\n", "    class_names=iris.target_names,\n", "    rounded=True,\n", "    filled=True\n", ")" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Saving figure decision_tree_decision_boundaries_plot\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "from matplotlib.colors import ListedColormap\n", "\n", "def plot_decision_boundary(clf, X, y, axes=[0, 7.5, 0, 3], iris=True, legend=False, plot_training=True):\n", "    x1s = np.linspace(axes[0], axes[1], 100)\n", "    x2s = np.linspace(axes[2], axes[3], 100)\n", "    x1, x2 = np.meshgrid(x1s, x2s)\n", "    X_new = np.c_[x1.ravel(), x2.ravel()]\n", "    y_pred = clf.predict(X_new).reshape(x1.shape)\n", "    custom_cmap = ListedColormap(['#fafab0','#9898ff','#a0faa0'])\n", "    plt.contourf(x1, x2, y_pred, alpha=0.3, cmap=custom_cmap)\n", "    if not iris:\n", "        custom_cmap2 = ListedColormap(['#7d7d58','#4c4c7f','#507d50'])\n", "        plt.contour(x1, x2, y_pred, cmap=custom_cmap2, alpha=0.8)\n", "    if plot_training:\n", "        plt.plot(X[:, 0][y==0], X[:, 1][y==0], \"yo\", label=\"Iris-Setosa\")\n", "        plt.plot(X[:, 0][y==1], X[:, 1][y==1], \"bs\", label=\"Iris-Versicolor\")\n", "        plt.plot(X[:, 0][y==2], X[:, 1][y==2], \"g^\", label=\"Iris-Virginica\")\n", "        plt.axis(axes)\n", "    if iris:\n", "        plt.xlabel(\"Petal length\", fontsize=14)\n", "        plt.ylabel(\"Petal width\", fontsize=14)\n", "    else:\n", "        plt.xlabel(r\"$x_1$\", fontsize=18)\n", "        plt.ylabel(r\"$x_2$\", fontsize=18, rotation=0)\n", "    if legend:\n", "        plt.legend(loc=\"lower right\", fontsize=14)\n", "\n", "plt.figure(figsize=(8, 4))\n", "plot_decision_boundary(tree_clf, X, y)\n", "plt.plot([2.45, 2.45], [0, 3], \"k-\", linewidth=2)\n", "plt.plot([2.45, 7.5], [1.75, 1.75], \"k--\", linewidth=2)\n", "plt.plot([4.95, 4.95], [0, 1.75], \"k:\", linewidth=2)\n", "plt.plot([4.85, 4.85], [1.75, 3], \"k:\", linewidth=2)\n", "plt.text(1.40, 1.0, \"Depth=0\", fontsize=15)\n", "plt.text(3.2, 1.80, \"Depth=1\", fontsize=13)\n", "plt.text(4.05, 0.5, \"(Depth=2)\", fontsize=11)\n", "\n", "save_fig(\"decision_tree_decision_boundaries_plot\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Predicting classes and class 
probabilities" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0. , 0.90740741, 0.09259259]])" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tree_clf.predict_proba([[5, 1.5]])" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1])" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tree_clf.predict([[5, 1.5]])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Sensitivity to training set details" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[4.8, 1.8]])" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X[(X[:, 1]==X[:, 1][y==1].max()) & (y==1)] # widest Iris-Versicolor flower" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=2,\n", " max_features=None, max_leaf_nodes=None,\n", " min_impurity_decrease=0.0, min_impurity_split=None,\n", " min_samples_leaf=1, min_samples_split=2,\n", " min_weight_fraction_leaf=0.0, presort=False, random_state=40,\n", " splitter='best')" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "not_widest_versicolor = (X[:, 1]!=1.8) | (y==2)\n", "X_tweaked = X[not_widest_versicolor]\n", "y_tweaked = y[not_widest_versicolor]\n", "\n", "tree_clf_tweaked = DecisionTreeClassifier(max_depth=2, random_state=40)\n", "tree_clf_tweaked.fit(X_tweaked, y_tweaked)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Saving figure decision_tree_instability_plot\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.figure(figsize=(8, 4))\n", "plot_decision_boundary(tree_clf_tweaked, X_tweaked, y_tweaked, legend=False)\n", "plt.plot([0, 7.5], [0.8, 0.8], \"k-\", linewidth=2)\n", "plt.plot([0, 7.5], [1.75, 1.75], \"k--\", linewidth=2)\n", "plt.text(1.0, 0.9, \"Depth=0\", fontsize=15)\n", "plt.text(1.0, 1.80, \"Depth=1\", fontsize=13)\n", "\n", "save_fig(\"decision_tree_instability_plot\")\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Saving figure min_samples_leaf_plot\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "from sklearn.datasets import make_moons\n", "Xm, ym = make_moons(n_samples=100, noise=0.25, random_state=53)\n", "\n", "deep_tree_clf1 = DecisionTreeClassifier(random_state=42)\n", "deep_tree_clf2 = DecisionTreeClassifier(min_samples_leaf=4, random_state=42)\n", "deep_tree_clf1.fit(Xm, ym)\n", "deep_tree_clf2.fit(Xm, ym)\n", "\n", "plt.figure(figsize=(11, 4))\n", "plt.subplot(121)\n", "plot_decision_boundary(deep_tree_clf1, Xm, ym, axes=[-1.5, 2.5, -1, 1.5], iris=False)\n", "plt.title(\"No restrictions\", fontsize=16)\n", "plt.subplot(122)\n", "plot_decision_boundary(deep_tree_clf2, Xm, ym, axes=[-1.5, 2.5, -1, 1.5], iris=False)\n", "plt.title(\"min_samples_leaf = {}\".format(deep_tree_clf2.min_samples_leaf), fontsize=14)\n", "\n", "save_fig(\"min_samples_leaf_plot\")\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "angle = np.pi / 180 * 20\n", "rotation_matrix = np.array([[np.cos(angle), -np.sin(angle)], [np.sin(angle), np.cos(angle)]])\n", "Xr = X.dot(rotation_matrix)\n", "\n", "tree_clf_r = DecisionTreeClassifier(random_state=42)\n", "tree_clf_r.fit(Xr, y)\n", "\n", "plt.figure(figsize=(8, 3))\n", "plot_decision_boundary(tree_clf_r, Xr, y, axes=[0.5, 7.5, -1.0, 1], iris=False)\n", "\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Saving figure sensitivity_to_rotation_plot\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "np.random.seed(6)\n", "Xs = np.random.rand(100, 2) - 0.5\n", "ys = (Xs[:, 0] > 0).astype(np.float32) * 2\n", "\n", "angle = np.pi / 4\n", "rotation_matrix = np.array([[np.cos(angle), -np.sin(angle)], [np.sin(angle), np.cos(angle)]])\n", "Xsr = Xs.dot(rotation_matrix)\n", "\n", "tree_clf_s = DecisionTreeClassifier(random_state=42)\n", "tree_clf_s.fit(Xs, ys)\n", "tree_clf_sr = DecisionTreeClassifier(random_state=42)\n", "tree_clf_sr.fit(Xsr, ys)\n", "\n", "plt.figure(figsize=(11, 4))\n", "plt.subplot(121)\n", "plot_decision_boundary(tree_clf_s, Xs, ys, axes=[-0.7, 0.7, -0.7, 0.7], iris=False)\n", "plt.subplot(122)\n", "plot_decision_boundary(tree_clf_sr, Xsr, ys, axes=[-0.7, 0.7, -0.7, 0.7], iris=False)\n", "\n", "save_fig(\"sensitivity_to_rotation_plot\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Regression trees" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "# Quadratic training set + noise\n", "np.random.seed(42)\n", "m = 200\n", "X = np.random.rand(m, 1)\n", "y = 4 * (X - 0.5) ** 2\n", "y = y + np.random.randn(m, 1) / 10" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "DecisionTreeRegressor(criterion='mse', max_depth=2, max_features=None,\n", " max_leaf_nodes=None, min_impurity_decrease=0.0,\n", " min_impurity_split=None, min_samples_leaf=1,\n", " min_samples_split=2, min_weight_fraction_leaf=0.0,\n", " presort=False, random_state=42, splitter='best')" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.tree import DecisionTreeRegressor\n", "\n", "tree_reg = DecisionTreeRegressor(max_depth=2, random_state=42)\n", "tree_reg.fit(X, y)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": 
[ "Saving figure tree_regression_plot\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "from sklearn.tree import DecisionTreeRegressor\n", "\n", "tree_reg1 = DecisionTreeRegressor(random_state=42, max_depth=2)\n", "tree_reg2 = DecisionTreeRegressor(random_state=42, max_depth=3)\n", "tree_reg1.fit(X, y)\n", "tree_reg2.fit(X, y)\n", "\n", "def plot_regression_predictions(tree_reg, X, y, axes=[0, 1, -0.2, 1], ylabel=\"$y$\"):\n", " x1 = np.linspace(axes[0], axes[1], 500).reshape(-1, 1)\n", " y_pred = tree_reg.predict(x1)\n", " plt.axis(axes)\n", " plt.xlabel(\"$x_1$\", fontsize=18)\n", " if ylabel:\n", " plt.ylabel(ylabel, fontsize=18, rotation=0)\n", " plt.plot(X, y, \"b.\")\n", " plt.plot(x1, y_pred, \"r.-\", linewidth=2, label=r\"$\\hat{y}$\")\n", "\n", "plt.figure(figsize=(11, 4))\n", "plt.subplot(121)\n", "plot_regression_predictions(tree_reg1, X, y)\n", "for split, style in ((0.1973, \"k-\"), (0.0917, \"k--\"), (0.7718, \"k--\")):\n", " plt.plot([split, split], [-0.2, 1], style, linewidth=2)\n", "plt.text(0.21, 0.65, \"Depth=0\", fontsize=15)\n", "plt.text(0.01, 0.2, \"Depth=1\", fontsize=13)\n", "plt.text(0.65, 0.8, \"Depth=1\", fontsize=13)\n", "plt.legend(loc=\"upper center\", fontsize=18)\n", "plt.title(\"max_depth=2\", fontsize=14)\n", "\n", "plt.subplot(122)\n", "plot_regression_predictions(tree_reg2, X, y, ylabel=None)\n", "for split, style in ((0.1973, \"k-\"), (0.0917, \"k--\"), (0.7718, \"k--\")):\n", " plt.plot([split, split], [-0.2, 1], style, linewidth=2)\n", "for split in (0.0458, 0.1298, 0.2873, 0.9040):\n", " plt.plot([split, split], [-0.2, 1], \"k:\", linewidth=1)\n", "plt.text(0.3, 0.5, \"Depth=2\", fontsize=13)\n", "plt.title(\"max_depth=3\", fontsize=14)\n", "\n", "save_fig(\"tree_regression_plot\")\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "export_graphviz(\n", " tree_reg1,\n", " out_file=image_path(\"regression_tree.dot\"),\n", " feature_names=[\"x1\"],\n", " 
rounded=True,\n", " filled=True\n", " )" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Saving figure tree_regression_regularization_plot\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "tree_reg1 = DecisionTreeRegressor(random_state=42)\n", "tree_reg2 = DecisionTreeRegressor(random_state=42, min_samples_leaf=10)\n", "tree_reg1.fit(X, y)\n", "tree_reg2.fit(X, y)\n", "\n", "x1 = np.linspace(0, 1, 500).reshape(-1, 1)\n", "y_pred1 = tree_reg1.predict(x1)\n", "y_pred2 = tree_reg2.predict(x1)\n", "\n", "plt.figure(figsize=(11, 4))\n", "\n", "plt.subplot(121)\n", "plt.plot(X, y, \"b.\")\n", "plt.plot(x1, y_pred1, \"r.-\", linewidth=2, label=r\"$\\hat{y}$\")\n", "plt.axis([0, 1, -0.2, 1.1])\n", "plt.xlabel(\"$x_1$\", fontsize=18)\n", "plt.ylabel(\"$y$\", fontsize=18, rotation=0)\n", "plt.legend(loc=\"upper center\", fontsize=18)\n", "plt.title(\"No restrictions\", fontsize=14)\n", "\n", "plt.subplot(122)\n", "plt.plot(X, y, \"b.\")\n", "plt.plot(x1, y_pred2, \"r.-\", linewidth=2, label=r\"$\\hat{y}$\")\n", "plt.axis([0, 1, -0.2, 1.1])\n", "plt.xlabel(\"$x_1$\", fontsize=18)\n", "plt.title(\"min_samples_leaf={}\".format(tree_reg2.min_samples_leaf), fontsize=14)\n", "\n", "save_fig(\"tree_regression_regularization_plot\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "# Exercise solutions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. to 6." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "See appendix A." ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## 7." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "_Exercise: train and fine-tune a Decision Tree for the moons dataset._" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "a. Generate a moons dataset using `make_moons(n_samples=10000, noise=0.4)`." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Adding `random_state=42` to make this notebook's output constant:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "from sklearn.datasets import make_moons\n", "\n", "X, y = make_moons(n_samples=10000, noise=0.4, random_state=42)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "b. Split it into a training set and a test set using `train_test_split()`." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "from sklearn.model_selection import train_test_split\n", "\n", "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "c. Use grid search with cross-validation (with the help of the `GridSearchCV` class) to find good hyperparameter values for a `DecisionTreeClassifier`. Hint: try various values for `max_leaf_nodes`." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Fitting 3 folds for each of 294 candidates, totalling 882 fits\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.\n", "[Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 1.4s\n", "[Parallel(n_jobs=-1)]: Done 882 out of 882 | elapsed: 3.5s finished\n" ] }, { "data": { "text/plain": [ "GridSearchCV(cv=3, error_score='raise-deprecating',\n", " estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,\n", " max_features=None, max_leaf_nodes=None,\n", " min_impurity_decrease=0.0, min_impurity_split=None,\n", " min_samples_leaf=1, min_samples_split=2,\n", " min_weight_fraction_leaf=0.0, presort=False, random_state=42,\n", " splitter='best'),\n", " fit_params=None, iid='warn', n_jobs=-1,\n", " param_grid={'max_leaf_nodes': [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 
13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99], 'min_samples_split': [2, 3, 4]},\n", " pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',\n", " scoring=None, verbose=1)" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.model_selection import GridSearchCV\n", "\n", "params = {'max_leaf_nodes': list(range(2, 100)), 'min_samples_split': [2, 3, 4]}\n", "grid_search_cv = GridSearchCV(DecisionTreeClassifier(random_state=42), params, n_jobs=-1, verbose=1, cv=3)\n", "\n", "grid_search_cv.fit(X_train, y_train)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,\n", " max_features=None, max_leaf_nodes=17,\n", " min_impurity_decrease=0.0, min_impurity_split=None,\n", " min_samples_leaf=1, min_samples_split=2,\n", " min_weight_fraction_leaf=0.0, presort=False, random_state=42,\n", " splitter='best')" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "grid_search_cv.best_estimator_" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "d. Train it on the full training set using these hyperparameters, and measure your model's performance on the test set. You should get roughly 85% to 87% accuracy." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By default, `GridSearchCV` trains the best model found on the whole training set (you can change this by setting `refit=False`), so we don't need to do it again. 
We can simply evaluate the model's accuracy:" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.8695" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.metrics import accuracy_score\n", "\n", "y_pred = grid_search_cv.predict(X_test)\n", "accuracy_score(y_test, y_pred)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 8." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "_Exercise: Grow a forest._" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "a. Continuing the previous exercise, generate 1,000 subsets of the training set, each containing 100 instances selected randomly. Hint: you can use Scikit-Learn's `ShuffleSplit` class for this." ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "from sklearn.model_selection import ShuffleSplit\n", "\n", "n_trees = 1000\n", "n_instances = 100\n", "\n", "mini_sets = []\n", "\n", "rs = ShuffleSplit(n_splits=n_trees, test_size=len(X_train) - n_instances, random_state=42)\n", "for mini_train_index, mini_test_index in rs.split(X_train):\n", " X_mini_train = X_train[mini_train_index]\n", " y_mini_train = y_train[mini_train_index]\n", " mini_sets.append((X_mini_train, y_mini_train))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "b. Train one Decision Tree on each subset, using the best hyperparameter values found above. Evaluate these 1,000 Decision Trees on the test set. Since they were trained on smaller sets, these Decision Trees will likely perform worse than the first Decision Tree, achieving only about 80% accuracy." 
] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.8054499999999999" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.base import clone\n", "\n", "forest = [clone(grid_search_cv.best_estimator_) for _ in range(n_trees)]\n", "\n", "accuracy_scores = []\n", "\n", "for tree, (X_mini_train, y_mini_train) in zip(forest, mini_sets):\n", " tree.fit(X_mini_train, y_mini_train)\n", " \n", " y_pred = tree.predict(X_test)\n", " accuracy_scores.append(accuracy_score(y_test, y_pred))\n", "\n", "np.mean(accuracy_scores)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "c. Now comes the magic. For each test set instance, generate the predictions of the 1,000 Decision Trees, and keep only the most frequent prediction (you can use SciPy's `mode()` function for this). This gives you _majority-vote predictions_ over the test set." ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "Y_pred = np.empty([n_trees, len(X_test)], dtype=np.uint8)\n", "\n", "for tree_index, tree in enumerate(forest):\n", " Y_pred[tree_index] = tree.predict(X_test)" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "from scipy.stats import mode\n", "\n", "y_pred_majority_votes, n_votes = mode(Y_pred, axis=0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "d. Evaluate these predictions on the test set: you should obtain a slightly higher accuracy than your first model (about 0.5 to 1.5% higher). Congratulations, you have trained a Random Forest classifier!" 
] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.872" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "accuracy_score(y_test, y_pred_majority_votes.reshape([-1]))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.10" }, "nav_menu": { "height": "309px", "width": "468px" }, "toc": { "navigate_menu": true, "number_sections": true, "sideBar": true, "threshold": 6, "toc_cell": false, "toc_section_display": "block", "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 1 }