{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n# Plot the decision surfaces of ensembles of trees on the iris dataset\n\nPlot the decision surfaces of forests of randomized trees trained on pairs of\nfeatures of the iris dataset.\n\nThis plot compares the decision surfaces learned by a decision tree classifier\n(first column), by a random forest classifier (second column), by an extra-\ntrees classifier (third column) and by an AdaBoost classifier (fourth column).\n\nIn the first row, the classifiers are built using the sepal width and\nthe sepal length features only, on the second row using the petal length and\nsepal length only, and on the third row using the petal width and the\npetal length only.\n\nIn descending order of quality, when trained (outside of this example) on all\n4 features using 30 estimators and scored using 10 fold cross validation,\nwe see::\n\n ExtraTreesClassifier() # 0.95 score\n RandomForestClassifier() # 0.94 score\n AdaBoost(DecisionTree(max_depth=3)) # 0.94 score\n DecisionTree(max_depth=None) # 0.94 score\n\nIncreasing `max_depth` for AdaBoost lowers the standard deviation of\nthe scores (but the average score does not improve).\n\nSee the console's output for further details about each model.\n\nIn this example you might try to:\n\n1) vary the ``max_depth`` for the ``DecisionTreeClassifier`` and\n ``AdaBoostClassifier``, perhaps try ``max_depth=3`` for the\n ``DecisionTreeClassifier`` or ``max_depth=None`` for ``AdaBoostClassifier``\n2) vary ``n_estimators``\n\nIt is worth noting that RandomForests and ExtraTrees can be fitted in parallel\non many cores as each tree is built independently of the others. AdaBoost's\nsamples are built sequentially and so do not use multiple cores.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Authors: The scikit-learn developers\n# SPDX-License-Identifier: BSD-3-Clause\n\nimport matplotlib.pyplot as plt\nimport numpy as np\nfrom matplotlib.colors import ListedColormap\n\nfrom sklearn.datasets import load_iris\nfrom sklearn.ensemble import (\n AdaBoostClassifier,\n ExtraTreesClassifier,\n RandomForestClassifier,\n)\nfrom sklearn.tree import DecisionTreeClassifier\n\n# Parameters\nn_classes = 3\nn_estimators = 30\ncmap = plt.cm.RdYlBu\nplot_step = 0.02 # fine step width for decision surface contours\nplot_step_coarser = 0.5 # step widths for coarse classifier guesses\nRANDOM_SEED = 13 # fix the seed on each iteration\n\n# Load data\niris = load_iris()\n\nplot_idx = 1\n\nmodels = [\n DecisionTreeClassifier(max_depth=None),\n RandomForestClassifier(n_estimators=n_estimators),\n ExtraTreesClassifier(n_estimators=n_estimators),\n AdaBoostClassifier(DecisionTreeClassifier(max_depth=3), n_estimators=n_estimators),\n]\n\nfor pair in ([0, 1], [0, 2], [2, 3]):\n for model in models:\n # We only take the two corresponding features\n X = iris.data[:, pair]\n y = iris.target\n\n # Shuffle\n idx = np.arange(X.shape[0])\n np.random.seed(RANDOM_SEED)\n np.random.shuffle(idx)\n X = X[idx]\n y = y[idx]\n\n # Standardize\n mean = X.mean(axis=0)\n std = X.std(axis=0)\n X = (X - mean) / std\n\n # Train\n model.fit(X, y)\n\n scores = model.score(X, y)\n # Create a title for each column and the console by using str() and\n # slicing away useless parts of the string\n model_title = str(type(model)).split(\".\")[-1][:-2][: -len(\"Classifier\")]\n\n model_details = model_title\n if hasattr(model, \"estimators_\"):\n model_details += \" with {} estimators\".format(len(model.estimators_))\n print(model_details + \" with features\", pair, \"has a score of\", scores)\n\n plt.subplot(3, 4, plot_idx)\n if plot_idx <= len(models):\n # Add a title at the top of each column\n plt.title(model_title, fontsize=9)\n\n # Now plot the decision boundary using a fine mesh as input to a\n # filled contour plot\n x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1\n y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1\n xx, yy = np.meshgrid(\n np.arange(x_min, x_max, plot_step), np.arange(y_min, y_max, plot_step)\n )\n\n # Plot either a single DecisionTreeClassifier or alpha blend the\n # decision surfaces of the ensemble of classifiers\n if isinstance(model, DecisionTreeClassifier):\n Z = model.predict(np.c_[xx.ravel(), yy.ravel()])\n Z = Z.reshape(xx.shape)\n cs = plt.contourf(xx, yy, Z, cmap=cmap)\n else:\n # Choose alpha blend level with respect to the number\n # of estimators\n # that are in use (noting that AdaBoost can use fewer estimators\n # than its maximum if it achieves a good enough fit early on)\n estimator_alpha = 1.0 / len(model.estimators_)\n for tree in model.estimators_:\n Z = tree.predict(np.c_[xx.ravel(), yy.ravel()])\n Z = Z.reshape(xx.shape)\n cs = plt.contourf(xx, yy, Z, alpha=estimator_alpha, cmap=cmap)\n\n # Build a coarser grid to plot a set of ensemble classifications\n # to show how these are different to what we see in the decision\n # surfaces. These points are regularly space and do not have a\n # black outline\n xx_coarser, yy_coarser = np.meshgrid(\n np.arange(x_min, x_max, plot_step_coarser),\n np.arange(y_min, y_max, plot_step_coarser),\n )\n Z_points_coarser = model.predict(\n np.c_[xx_coarser.ravel(), yy_coarser.ravel()]\n ).reshape(xx_coarser.shape)\n cs_points = plt.scatter(\n xx_coarser,\n yy_coarser,\n s=15,\n c=Z_points_coarser,\n cmap=cmap,\n edgecolors=\"none\",\n )\n\n # Plot the training points, these are clustered together and have a\n # black outline\n plt.scatter(\n X[:, 0],\n X[:, 1],\n c=y,\n cmap=ListedColormap([\"r\", \"y\", \"b\"]),\n edgecolor=\"k\",\n s=20,\n )\n plot_idx += 1 # move on to the next plot in sequence\n\nplt.suptitle(\"Classifiers on feature subsets of the Iris dataset\", fontsize=12)\nplt.axis(\"tight\")\nplt.tight_layout(h_pad=0.2, w_pad=0.2, pad=2.5)\nplt.show()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.21" } }, "nbformat": 4, "nbformat_minor": 0 }