{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n# Visualizing the probabilistic predictions of a VotingClassifier\n\n.. currentmodule:: sklearn\n\nPlot the predicted class probabilities in a toy dataset predicted by three\ndifferent classifiers and averaged by the :class:`~ensemble.VotingClassifier`.\n\nFirst, three linear classifiers are initialized. Two are spline models with\ninteraction terms, one using constant extrapolation and the other using periodic\nextrapolation. The third classifier is a :class:`~kernel_approximation.Nystroem`\nwith the default \"rbf\" kernel.\n\nIn the first part of this example, these three classifiers are used to\ndemonstrate soft-voting using :class:`~ensemble.VotingClassifier` with weighted\naverage. We set `weights=[2, 1, 3]`, meaning the constant extrapolation spline\nmodel's predictions are weighted twice as much as the periodic spline model's,\nand the Nystroem model's predictions are weighted three times as much as the\nperiodic spline.\n\nThe second part demonstrates how soft predictions can be converted into hard\npredictions.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Authors: The scikit-learn developers\n# SPDX-License-Identifier: BSD-3-Clause" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We first generate a noisy XOR dataset, which is a binary classification task.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\nimport numpy as np\nimport pandas as pd\nfrom matplotlib.colors import ListedColormap\n\nn_samples = 500\nrng = np.random.default_rng(0)\nfeature_names = [\"Feature #0\", \"Feature #1\"]\ncommon_scatter_plot_params = dict(\n cmap=ListedColormap([\"tab:red\", \"tab:blue\"]),\n edgecolor=\"white\",\n linewidth=1,\n)\n\nxor = pd.DataFrame(\n np.random.RandomState(0).uniform(low=-1, high=1, size=(n_samples, 2)),\n columns=feature_names,\n)\nnoise = rng.normal(loc=0, scale=0.1, size=(n_samples, 2))\ntarget_xor = np.logical_xor(\n xor[\"Feature #0\"] + noise[:, 0] > 0, xor[\"Feature #1\"] + noise[:, 1] > 0\n)\n\nX = xor[feature_names]\ny = target_xor.astype(np.int32)\n\nfig, ax = plt.subplots()\nax.scatter(X[\"Feature #0\"], X[\"Feature #1\"], c=y, **common_scatter_plot_params)\nax.set_title(\"The XOR dataset\")\nplt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Due to the inherent non-linear separability of the XOR dataset, tree-based\nmodels would often be preferred. However, appropriate feature engineering\ncombined with a linear model can yield effective results, with the added\nbenefit of producing better-calibrated probabilities for samples located in\nthe transition regions affected by noise.\n\nWe define and fit the models on the whole dataset.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from sklearn.ensemble import VotingClassifier\nfrom sklearn.kernel_approximation import Nystroem\nfrom sklearn.linear_model import LogisticRegression\nfrom sklearn.pipeline import make_pipeline\nfrom sklearn.preprocessing import PolynomialFeatures, SplineTransformer, StandardScaler\n\nclf1 = make_pipeline(\n SplineTransformer(degree=2, n_knots=2),\n PolynomialFeatures(interaction_only=True),\n LogisticRegression(C=10),\n)\nclf2 = make_pipeline(\n SplineTransformer(\n degree=2,\n n_knots=4,\n extrapolation=\"periodic\",\n include_bias=True,\n ),\n PolynomialFeatures(interaction_only=True),\n LogisticRegression(C=10),\n)\nclf3 = make_pipeline(\n StandardScaler(),\n Nystroem(gamma=2, random_state=0),\n LogisticRegression(C=10),\n)\nweights = [2, 1, 3]\neclf = VotingClassifier(\n estimators=[\n (\"constant splines model\", clf1),\n (\"periodic splines model\", clf2),\n (\"nystroem model\", clf3),\n ],\n voting=\"soft\",\n weights=weights,\n)\n\nclf1.fit(X, y)\nclf2.fit(X, y)\nclf3.fit(X, y)\neclf.fit(X, y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally we use :class:`~inspection.DecisionBoundaryDisplay` to plot the\npredicted probabilities. By using a diverging colormap (such as `\"RdBu\"`), we\ncan ensure that darker colors correspond to `predict_proba` close to either 0\nor 1, and white corresponds to `predict_proba` of 0.5.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from itertools import product\n\nfrom sklearn.inspection import DecisionBoundaryDisplay\n\nfig, axarr = plt.subplots(2, 2, sharex=\"col\", sharey=\"row\", figsize=(10, 8))\nfor idx, clf, title in zip(\n product([0, 1], [0, 1]),\n [clf1, clf2, clf3, eclf],\n [\n \"Splines with\\nconstant extrapolation\",\n \"Splines with\\nperiodic extrapolation\",\n \"RBF Nystroem\",\n \"Soft Voting\",\n ],\n):\n disp = DecisionBoundaryDisplay.from_estimator(\n clf,\n X,\n response_method=\"predict_proba\",\n plot_method=\"pcolormesh\",\n cmap=\"RdBu\",\n alpha=0.8,\n ax=axarr[idx[0], idx[1]],\n )\n axarr[idx[0], idx[1]].scatter(\n X[\"Feature #0\"],\n X[\"Feature #1\"],\n c=y,\n **common_scatter_plot_params,\n )\n axarr[idx[0], idx[1]].set_title(title)\n fig.colorbar(disp.surface_, ax=axarr[idx[0], idx[1]], label=\"Probability estimate\")\n\nplt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a sanity check, we can verify for a given sample that the probability\npredicted by the :class:`~ensemble.VotingClassifier` is indeed the weighted\naverage of the individual classifiers' soft-predictions.\n\nIn the case of binary classification such as in the present example, the\n:term:`predict_proba` arrays contain the probability of belonging to class 0\n(here in red) as the first entry, and the probability of belonging to class 1\n(here in blue) as the second entry.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "test_sample = pd.DataFrame({\"Feature #0\": [-0.5], \"Feature #1\": [1.5]})\npredict_probas = [est.predict_proba(test_sample).ravel() for est in eclf.estimators_]\nfor (est_name, _), est_probas in zip(eclf.estimators, predict_probas):\n print(f\"{est_name}'s predicted probabilities: {est_probas}\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print(\n \"Weighted average of soft-predictions: \"\n f\"{np.dot(weights, predict_probas) / np.sum(weights)}\"\n)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see that manual calculation of predicted probabilities above is\nequivalent to that produced by the `VotingClassifier`:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print(\n \"Predicted probability of VotingClassifier: \"\n f\"{eclf.predict_proba(test_sample).ravel()}\"\n)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To convert soft predictions into hard predictions when weights are provided,\nthe weighted average predicted probabilities are computed for each class.\nThen, the final class label is then derived from the class label with the\nhighest average probability, which corresponds to the default threshold at\n`predict_proba=0.5` in the case of binary classification.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print(\n \"Class with the highest weighted average of soft-predictions: \"\n f\"{np.argmax(np.dot(weights, predict_probas) / np.sum(weights))}\"\n)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is equivalent to the output of `VotingClassifier`'s `predict` method:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print(f\"Predicted class of VotingClassifier: {eclf.predict(test_sample).ravel()}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Soft votes can be thresholded as for any other probabilistic classifier. This\nallows you to set a threshold probability at which the positive class will be\npredicted, instead of simply selecting the class with the highest predicted\nprobability.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from sklearn.model_selection import FixedThresholdClassifier\n\neclf_other_threshold = FixedThresholdClassifier(\n eclf, threshold=0.7, response_method=\"predict_proba\"\n).fit(X, y)\nprint(\n \"Predicted class of thresholded VotingClassifier: \"\n f\"{eclf_other_threshold.predict(test_sample)}\"\n)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.14" } }, "nbformat": 4, "nbformat_minor": 0 }