{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n# Monotonic Constraints\n\nThis example illustrates the effect of monotonic constraints on a gradient\nboosting estimator.\n\nWe build an artificial dataset where the target value is in general\npositively correlated with the first feature (with some random and\nnon-random variations), and in general negatively correlated with the second\nfeature.\n\nBy imposing a monotonic increase or a monotonic decrease constraint, respectively,\non the features during the learning process, the estimator is able to properly follow\nthe general trend instead of being subject to the variations.\n\nThis example was inspired by the [XGBoost documentation](https://xgboost.readthedocs.io/en/latest/tutorials/monotonic.html).\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Authors: The scikit-learn developers\n# SPDX-License-Identifier: BSD-3-Clause" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\nimport numpy as np\n\nfrom sklearn.ensemble import HistGradientBoostingRegressor\nfrom sklearn.inspection import PartialDependenceDisplay\n\nrng = np.random.RandomState(0)\n\nn_samples = 1000\nf_0 = rng.rand(n_samples)\nf_1 = rng.rand(n_samples)\nX = np.c_[f_0, f_1]\nnoise = rng.normal(loc=0.0, scale=0.01, size=n_samples)\n\n# y is positively correlated with f_0, and negatively correlated with f_1\ny = 5 * f_0 + np.sin(10 * np.pi * f_0) - 5 * f_1 - np.cos(10 * np.pi * f_1) + noise" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Fit a first model on this dataset without any constraints.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "gbdt_no_cst = HistGradientBoostingRegressor()\ngbdt_no_cst.fit(X, y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Fit a second model on this dataset with monotonic increase (1)\nand a monotonic decrease (-1) constraints, respectively.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "gbdt_with_monotonic_cst = HistGradientBoostingRegressor(monotonic_cst=[1, -1])\ngbdt_with_monotonic_cst.fit(X, y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's display the partial dependence of the predictions on the two features.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "fig, ax = plt.subplots()\ndisp = PartialDependenceDisplay.from_estimator(\n gbdt_no_cst,\n X,\n features=[0, 1],\n feature_names=(\n \"First feature\",\n \"Second feature\",\n ),\n line_kw={\"linewidth\": 4, \"label\": \"unconstrained\", \"color\": \"tab:blue\"},\n ax=ax,\n)\nPartialDependenceDisplay.from_estimator(\n gbdt_with_monotonic_cst,\n X,\n features=[0, 1],\n line_kw={\"linewidth\": 4, \"label\": \"constrained\", \"color\": \"tab:orange\"},\n ax=disp.axes_,\n)\n\nfor f_idx in (0, 1):\n disp.axes_[0, f_idx].plot(\n X[:, f_idx], y, \"o\", alpha=0.3, zorder=-1, color=\"tab:green\"\n )\n disp.axes_[0, f_idx].set_ylim(-6, 6)\n\nplt.legend()\nfig.suptitle(\"Monotonic constraints effect on partial dependences\")\nplt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see that the predictions of the unconstrained model capture the\noscillations of the data while the constrained model follows the 
{ "cell_type": "markdown", "metadata": {}, "source": [ "\n## Using feature names to specify monotonic constraints\n\nNote that if the training data has feature names, it is possible to specify the\nmonotonic constraints by passing a dictionary that maps those names to constraint\nvalues:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import pandas as pd\n\nX_df = pd.DataFrame(X, columns=[\"f_0\", \"f_1\"])\n\ngbdt_with_monotonic_cst_df = HistGradientBoostingRegressor(\n    monotonic_cst={\"f_0\": 1, \"f_1\": -1}\n).fit(X_df, y)\n\n# Both constrained models were fit with the same constraints on the same data,\n# so their predictions should match.\nnp.allclose(\n    gbdt_with_monotonic_cst_df.predict(X_df), gbdt_with_monotonic_cst.predict(X)\n)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.21" } }, "nbformat": 4, "nbformat_minor": 0 }