{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n# Partial Dependence and Individual Conditional Expectation Plots\n\nPartial dependence plots show the dependence between the target function [2]_\nand a set of features of interest, marginalizing over the values of all other\nfeatures (the complement features). Due to the limits of human perception, the\nsize of the set of features of interest must be small (usually, one or two)\nthus they are usually chosen among the most important features.\n\nSimilarly, an individual conditional expectation (ICE) plot [3]_\nshows the dependence between the target function and a feature of interest.\nHowever, unlike partial dependence plots, which show the average effect of the\nfeatures of interest, ICE plots visualize the dependence of the prediction on a\nfeature for each :term:`sample` separately, with one line per sample.\nOnly one feature of interest is supported for ICE plots.\n\nThis example shows how to obtain partial dependence and ICE plots from a\n:class:`~sklearn.neural_network.MLPRegressor` and a\n:class:`~sklearn.ensemble.HistGradientBoostingRegressor` trained on the\nbike sharing dataset. The example is inspired by [1]_.\n\n.. [1] [Molnar, Christoph. \"Interpretable machine learning.\n A Guide for Making Black Box Models Explainable\",\n 2019.](https://christophm.github.io/interpretable-ml-book/)\n\n.. [2] For classification you can think of it as the regression score before\n the link function.\n\n.. [3] :arxiv:`Goldstein, A., Kapelner, A., Bleich, J., and Pitkin, E. (2015).\n \"Peeking Inside the Black Box: Visualizing Statistical Learning With Plots of\n Individual Conditional Expectation\". Journal of Computational and\n Graphical Statistics, 24(1): 44-65 <1309.6392>`\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Authors: The scikit-learn developers\n# SPDX-License-Identifier: BSD-3-Clause" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Bike sharing dataset preprocessing\n\nWe will use the bike sharing dataset. 
The goal is to predict the number of bike\nrentals using weather and season data as well as the datetime information.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from sklearn.datasets import fetch_openml\n\nbikes = fetch_openml(\"Bike_Sharing_Demand\", version=2, as_frame=True)\n# Make an explicit copy to avoid \"SettingWithCopyWarning\" from pandas\nX, y = bikes.data.copy(), bikes.target\n\n# We use only a subset of the data to speed up the example.\nX = X.iloc[::5, :]\ny = y[::5]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The feature `\"weather\"` has a particularity: the category `\"heavy_rain\"` is a rare\ncategory.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "X[\"weather\"].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Because of this rare category, we collapse it into `\"rain\"`.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "X[\"weather\"] = (\n X[\"weather\"]\n .astype(object)\n .replace(to_replace=\"heavy_rain\", value=\"rain\")\n .astype(\"category\")\n)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We now have a closer look at the `\"year\"` feature:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "X[\"year\"].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We see that we have data from two years. We use the first year to train the\nmodel and the second year to test the model.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "mask_training = X[\"year\"] == 0.0\nX = X.drop(columns=[\"year\"])\nX_train, y_train = X[mask_training], y[mask_training]\nX_test, y_test = X[~mask_training], y[~mask_training]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can check the dataset information to see that we have heterogeneous data types. We\nhave to preprocess the different columns accordingly.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "X_train.info()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "From the previous information, we will consider the `category` columns as nominal\ncategorical features. 
In addition, we will consider the date and time information as\ncategorical features as well.\n\nWe manually define the columns containing numerical and categorical\nfeatures.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "numerical_features = [\n \"temp\",\n \"feel_temp\",\n \"humidity\",\n \"windspeed\",\n]\ncategorical_features = X_train.columns.drop(numerical_features)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before we go into the details regarding the preprocessing of the different machine\nlearning pipelines, we will try to get some additional intuition regarding the dataset\nthat will be helpful to understand the model's statistical performance and results of\nthe partial dependence analysis.\n\nWe plot the average number of bike rentals by grouping the data by season and\nby year.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from itertools import product\n\nimport matplotlib.pyplot as plt\nimport numpy as np\n\ndays = (\"Sun\", \"Mon\", \"Tue\", \"Wed\", \"Thu\", \"Fri\", \"Sat\")\nhours = tuple(range(24))\nxticklabels = [f\"{day}\\n{hour}:00\" for day, hour in product(days, hours)]\nxtick_start, xtick_period = 6, 12\n\nfig, axs = plt.subplots(nrows=2, figsize=(8, 6), sharey=True, sharex=True)\naverage_bike_rentals = bikes.frame.groupby(\n [\"year\", \"season\", \"weekday\", \"hour\"], observed=True\n).mean(numeric_only=True)[\"count\"]\nfor ax, (idx, df) in zip(axs, average_bike_rentals.groupby(\"year\")):\n df.groupby(\"season\", observed=True).plot(ax=ax, legend=True)\n\n # decorate the plot\n ax.set_xticks(\n np.linspace(\n start=xtick_start,\n stop=len(xticklabels),\n num=len(xticklabels) // xtick_period,\n )\n )\n ax.set_xticklabels(xticklabels[xtick_start::xtick_period])\n ax.set_xlabel(\"\")\n ax.set_ylabel(\"Average number of bike rentals\")\n ax.set_title(\n f\"Bike rental for {'2010 (train set)' if idx == 0.0 else '2011 (test set)'}\"\n )\n ax.set_ylim(0, 1_000)\n ax.set_xlim(0, len(xticklabels))\n ax.legend(loc=2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first striking difference between the train and test set is that the number of\nbike rentals is higher in the test set. For this reason, it will not be surprising to\nget a machine learning model that underestimates the number of bike rentals. We\nalso observe that the number of bike rentals is lower during the spring season. In\naddition, we see that during working days, there is a specific pattern around 6-7\nam and 5-6 pm with some peaks of bike rentals. 
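The overall demand gap between the two years can also be checked directly on the\ntraining and testing targets (a quick sanity check added to the original analysis):\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Average number of rentals per year: the testing year has a noticeably higher\n# overall demand than the training year.\nprint(f\"Average rentals in the training year: {y_train.mean():.0f}\")\nprint(f\"Average rentals in the testing year: {y_test.mean():.0f}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "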
We can keep in mind these different\ninsights and use them to understand the partial dependence plot.\n\n## Preprocessor for machine-learning models\n\nSince we later use two different models, a\n:class:`~sklearn.neural_network.MLPRegressor` and a\n:class:`~sklearn.ensemble.HistGradientBoostingRegressor`, we create two different\npreprocessors, specific for each model.\n\n### Preprocessor for the neural network model\n\nWe will use a :class:`~sklearn.preprocessing.QuantileTransformer` to scale the\nnumerical features and encode the categorical features with a\n:class:`~sklearn.preprocessing.OneHotEncoder`.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from sklearn.compose import ColumnTransformer\nfrom sklearn.preprocessing import OneHotEncoder, QuantileTransformer\n\nmlp_preprocessor = ColumnTransformer(\n transformers=[\n (\"num\", QuantileTransformer(n_quantiles=100), numerical_features),\n (\"cat\", OneHotEncoder(handle_unknown=\"ignore\"), categorical_features),\n ]\n)\nmlp_preprocessor" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Preprocessor for the gradient boosting model\n\nFor the gradient boosting model, we leave the numerical features as-is and only\nencode the categorical features using an\n:class:`~sklearn.preprocessing.OrdinalEncoder`.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from sklearn.preprocessing import OrdinalEncoder\n\nhgbdt_preprocessor = ColumnTransformer(\n transformers=[\n (\"cat\", OrdinalEncoder(), categorical_features),\n (\"num\", \"passthrough\", numerical_features),\n ],\n sparse_threshold=1,\n verbose_feature_names_out=False,\n).set_output(transform=\"pandas\")\nhgbdt_preprocessor" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1-way partial dependence with different models\n\nIn this section, we will compute 1-way partial dependence with two different\nmachine-learning models: (i) a multi-layer perceptron and (ii) a\ngradient-boosting model. With these two models, we illustrate how to compute and\ninterpret partial dependence plots (PDP) for both numerical and categorical\nfeatures, as well as individual conditional expectation (ICE) plots.\n\n### Multi-layer perceptron\n\nLet's fit a :class:`~sklearn.neural_network.MLPRegressor` and compute\nsingle-variable partial dependence plots.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from time import time\n\nfrom sklearn.neural_network import MLPRegressor\nfrom sklearn.pipeline import make_pipeline\n\nprint(\"Training MLPRegressor...\")\ntic = time()\nmlp_model = make_pipeline(\n mlp_preprocessor,\n MLPRegressor(\n hidden_layer_sizes=(30, 15),\n learning_rate_init=0.01,\n early_stopping=True,\n random_state=0,\n ),\n)\nmlp_model.fit(X_train, y_train)\nprint(f\"done in {time() - tic:.3f}s\")\nprint(f\"Test R2 score: {mlp_model.score(X_test, y_test):.2f}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We configured a pipeline using the preprocessor that we created specifically for the\nneural network and tuned the neural network size and learning rate to get a reasonable\ncompromise between training time and predictive performance on a test set.\n\nImportantly, this tabular dataset has very different dynamic ranges for its\nfeatures. 
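For instance, the summary statistics of the numerical columns give a quick view of\nthese ranges (a check added here for illustration):\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Summary statistics of the numerical features, illustrating their different\n# dynamic ranges.\nX_train[numerical_features].describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "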
Neural networks tend to be very sensitive to features with varying\nscales and forgetting to preprocess the numeric feature would lead to a very\npoor model.\n\nIt would be possible to get even higher predictive performance with a larger\nneural network but the training would also be significantly more expensive.\n\nNote that it is important to check that the model is accurate enough on a\ntest set before plotting the partial dependence since there would be little\nuse in explaining the impact of a given feature on the prediction function of\na model with poor predictive performance. In this regard, our MLP model works\nreasonably well.\n\nWe will plot the averaged partial dependence.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n\nfrom sklearn.inspection import PartialDependenceDisplay\n\ncommon_params = {\n \"subsample\": 50,\n \"n_jobs\": 2,\n \"grid_resolution\": 20,\n \"random_state\": 0,\n}\n\nprint(\"Computing partial dependence plots...\")\nfeatures_info = {\n # features of interest\n \"features\": [\"temp\", \"humidity\", \"windspeed\", \"season\", \"weather\", \"hour\"],\n # type of partial dependence plot\n \"kind\": \"average\",\n # information regarding categorical features\n \"categorical_features\": categorical_features,\n}\ntic = time()\n_, ax = plt.subplots(ncols=3, nrows=2, figsize=(9, 8), constrained_layout=True)\ndisplay = PartialDependenceDisplay.from_estimator(\n mlp_model,\n X_train,\n **features_info,\n ax=ax,\n **common_params,\n)\nprint(f\"done in {time() - tic:.3f}s\")\n_ = display.figure_.suptitle(\n (\n \"Partial dependence of the number of bike rentals\\n\"\n \"for the bike rental dataset with an MLPRegressor\"\n ),\n fontsize=16,\n)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Gradient boosting\n\nLet's now fit a :class:`~sklearn.ensemble.HistGradientBoostingRegressor` and\ncompute the partial dependence on the same features. We also use the\nspecific preprocessor we created for this model.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from sklearn.ensemble import HistGradientBoostingRegressor\n\nprint(\"Training HistGradientBoostingRegressor...\")\ntic = time()\nhgbdt_model = make_pipeline(\n hgbdt_preprocessor,\n HistGradientBoostingRegressor(\n categorical_features=categorical_features,\n random_state=0,\n max_iter=50,\n ),\n)\nhgbdt_model.fit(X_train, y_train)\nprint(f\"done in {time() - tic:.3f}s\")\nprint(f\"Test R2 score: {hgbdt_model.score(X_test, y_test):.2f}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here, we used the default hyperparameters for the gradient boosting model\nwithout any preprocessing as tree-based models are naturally robust to\nmonotonic transformations of numerical features.\n\nNote that on this tabular dataset, Gradient Boosting Machines are both\nsignificantly faster to train and more accurate than neural networks. 
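Both fitted pipelines can be compared directly on the held-out year; the scores below\nsimply recompute the values printed above:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Side-by-side comparison of the two fitted pipelines on the test set.\nmodels = {\"MLPRegressor\": mlp_model, \"HistGradientBoostingRegressor\": hgbdt_model}\nfor name, model in models.items():\n print(f\"{name}: test R2 = {model.score(X_test, y_test):.2f}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "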
It is\nalso significantly cheaper to tune their hyperparameters (the defaults tend\nto work well while this is not often the case for neural networks).\n\nWe will plot the partial dependence for some of the numerical and categorical\nfeatures.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print(\"Computing partial dependence plots...\")\ntic = time()\n_, ax = plt.subplots(ncols=3, nrows=2, figsize=(9, 8), constrained_layout=True)\ndisplay = PartialDependenceDisplay.from_estimator(\n hgbdt_model,\n X_train,\n **features_info,\n ax=ax,\n **common_params,\n)\nprint(f\"done in {time() - tic:.3f}s\")\n_ = display.figure_.suptitle(\n (\n \"Partial dependence of the number of bike rentals\\n\"\n \"for the bike rental dataset with a gradient boosting\"\n ),\n fontsize=16,\n)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Analysis of the plots\n\nWe will first look at the PDPs for the numerical features. For both models, the\ngeneral trend of the PDP of the temperature is that the number of bike rentals is\nincreasing with temperature. We can make a similar analysis but with the opposite\ntrend for the humidity feature. The number of bike rentals is decreasing when the\nhumidity increases. Finally, we see the same trend for the wind speed feature. The\nnumber of bike rentals is decreasing when the wind speed is increasing for both\nmodels. We also observe that :class:`~sklearn.neural_network.MLPRegressor` has much\nsmoother predictions than :class:`~sklearn.ensemble.HistGradientBoostingRegressor`.\n\nNow, we will look at the partial dependence plots for the categorical features.\n\nWe observe that the spring season is the lowest bar for the season feature. With the\nweather feature, the rain category is the lowest bar. Regarding the hour feature,\nwe see two peaks around 7 am and 6 pm. These findings are in line with the\nobservations we made earlier on the dataset.\n\nHowever, it is worth noting that we are creating potentially meaningless\nsynthetic samples if features are correlated.\n\n### ICE vs. PDP\n\nPDP is an average of the marginal effects of the features. We are averaging the\nresponse of all samples of the provided set. Thus, some effects could be hidden. In\nthis regard, it is possible to plot each individual response. This representation is\ncalled the individual conditional expectation (ICE) plot. In the plot below, we plot\n50 randomly selected ICE lines for the temperature and humidity features.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print(\"Computing partial dependence plots and individual conditional expectation...\")\ntic = time()\n_, ax = plt.subplots(ncols=2, figsize=(6, 4), sharey=True, constrained_layout=True)\n\nfeatures_info = {\n \"features\": [\"temp\", \"humidity\"],\n \"kind\": \"both\",\n \"centered\": True,\n}\n\ndisplay = PartialDependenceDisplay.from_estimator(\n hgbdt_model,\n X_train,\n **features_info,\n ax=ax,\n **common_params,\n)\nprint(f\"done in {time() - tic:.3f}s\")\n_ = display.figure_.suptitle(\"ICE and PDP representations\", fontsize=16)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We see that the ICE for the temperature feature gives us some additional information:\nSome of the ICE lines are flat while some others show a decrease of the dependence\nfor temperature above 35 degrees Celsius. 
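As a side note, the partial dependence curve is by construction the average of the\nICE curves. We can verify this numerically with\n:func:`~sklearn.inspection.partial_dependence` (a check added here for illustration):\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from sklearn.inspection import partial_dependence\n\n# Compute both the individual (ICE) and averaged (PD) curves for \"temp\" and\n# check that the PD curve is the mean of the ICE curves.\npd_temp = partial_dependence(\n hgbdt_model, X_train, features=[\"temp\"], kind=\"both\", grid_resolution=20\n)\nprint(np.allclose(pd_temp[\"average\"], pd_temp[\"individual\"].mean(axis=1)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "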
We observe a similar pattern for the\nhumidity feature: some of the ICE lines show a sharp decrease when the humidity is\nabove 80%.\n\nNot all ICE lines are parallel; this indicates that the model finds\ninteractions between features. We can repeat the experiment by constraining the\ngradient boosting model to not use any interactions between features using the\nparameter `interaction_cst`:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from sklearn.base import clone\n\ninteraction_cst = [[i] for i in range(X_train.shape[1])]\nhgbdt_model_without_interactions = (\n clone(hgbdt_model)\n .set_params(histgradientboostingregressor__interaction_cst=interaction_cst)\n .fit(X_train, y_train)\n)\nprint(f\"Test R2 score: {hgbdt_model_without_interactions.score(X_test, y_test):.2f}\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "_, ax = plt.subplots(ncols=2, figsize=(6, 4), sharey=True, constrained_layout=True)\n\nfeatures_info[\"centered\"] = False\ndisplay = PartialDependenceDisplay.from_estimator(\n hgbdt_model_without_interactions,\n X_train,\n **features_info,\n ax=ax,\n **common_params,\n)\n_ = display.figure_.suptitle(\"ICE and PDP representations\", fontsize=16)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2D interaction plots\n\nPDPs with two features of interest enable us to visualize interactions among them.\nHowever, ICE plots cannot be plotted and interpreted in an easy manner with two\nfeatures of interest. We will show the representation available in\n:meth:`~sklearn.inspection.PartialDependenceDisplay.from_estimator`, which is a 2D\nheatmap.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print(\"Computing partial dependence plots...\")\nfeatures_info = {\n \"features\": [\"temp\", \"humidity\", (\"temp\", \"humidity\")],\n \"kind\": \"average\",\n}\n_, ax = plt.subplots(ncols=3, figsize=(10, 4), constrained_layout=True)\ntic = time()\ndisplay = PartialDependenceDisplay.from_estimator(\n hgbdt_model,\n X_train,\n **features_info,\n ax=ax,\n **common_params,\n)\nprint(f\"done in {time() - tic:.3f}s\")\n_ = display.figure_.suptitle(\n \"1-way vs 2-way of numerical PDP using gradient boosting\", fontsize=16\n)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The two-way partial dependence plot shows the dependence of the number of bike rentals\non joint values of temperature and humidity.\nWe clearly see an interaction between the two features. 
For a temperature higher than\n20 degrees Celsius, the humidity has an impact on the number of bike rentals\nthat seems independent of the temperature.\n\nOn the other hand, for temperatures lower than 20 degrees Celsius, both the\ntemperature and humidity continuously impact the number of bike rentals.\n\nFurthermore, the slope of the impact ridge around the 20 degrees Celsius\nthreshold is very dependent on the humidity level: the ridge is steep under\ndry conditions but much smoother under wetter conditions above 70% humidity.\n\nWe now contrast those results with the same plots computed for the model\nconstrained to learn a prediction function that does not depend on such\nnon-linear feature interactions.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print(\"Computing partial dependence plots...\")\nfeatures_info = {\n \"features\": [\"temp\", \"humidity\", (\"temp\", \"humidity\")],\n \"kind\": \"average\",\n}\n_, ax = plt.subplots(ncols=3, figsize=(10, 4), constrained_layout=True)\ntic = time()\ndisplay = PartialDependenceDisplay.from_estimator(\n hgbdt_model_without_interactions,\n X_train,\n **features_info,\n ax=ax,\n **common_params,\n)\nprint(f\"done in {time() - tic:.3f}s\")\n_ = display.figure_.suptitle(\n \"1-way vs 2-way of numerical PDP using gradient boosting\", fontsize=16\n)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The 1D partial dependence plots for the model constrained to not model feature\ninteractions show local spikes for each feature individually, in particular for\nthe \"humidity\" feature. Those spikes might reflect a degraded behavior\nof the model that attempts to somehow compensate for the forbidden interactions\nby overfitting particular training points. Note that the predictive performance\nof this model as measured on the test set is significantly worse than that of\nthe original, unconstrained model.\n\nAlso note that the number of local spikes visible on those plots depends on\nthe grid resolution parameter of the PD plot itself.\n\nThose local spikes result in a noisily gridded 2D PD plot. It is quite\nchallenging to tell whether or not there is any interaction between those\nfeatures because of the high-frequency oscillations in the humidity feature.\nHowever, it can clearly be seen that the simple interaction effect observed when\nthe temperature crosses the 20 degrees boundary is no longer visible for this\nmodel.\n\nThe partial dependence between categorical features will provide a discrete\nrepresentation that can be shown as a heatmap. 
For instance, the interaction between\nthe season, the weather, and the target would be as follows:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print(\"Computing partial dependence plots...\")\nfeatures_info = {\n \"features\": [\"season\", \"weather\", (\"season\", \"weather\")],\n \"kind\": \"average\",\n \"categorical_features\": categorical_features,\n}\n_, ax = plt.subplots(ncols=3, figsize=(14, 6), constrained_layout=True)\ntic = time()\ndisplay = PartialDependenceDisplay.from_estimator(\n hgbdt_model,\n X_train,\n **features_info,\n ax=ax,\n **common_params,\n)\n\nprint(f\"done in {time() - tic:.3f}s\")\n_ = display.figure_.suptitle(\n \"1-way vs 2-way PDP of categorical features using gradient boosting\", fontsize=16\n)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3D representation\n\nLet's make the same partial dependence plot for the interaction between the two\nfeatures, this time in 3 dimensions.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# unused but required import for doing 3d projections with matplotlib < 3.2\nimport mpl_toolkits.mplot3d # noqa: F401\nimport numpy as np\n\nfrom sklearn.inspection import partial_dependence\n\nfig = plt.figure(figsize=(5.5, 5))\n\nfeatures = (\"temp\", \"humidity\")\npdp = partial_dependence(\n hgbdt_model, X_train, features=features, kind=\"average\", grid_resolution=10\n)\nXX, YY = np.meshgrid(pdp[\"grid_values\"][0], pdp[\"grid_values\"][1])\nZ = pdp.average[0].T\nax = fig.add_subplot(projection=\"3d\")\nfig.add_axes(ax)\n\nsurf = ax.plot_surface(XX, YY, Z, rstride=1, cstride=1, cmap=plt.cm.BuPu, edgecolor=\"k\")\nax.set_xlabel(features[0])\nax.set_ylabel(features[1])\nfig.suptitle(\n \"PD of number of bike rentals on\\nthe temperature and humidity GBDT model\",\n fontsize=16,\n)\n# pretty init view\nax.view_init(elev=22, azim=122)\nclb = plt.colorbar(surf, pad=0.08, shrink=0.6, aspect=10)\nclb.ax.set_title(\"Partial\\ndependence\")\nplt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n### Custom inspection points\n\nNone of the examples so far specify _which_ points are evaluated to create the\npartial dependence plots. By default, we use percentiles defined by the input dataset.\nIn some cases it can be helpful to specify the exact points where you would like the\nmodel evaluated. For instance, a user may want to test the model behavior on\nout-of-distribution data or compare two models that were fit on slightly different\ndata. The `custom_values` parameter allows the user to pass in the values that they\nwant the model to be evaluated on. This overrides the `grid_resolution` and\n`percentiles` parameters. 
Let's return to our gradient boosting example above\nbut with custom values\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print(\"Computing partial dependence plots with custom evaluation values...\")\ntic = time()\n_, ax = plt.subplots(ncols=2, figsize=(6, 4), sharey=True, constrained_layout=True)\n\nfeatures_info = {\n \"features\": [\"temp\", \"humidity\"],\n \"kind\": \"both\",\n}\n\ndisplay = PartialDependenceDisplay.from_estimator(\n hgbdt_model,\n X_train,\n **features_info,\n ax=ax,\n **common_params,\n # we set custom values for temp feature -\n # all other features are evaluated based on the data\n custom_values={\"temp\": np.linspace(0, 40, 10)},\n)\nprint(f\"done in {time() - tic:.3f}s\")\n_ = display.figure_.suptitle(\n (\n \"Partial dependence of the number of bike rentals\\n\"\n \"for the bike rental dataset with a gradient boosting\"\n ),\n fontsize=16,\n)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.21" } }, "nbformat": 4, "nbformat_minor": 0 }