{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n# Permutation Importance with Multicollinear or Correlated Features\n\nIn this example, we compute the\n:func:`~sklearn.inspection.permutation_importance` of the features to a trained\n:class:`~sklearn.ensemble.RandomForestClassifier` using the\n`breast_cancer_dataset`. The model can easily get about 97% accuracy on a\ntest dataset. Because this dataset contains multicollinear features, the\npermutation importance shows that none of the features are important, in\ncontradiction with the high test accuracy.\n\nWe demo a possible approach to handling multicollinearity, which consists of\nhierarchical clustering on the features' Spearman rank-order correlations,\npicking a threshold, and keeping a single feature from each cluster.\n\n
See also\n `sphx_glr_auto_examples_inspection_plot_permutation_importance.py`