Version 1.8#
Legend for changelogs
- Major Feature: something big that you couldn't do before.
- Feature: something that you couldn't do before.
- Efficiency: an existing feature now may not require as much computation or memory.
- Enhancement: a miscellaneous minor improvement.
- Fix: something that previously didn't work as documented, or according to reasonable expectations, should now work.
- API Change: you will need to change your code to have the same effect in the future; or a feature will be removed in the future.
Version 1.8.dev0#
September 2025
Support for Array API#
Additional estimators and functions have been updated to include support for all Array API compliant inputs.
See Array API support (experimental) for more details.
- Feature: `preprocessing.StandardScaler` now supports Array API compliant inputs. By Alexander Fabisch, Edoardo Abati, Olivier Grisel and Charles Hill. #27113
- Feature: `metrics.confusion_matrix` now supports Array API compatible inputs. By Stefanie Senger. #30562
- Feature: `mixture.GaussianMixture` with `init_params="random"` or `init_params="random_from_data"` and `warm_start=False` now supports Array API compatible inputs. By Stefanie Senger and Loïc Estève. #30777
- Feature: `metrics.roc_curve` now supports Array API compatible inputs. By Thomas Li. #30878
- Feature: `preprocessing.PolynomialFeatures` now supports Array API compatible inputs. By Omar Salman. #31580
- Enhancement: `metrics.pairwise.pairwise_kernels` now supports Array API compatible inputs when the underlying `metric` does (the only metric not currently supported is `metrics.pairwise.laplacian_kernel`). By Emily Chen and Lucy Liu.
- Enhancement: `metrics.pairwise.pairwise_distances` now supports Array API compatible inputs when the underlying `metric` does (currently "cosine", "euclidean" and "l2"). By Emily Chen and Lucy Liu. #29822
Metadata routing#
Refer to the Metadata Routing User Guide for more details.
- Fix: Fixed an issue where passing `sample_weight` to a `Pipeline` inside a `GridSearchCV` would raise an error with metadata routing enabled. By Adrin Jalali. #31898
sklearn.base#
- Feature: Refactored `__dir__` in `BaseEstimator` to recognize the condition check in `available_if`. By John Hendricks and Miguel Parece. #31928
sklearn.calibration#
- Feature: Added a temperature scaling method to `calibration.CalibratedClassifierCV`. By Virgil Chan and Christian Lorentzen. #31068
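Temperature scaling recalibrates a classifier by dividing its logits by a single scalar T fitted on held-out data; T > 1 softens over-confident probabilities without changing the predicted class. A minimal NumPy sketch of the transform (illustrative only, not scikit-learn's implementation; in practice T would be fitted by minimizing log loss on a calibration set):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def temperature_scale(logits, T):
    # Dividing the logits by T > 1 flattens the distribution (less confident),
    # while T < 1 sharpens it; the argmax is unchanged for any T > 0.
    return softmax(logits / T)
```

Because the transform is monotone per row, accuracy is unaffected; only the confidence of the probabilities changes.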
sklearn.cluster#
- Efficiency: `cluster.kmeans_plusplus` now uses `np.cumsum` directly, without extra numerical stability checks and without casting to `np.float64`. By Tiziano Zito. #31991
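k-means++ seeding draws each new center with probability proportional to the squared distance to the existing centers, and the `np.cumsum` in question implements inverse-CDF sampling. An illustrative sketch of that pattern (not the actual scikit-learn code):

```python
import numpy as np

def sample_proportional(weights, n_draws, rng):
    # Build the (unnormalized) CDF with cumsum, then invert it by
    # binary-searching uniform draws in [0, total weight).
    cdf = np.cumsum(weights)
    u = rng.uniform(0.0, cdf[-1], size=n_draws)
    return np.searchsorted(cdf, u, side="right")
```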
sklearn.compose#
- Fix: `compose.TransformedTargetRegressor` now passes the transformed target to the regressor with the same number of dimensions as the original target. By kryggird. #31563
sklearn.decomposition#
- Fix: Added input checks to the `inverse_transform` method of `decomposition.PCA` and `decomposition.IncrementalPCA`. By Ian Faust. #29310
sklearn.ensemble#
- Fix: `ensemble.BaggingClassifier`, `ensemble.BaggingRegressor` and `ensemble.IsolationForest` now use `sample_weight` to draw the samples, instead of forwarding it, multiplied by a uniformly sampled mask, to the underlying estimators. Furthermore, `max_samples` is now interpreted as a fraction of `sample_weight.sum()` instead of `X.shape[0]` when passed as a float. By Antoine Baker. #31414
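The new sampling semantics can be sketched as: draw indices with probability proportional to `sample_weight`, with a float `max_samples` taken as a fraction of the total weight rather than of the row count. A hypothetical helper for illustration (not the scikit-learn internals):

```python
import numpy as np

def draw_bagging_indices(sample_weight, max_samples, rng):
    w = np.asarray(sample_weight, dtype=float)
    # A float max_samples is a fraction of sample_weight.sum(), not of len(X).
    n_draws = int(round(max_samples * w.sum()))
    # Sample rows with probability proportional to their weight.
    return rng.choice(w.shape[0], size=n_draws, replace=True, p=w / w.sum())
```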
sklearn.feature_extraction#
- Fix: Set the tag `requires_fit=False` for the classes `feature_extraction.FeatureHasher` and `feature_extraction.HashingVectorizer`. By hakan çanakcı. #31851
sklearn.gaussian_process#
- Efficiency: Make `gaussian_process.GaussianProcessRegressor.predict` faster when `return_cov` and `return_std` are both `False`. By Rafael Ayllón Gavilán. #31431
sklearn.impute#
- Fix: Fixed a bug in `impute.SimpleImputer` with `strategy="most_frequent"` when there is a tie in the most frequent value and the input data has mixed types. By Alexandre Abraham. #31820
sklearn.linear_model#
- Efficiency: `linear_model.ElasticNet` and `linear_model.Lasso` with `precompute=False` use less memory for dense `X` and are a bit faster. Previously, they used twice the memory of `X` even for Fortran-contiguous `X`. By Christian Lorentzen. #31665
- Efficiency: `linear_model.ElasticNet` and `linear_model.Lasso` avoid double input checking and are therefore a bit faster. By Christian Lorentzen. #31848
- Efficiency: `linear_model.ElasticNet`, `linear_model.ElasticNetCV`, `linear_model.Lasso`, `linear_model.LassoCV`, `linear_model.MultiTaskElasticNet`, `linear_model.MultiTaskElasticNetCV`, `linear_model.MultiTaskLasso` and `linear_model.MultiTaskLassoCV` are faster to fit by avoiding a BLAS level 1 (axpy) call in the innermost loop. The same applies to the functions `linear_model.enet_path` and `linear_model.lasso_path`. By Christian Lorentzen. #31956 and #31880
- Efficiency: `linear_model.ElasticNet`, `linear_model.ElasticNetCV`, `linear_model.Lasso` and `linear_model.LassoCV`, as well as `linear_model.lasso_path` and `linear_model.enet_path`, now implement gap safe screening rules in the coordinate descent solver for dense `X` and `precompute=False` or `"auto"` with `n_samples < n_features`. The speedup in fit time is particularly pronounced (10-fold is possible) when computing regularization paths, as the *CV variants of the above estimators do. There is now an additional check of the stopping criterion before entering the main loop of descent steps. As the stopping criterion requires the computation of the dual gap, the screening happens whenever the dual gap is computed. By Christian Lorentzen. #31882
- Efficiency: `linear_model.ElasticNetCV`, `linear_model.LassoCV`, `linear_model.MultiTaskElasticNetCV` and `linear_model.MultiTaskLassoCV` avoid an additional copy of `X` with the default `copy_X=True`. By Christian Lorentzen. #31946
- Enhancement: `linear_model.ElasticNet`, `linear_model.ElasticNetCV`, `linear_model.Lasso`, `linear_model.LassoCV`, `linear_model.MultiTaskElasticNet`, `linear_model.MultiTaskElasticNetCV`, `linear_model.MultiTaskLasso` and `linear_model.MultiTaskLassoCV`, as well as `linear_model.enet_path` and `linear_model.lasso_path`, now use `dual gap <= tol` instead of `dual gap < tol` as the stopping criterion. The resulting coefficients might differ from previous versions of scikit-learn in rare cases. By Christian Lorentzen. #31906
- Fix: Fixed a bug in `linear_model.LogisticRegression` when used with `solver="newton-cholesky"` and `warm_start=True` on multi-class problems, either with `fit_intercept=True` or with `penalty=None` (both resulting in unpenalized parameters for the solver). The coefficients and intercepts of the last class as provided by warm start were partially, and wrongly, overwritten by zeros. By Christian Lorentzen. #31866
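Several of the coordinate descent entries above refer to the dual gap, which upper-bounds the suboptimality of the current coefficients and drives both the stopping criterion and the gap safe screening rules. A minimal NumPy sketch for the Lasso objective `||y - X w||^2 / (2n) + alpha * ||w||_1` (illustrative only; scikit-learn's Cython implementation differs in detail):

```python
import numpy as np

def lasso_dual_gap(X, y, w, alpha):
    """Duality gap of the Lasso; nonnegative, and zero exactly at the optimum."""
    n = X.shape[0]
    r = y - X @ w                                        # residuals
    primal = r @ r / (2 * n) + alpha * np.abs(w).sum()
    # Rescale the residuals into the dual feasible set |X.T @ theta| <= alpha.
    scale = max(1.0, np.max(np.abs(X.T @ r)) / (n * alpha))
    theta = r / (n * scale)
    dual = y @ y / (2 * n) - (n / 2) * np.sum((theta - y / n) ** 2)
    return primal - dual
```

The solver stops as soon as this gap drops to `tol` or below; since computing the gap requires `X.T @ r` anyway, screening out inactive features at the same time comes almost for free.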
- API Change: `linear_model.PassiveAggressiveClassifier` and `linear_model.PassiveAggressiveRegressor` are deprecated and will be removed in 1.10. Equivalent estimators are available with `linear_model.SGDClassifier` and `linear_model.SGDRegressor`, both of which expose the options `learning_rate="pa1"` and `"pa2"`. The parameter `eta0` can be used to specify the aggressiveness parameter of the Passive-Aggressive algorithms, called C in the reference paper. By Christian Lorentzen. #31932 and #29097
- API Change: `linear_model.SGDClassifier`, `linear_model.SGDRegressor` and `linear_model.SGDOneClassSVM` now deprecate negative values for the `power_t` parameter. Using a negative value will raise a warning in version 1.8 and an error in version 1.10. A value in the range [0.0, inf) must be used instead. By Ritvi Alagusankar. #31474
sklearn.metrics#
- Feature: `metrics.d2_brier_score` has been added, which calculates the D^2 for the Brier score. By Omar Salman. #28971
- Enhancement: `metrics.median_absolute_error` now supports Array API compatible inputs. By Lucy Liu. #31406
- Fix: `metrics.median_absolute_error` now uses `_averaged_weighted_percentile` instead of `_weighted_percentile` to calculate the median when `sample_weight` is not `None`. This is equivalent to using the "averaged_inverted_cdf" instead of the "inverted_cdf" quantile method, which gives results equivalent to `numpy.median` if equal weights are used. By Lucy Liu. #30787
- Fix: `y_pred` is deprecated in favour of `y_score` in `metrics.DetCurveDisplay.from_predictions` and `metrics.PrecisionRecallDisplay.from_predictions`. `y_pred` will be removed in v1.10. By Luis. #31764
- Fix: `repr` on a scorer created with a `partial` `score_func` now works correctly and uses the `repr` of the given `partial` object. By Adrin Jalali. #31891
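The distinction between the "inverted_cdf" and "averaged_inverted_cdf" quantile methods mentioned in the weighted-median entry above can be seen with plain NumPy, where both are available as `method` options of `np.percentile` (NumPy >= 1.22): with equal weights, only the averaged variant reproduces `np.median` on an even number of samples.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
lo = np.percentile(x, 50, method="inverted_cdf")            # picks one order statistic: 2.0
avg = np.percentile(x, 50, method="averaged_inverted_cdf")  # averages the two middle values: 2.5
```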
- Fix: Additional `sample_weight` checking has been added to `metrics.accuracy_score`, `metrics.balanced_accuracy_score`, `metrics.brier_score_loss`, `metrics.class_likelihood_ratios`, `metrics.classification_report`, `metrics.cohen_kappa_score`, `metrics.confusion_matrix`, `metrics.f1_score`, `metrics.fbeta_score`, `metrics.hamming_loss`, `metrics.jaccard_score`, `metrics.matthews_corrcoef`, `metrics.multilabel_confusion_matrix`, `metrics.precision_recall_fscore_support`, `metrics.precision_score`, `metrics.recall_score` and `metrics.zero_one_loss`. `sample_weight` can only be 1D, consistent with `y_true` and `y_pred` in length, and all values must be finite and not complex. By Lucy Liu. #31701
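The constraints listed above (1D, matching length, finite, real) can be illustrated with a hand-rolled weighted accuracy in NumPy (an illustrative helper, not the scikit-learn checking code):

```python
import numpy as np

def weighted_accuracy(y_true, y_pred, sample_weight):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    w = np.asarray(sample_weight)
    # sample_weight must be 1D and match y_true / y_pred in length ...
    if w.ndim != 1 or w.shape[0] != y_true.shape[0]:
        raise ValueError("sample_weight must be 1D and match y_true in length")
    # ... and all values must be finite and not complex.
    if np.iscomplexobj(w) or not np.all(np.isfinite(w)):
        raise ValueError("sample_weight must contain finite real values")
    return float(np.sum(w * (y_true == y_pred)) / np.sum(w))
```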
sklearn.multiclass#
- Fix: Fixed the tie-breaking behavior in `multiclass.OneVsRestClassifier` to match the tie-breaking behavior of `np.argmax`. By Lakshmi Krishnan. #15504
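`np.argmax` resolves ties by returning the first (lowest) index of the maximum, so on tied scores the prediction is effectively the class that appears first. For example:

```python
import numpy as np

# Two classes tie with score 0.4; argmax picks the first of them.
scores = np.array([0.4, 0.4, 0.2])
predicted = int(np.argmax(scores))  # index 0, not 1
```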
sklearn.pipeline#
- Fix: `pipeline.FeatureUnion` now validates that all transformers return 2D outputs and raises an informative error when a transformer returns a 1D output, preventing silent failures that previously produced meaningless concatenated results. By gguiomar. #31559
sklearn.preprocessing#
- Enhancement: `preprocessing.SplineTransformer` can now handle missing values with the parameter `handle_missing`. By Stefanie Senger. #28043
- Enhancement: `preprocessing.MaxAbsScaler` can now clip out-of-range values in held-out data with the parameter `clip`. By Hleb Levitski. #31790
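`MaxAbsScaler` divides each feature by its maximum absolute value seen during `fit`, so held-out data can fall outside [-1, 1]; the new `clip` option truncates such values back into range. An illustrative NumPy sketch of the behaviour (assuming the parameter works like the existing `clip` of `MinMaxScaler`):

```python
import numpy as np

X_train = np.array([[-2.0], [4.0]])
max_abs = np.max(np.abs(X_train), axis=0)  # per-feature max absolute value: 4.0
X_new = np.array([[6.0], [-5.0]])          # held-out values outside the fitted range
X_scaled = X_new / max_abs                 # 1.5 and -1.25 fall outside [-1, 1]
X_clipped = np.clip(X_scaled, -1.0, 1.0)   # what clip=True would keep
```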
sklearn.tree#
- Fix: Make `tree.export_text` thread-safe. By Olivier Grisel. #30041
sklearn.utils#
- Efficiency: The function `utils.extmath.safe_sparse_dot` was improved with a dedicated Cython routine for the case of `a @ b` with sparse 2-dimensional `a` and `b` when a dense output is required, i.e., `dense_output=True`. This improves several algorithms in scikit-learn when dealing with sparse arrays (or matrices). By Christian Lorentzen. #31952
- Enhancement: `utils.estimator_checks.parametrize_with_checks` now lets you configure strict mode for xfailing checks. Tests that unexpectedly pass will lead to a test failure. The default behaviour is unchanged. By Tim Head. #31951
- Enhancement: `utils._check_sample_weight` now raises a clearer error message when the provided weights are neither a scalar nor a 1D array-like of the same size as the input data. By Kapil Parekh. #31712 and #31873
Code and documentation contributors
Thanks to everyone who has contributed to the maintenance and improvement of the project since version 1.7, including:
TODO: update at the time of the release.