Version 1.8#
Legend for changelogs
- Major Feature: something big that you couldn't do before.
- Feature: something that you couldn't do before.
- Efficiency: an existing feature now may not require as much computation or memory.
- Enhancement: a miscellaneous minor improvement.
- Fix: something that previously didn't work as documented, or according to reasonable expectations, should now work.
- API Change: you will need to change your code to have the same effect in the future; or a feature will be removed in the future.
Version 1.8.dev0#
September 2025
Support for Array API#
Additional estimators and functions have been updated to include support for all Array API compliant inputs.
See Array API support (experimental) for more details.
- Feature: `preprocessing.StandardScaler` now supports Array API compliant inputs. By Alexander Fabisch, Edoardo Abati, Olivier Grisel and Charles Hill. #27113
- Feature: `metrics.confusion_matrix` now supports Array API compatible inputs. By Stefanie Senger. #30562
- Feature: `mixture.GaussianMixture` with `init_params="random"` or `init_params="random_from_data"` and `warm_start=False` now supports Array API compatible inputs. By Stefanie Senger and Loïc Estève. #30777
- Feature: `metrics.roc_curve` now supports Array API compatible inputs. By Thomas Li. #30878
- Feature: `preprocessing.PolynomialFeatures` now supports Array API compatible inputs. By Omar Salman. #31580
- Enhancement: `metrics.pairwise.pairwise_kernels` now supports Array API compatible inputs when the underlying `metric` does (the only metric not currently supported is `metrics.pairwise.laplacian_kernel`). By Emily Chen and Lucy Liu.
- Enhancement: `metrics.pairwise.pairwise_distances` now supports Array API compatible inputs when the underlying `metric` does (currently "cosine", "euclidean" and "l2"). By Emily Chen and Lucy Liu. #29822
Metadata routing#
Refer to the Metadata Routing User Guide for more details.
- Fix: Fixed an issue where passing `sample_weight` to a `Pipeline` inside a `GridSearchCV` would raise an error with metadata routing enabled. By Adrin Jalali. #31898
sklearn.base#
- Feature: Refactored `__dir__` in `BaseEstimator` to recognize the condition check in `available_if`. By John Hendricks and Miguel Parece. #31928
sklearn.calibration#
- Feature: Added a temperature scaling method to `calibration.CalibratedClassifierCV`. By Virgil Chan and Christian Lorentzen. #31068
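Temperature scaling recalibrates a classifier by dividing its logits by a single scalar T fitted on held-out data; T > 1 softens over-confident probabilities without changing the predicted class. A minimal NumPy sketch of the transform (illustrative only, not scikit-learn's implementation; in practice T would be fitted by minimizing log loss on a calibration set):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def temperature_scale(logits, T):
    # Dividing the logits by T > 1 flattens the distribution (less confident),
    # while T < 1 sharpens it; the argmax is unchanged for any T > 0.
    return softmax(logits / T)
```

Because the transform is monotone per row, accuracy is unaffected; only the confidence of the probabilities changes.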
sklearn.cluster#
- Efficiency: `cluster.kmeans_plusplus` now uses `np.cumsum` directly, without extra numerical stability checks and without casting to `np.float64`. By Tiziano Zito. #31991
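k-means++ seeding draws each new center with probability proportional to the squared distance to the existing centers, and the `np.cumsum` in question implements inverse-CDF sampling. An illustrative sketch of that pattern (not the actual scikit-learn code):

```python
import numpy as np

def sample_proportional(weights, n_draws, rng):
    # Build the (unnormalized) CDF with cumsum, then invert it by
    # binary-searching uniform draws in [0, total weight).
    cdf = np.cumsum(weights)
    u = rng.uniform(0.0, cdf[-1], size=n_draws)
    return np.searchsorted(cdf, u, side="right")
```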
sklearn.compose#
- Fix: `compose.TransformedTargetRegressor` now passes the transformed target to the regressor with the same number of dimensions as the original target. By kryggird. #31563
sklearn.decomposition#
- Fix: Added input checks to the `inverse_transform` method of `decomposition.PCA` and `decomposition.IncrementalPCA`. By Ian Faust. #29310
sklearn.ensemble#
- Fix: `ensemble.BaggingClassifier`, `ensemble.BaggingRegressor` and `ensemble.IsolationForest` now use `sample_weight` to draw the samples, instead of forwarding it, multiplied by a uniformly sampled mask, to the underlying estimators. Furthermore, `max_samples` is now interpreted as a fraction of `sample_weight.sum()` instead of `X.shape[0]` when passed as a float. By Antoine Baker. #31414
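The new sampling semantics can be sketched as: draw indices with probability proportional to `sample_weight`, with a float `max_samples` taken as a fraction of the total weight rather than of the row count. A hypothetical helper for illustration (not the scikit-learn internals):

```python
import numpy as np

def draw_bagging_indices(sample_weight, max_samples, rng):
    w = np.asarray(sample_weight, dtype=float)
    # A float max_samples is a fraction of sample_weight.sum(), not of len(X).
    n_draws = int(round(max_samples * w.sum()))
    # Sample rows with probability proportional to their weight.
    return rng.choice(w.shape[0], size=n_draws, replace=True, p=w / w.sum())
```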
sklearn.feature_extraction#
- Fix: Set the tag `requires_fit=False` for the classes `feature_extraction.FeatureHasher` and `feature_extraction.HashingVectorizer`. By hakan çanakcı. #31851
sklearn.gaussian_process#
- Efficiency: Make `gaussian_process.GaussianProcessRegressor.predict` faster when `return_cov` and `return_std` are both `False`. By Rafael Ayllón Gavilán. #31431
sklearn.impute#
- Fix: Fixed a bug in `impute.SimpleImputer` with `strategy="most_frequent"` when there is a tie in the most frequent value and the input data has mixed types. By Alexandre Abraham. #31820
sklearn.linear_model#
- Efficiency: `linear_model.ElasticNet` and `linear_model.Lasso` with `precompute=False` use less memory for dense `X` and are a bit faster. Previously, they used twice the memory of `X` even for Fortran-contiguous `X`. By Christian Lorentzen. #31665
- Efficiency: `linear_model.ElasticNet` and `linear_model.Lasso` avoid double input checking and are therefore a bit faster. By Christian Lorentzen. #31848
- Efficiency: `linear_model.ElasticNet`, `linear_model.ElasticNetCV`, `linear_model.Lasso`, `linear_model.LassoCV`, `linear_model.MultiTaskElasticNet`, `linear_model.MultiTaskElasticNetCV`, `linear_model.MultiTaskLasso` and `linear_model.MultiTaskLassoCV` are faster to fit by avoiding a BLAS level 1 (axpy) call in the innermost loop. The same applies to the functions `linear_model.enet_path` and `linear_model.lasso_path`. By Christian Lorentzen. #31956 and #31880
- Efficiency: `linear_model.ElasticNet`, `linear_model.ElasticNetCV`, `linear_model.Lasso` and `linear_model.LassoCV`, as well as `linear_model.lasso_path` and `linear_model.enet_path`, now implement gap safe screening rules in the coordinate descent solver for dense `X` and `precompute=False` or `"auto"` with `n_samples < n_features`. The speedup in fit time is particularly pronounced (10-fold is possible) when computing regularization paths, as the *CV variants of the above estimators do. There is now an additional check of the stopping criterion before entering the main loop of descent steps. As the stopping criterion requires the computation of the dual gap, the screening happens whenever the dual gap is computed. By Christian Lorentzen. #31882
- Efficiency: `linear_model.ElasticNetCV`, `linear_model.LassoCV`, `linear_model.MultiTaskElasticNetCV` and `linear_model.MultiTaskLassoCV` avoid an additional copy of `X` with the default `copy_X=True`. By Christian Lorentzen. #31946
- Enhancement: `linear_model.ElasticNet`, `linear_model.ElasticNetCV`, `linear_model.Lasso`, `linear_model.LassoCV`, `linear_model.MultiTaskElasticNet`, `linear_model.MultiTaskElasticNetCV`, `linear_model.MultiTaskLasso` and `linear_model.MultiTaskLassoCV`, as well as `linear_model.enet_path` and `linear_model.lasso_path`, now use `dual gap <= tol` instead of `dual gap < tol` as the stopping criterion. The resulting coefficients might differ from previous versions of scikit-learn in rare cases. By Christian Lorentzen. #31906
- Fix: Fixed a bug in `linear_model.LogisticRegression` when used with `solver="newton-cholesky"` and `warm_start=True` on multi-class problems, either with `fit_intercept=True` or with `penalty=None` (both resulting in unpenalized parameters for the solver). The coefficients and intercepts of the last class as provided by warm start were partially, and wrongly, overwritten by zeros. By Christian Lorentzen. #31866
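Several of the coordinate descent entries above refer to the dual gap, which upper-bounds the suboptimality of the current coefficients and drives both the stopping criterion and the gap safe screening rules. A minimal NumPy sketch for the Lasso objective `||y - X w||^2 / (2n) + alpha * ||w||_1` (illustrative only; scikit-learn's Cython implementation differs in detail):

```python
import numpy as np

def lasso_dual_gap(X, y, w, alpha):
    """Duality gap of the Lasso; nonnegative, and zero exactly at the optimum."""
    n = X.shape[0]
    r = y - X @ w                                        # residuals
    primal = r @ r / (2 * n) + alpha * np.abs(w).sum()
    # Rescale the residuals into the dual feasible set |X.T @ theta| <= alpha.
    scale = max(1.0, np.max(np.abs(X.T @ r)) / (n * alpha))
    theta = r / (n * scale)
    dual = y @ y / (2 * n) - (n / 2) * np.sum((theta - y / n) ** 2)
    return primal - dual
```

The solver stops as soon as this gap drops to `tol` or below; since computing the gap requires `X.T @ r` anyway, screening out inactive features at the same time comes almost for free.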
- API Change: `linear_model.PassiveAggressiveClassifier` and `linear_model.PassiveAggressiveRegressor` are deprecated and will be removed in 1.10. Equivalent estimators are available with `linear_model.SGDClassifier` and `linear_model.SGDRegressor`, both of which expose the options `learning_rate="pa1"` and `"pa2"`. The parameter `eta0` can be used to specify the aggressiveness parameter of the Passive-Aggressive algorithms, called C in the reference paper. By Christian Lorentzen. #31932 and #29097
- API Change: `linear_model.SGDClassifier`, `linear_model.SGDRegressor` and `linear_model.SGDOneClassSVM` now deprecate negative values for the `power_t` parameter. Using a negative value will raise a warning in version 1.8 and an error in version 1.10. A value in the range [0.0, inf) must be used instead. By Ritvi Alagusankar. #31474
sklearn.metrics#
- Feature: `metrics.d2_brier_score` has been added, which calculates the D^2 for the Brier score. By Omar Salman. #28971
- Enhancement: `metrics.median_absolute_error` now supports Array API compatible inputs. By Lucy Liu. #31406
- Fix: `metrics.median_absolute_error` now uses `_averaged_weighted_percentile` instead of `_weighted_percentile` to calculate the median when `sample_weight` is not `None`. This is equivalent to using the "averaged_inverted_cdf" instead of the "inverted_cdf" quantile method, which gives results equivalent to `numpy.median` if equal weights are used. By Lucy Liu. #30787
- Fix: `y_pred` is deprecated in favour of `y_score` in `metrics.DetCurveDisplay.from_predictions` and `metrics.PrecisionRecallDisplay.from_predictions`. `y_pred` will be removed in v1.10. By Luis. #31764
- Fix: `repr` on a scorer created with a `partial` `score_func` now works correctly and uses the `repr` of the given `partial` object. By Adrin Jalali. #31891
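The distinction between the "inverted_cdf" and "averaged_inverted_cdf" quantile methods mentioned in the weighted-median entry above can be seen with plain NumPy, where both are available as `method` options of `np.percentile` (NumPy >= 1.22): with equal weights, only the averaged variant reproduces `np.median` on an even number of samples.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
lo = np.percentile(x, 50, method="inverted_cdf")            # picks one order statistic: 2.0
avg = np.percentile(x, 50, method="averaged_inverted_cdf")  # averages the two middle values: 2.5
```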
- Fix: Additional `sample_weight` checking has been added to `metrics.accuracy_score`, `metrics.balanced_accuracy_score`, `metrics.brier_score_loss`, `metrics.class_likelihood_ratios`, `metrics.classification_report`, `metrics.cohen_kappa_score`, `metrics.confusion_matrix`, `metrics.f1_score`, `metrics.fbeta_score`, `metrics.hamming_loss`, `metrics.jaccard_score`, `metrics.matthews_corrcoef`, `metrics.multilabel_confusion_matrix`, `metrics.precision_recall_fscore_support`, `metrics.precision_score`, `metrics.recall_score` and `metrics.zero_one_loss`. `sample_weight` can only be 1D, consistent with `y_true` and `y_pred` in length, and all values must be finite and not complex. By Lucy Liu. #31701
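The constraints listed above (1D, matching length, finite, real) can be illustrated with a hand-rolled weighted accuracy in NumPy (an illustrative helper, not the scikit-learn checking code):

```python
import numpy as np

def weighted_accuracy(y_true, y_pred, sample_weight):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    w = np.asarray(sample_weight)
    # sample_weight must be 1D and match y_true / y_pred in length ...
    if w.ndim != 1 or w.shape[0] != y_true.shape[0]:
        raise ValueError("sample_weight must be 1D and match y_true in length")
    # ... and all values must be finite and not complex.
    if np.iscomplexobj(w) or not np.all(np.isfinite(w)):
        raise ValueError("sample_weight must contain finite real values")
    return float(np.sum(w * (y_true == y_pred)) / np.sum(w))
```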
sklearn.multiclass#
- Fix: Fixed the tie-breaking behavior in `multiclass.OneVsRestClassifier` to match the tie-breaking behavior of `np.argmax`. By Lakshmi Krishnan. #15504
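`np.argmax` resolves ties by returning the first (lowest) index of the maximum, so on tied scores the prediction is effectively the class that appears first. For example:

```python
import numpy as np

# Two classes tie with score 0.4; argmax picks the first of them.
scores = np.array([0.4, 0.4, 0.2])
predicted = int(np.argmax(scores))  # index 0, not 1
```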
sklearn.pipeline#
- Fix: `pipeline.FeatureUnion` now validates that all transformers return 2D outputs and raises an informative error when a transformer returns a 1D output, preventing silent failures that previously produced meaningless concatenated results. By gguiomar. #31559
sklearn.preprocessing#
- Enhancement: `preprocessing.SplineTransformer` can now handle missing values with the parameter `handle_missing`. By Stefanie Senger. #28043
- Enhancement: `preprocessing.MaxAbsScaler` can now clip out-of-range values in held-out data with the parameter `clip`. By Hleb Levitski. #31790
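`MaxAbsScaler` divides each feature by its maximum absolute value seen during `fit`, so held-out data can fall outside [-1, 1]; the new `clip` option truncates such values back into range. An illustrative NumPy sketch of the behaviour (assuming the parameter works like the existing `clip` of `MinMaxScaler`):

```python
import numpy as np

X_train = np.array([[-2.0], [4.0]])
max_abs = np.max(np.abs(X_train), axis=0)  # per-feature max absolute value: 4.0
X_new = np.array([[6.0], [-5.0]])          # held-out values outside the fitted range
X_scaled = X_new / max_abs                 # 1.5 and -1.25 fall outside [-1, 1]
X_clipped = np.clip(X_scaled, -1.0, 1.0)   # what clip=True would keep
```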
sklearn.tree#
- Fix: Make `tree.export_text` thread-safe. By Olivier Grisel. #30041
sklearn.utils#
- Efficiency: The function `utils.extmath.safe_sparse_dot` was improved with a dedicated Cython routine for the case of `a @ b` with sparse 2-dimensional `a` and `b` when a dense output is required, i.e., `dense_output=True`. This improves several algorithms in scikit-learn when dealing with sparse arrays (or matrices). By Christian Lorentzen. #31952
- Enhancement: `utils.estimator_checks.parametrize_with_checks` now lets you configure strict mode for xfailing checks. Tests that unexpectedly pass will lead to a test failure. The default behaviour is unchanged. By Tim Head. #31951
- Enhancement: `utils._check_sample_weight` now raises a clearer error message when the provided weights are neither a scalar nor a 1D array-like of the same size as the input data. By Kapil Parekh. #31712 and #31873
Code and documentation contributors
Thanks to everyone who has contributed to the maintenance and improvement of the project since version 1.7, including:
TODO: update at the time of the release.