.. include:: _contributors.rst

.. currentmodule:: sklearn

.. _release_notes_1_8:

===========
Version 1.8
===========

.. -- UNCOMMENT WHEN 1.8.0 IS RELEASED --
  For a short description of the main highlights of the release, please refer to
  :ref:`sphx_glr_auto_examples_release_highlights_plot_release_highlights_1_8_0.py`.

.. DELETE WHEN 1.8.0 IS RELEASED
  Since October 2024, DO NOT add your changelog entry in this file.

.. Instead, create a file named `..rst` in the relevant sub-folder in
  `doc/whats_new/upcoming_changes/`. For full details, see:
  https://github.com/scikit-learn/scikit-learn/blob/main/doc/whats_new/upcoming_changes/README.md

.. include:: changelog_legend.inc

.. towncrier release notes start

.. _changes_1_8_dev0:

Version 1.8.dev0
================

**September 2025**

Support for Array API
---------------------

Additional estimators and functions have been updated to include support for all
`Array API `_ compliant inputs.

See :ref:`array_api` for more details.

- |Feature| :class:`sklearn.preprocessing.StandardScaler` now supports Array API
  compliant inputs.
  By :user:`Alexander Fabisch `, :user:`Edoardo Abati `,
  :user:`Olivier Grisel ` and :user:`Charles Hill `. :pr:`27113`

- |Feature| :func:`sklearn.metrics.confusion_matrix` now supports Array API
  compatible inputs.
  By :user:`Stefanie Senger ` :pr:`30562`

- |Feature| :class:`sklearn.mixture.GaussianMixture` with
  `init_params="random"` or `init_params="random_from_data"` and
  `warm_start=False` now supports Array API compatible inputs.
  By :user:`Stefanie Senger ` and :user:`Loïc Estève ` :pr:`30777`

- |Feature| :func:`sklearn.metrics.roc_curve` now supports Array API compatible
  inputs.
  By :user:`Thomas Li ` :pr:`30878`

- |Feature| :class:`preprocessing.PolynomialFeatures` now supports Array API compatible inputs.
  By :user:`Omar Salman ` :pr:`31580`

- |Enhancement| :func:`metrics.pairwise.pairwise_kernels` now supports Array API
  compatible inputs, when the underlying `metric` does (the only metric NOT currently
  supported is :func:`sklearn.metrics.pairwise.laplacian_kernel`).
  By :user:`Emily Chen ` and :user:`Lucy Liu `.

- :func:`metrics.pairwise.pairwise_distances` now supports Array API
  compatible inputs, when the underlying `metric` does (currently
  "cosine", "euclidean" and "l2").
  By :user:`Emily Chen ` and :user:`Lucy Liu `. :pr:`29822`

Metadata routing
----------------

Refer to the :ref:`Metadata Routing User Guide ` for
more details.

- |Fix| Fixed an issue where passing `sample_weight` to a :class:`Pipeline` inside a
  :class:`GridSearchCV` would raise an error with metadata routing enabled.
  By `Adrin Jalali`_. :pr:`31898`

:mod:`sklearn.base`
-------------------

- |Feature| Refactored :meth:`dir` in :class:`BaseEstimator` to recognize the
  condition check in :meth:`available_if`.
  By :user:`John Hendricks ` and :user:`Miguel Parece `. :pr:`31928`

:mod:`sklearn.calibration`
--------------------------

- |Feature| Added temperature scaling method in
  :class:`calibration.CalibratedClassifierCV`.
  By :user:`Virgil Chan ` and :user:`Christian Lorentzen `. :pr:`31068`

:mod:`sklearn.cluster`
----------------------

- |Efficiency| :func:`cluster.kmeans_plusplus` now uses `np.cumsum` directly without
  extra numerical stability checks and without casting to `np.float64`.
  By :user:`Tiziano Zito ` :pr:`31991`

:mod:`sklearn.compose`
----------------------

- |Fix| :class:`compose.TransformedTargetRegressor` now passes the transformed target
  to the regressor with the same number of dimensions as the original target.
  By :user:`kryggird `. :pr:`31563`

:mod:`sklearn.decomposition`
----------------------------

- |Fix| Added input checks to the `inverse_transform` method of
  :class:`decomposition.PCA` and :class:`decomposition.IncrementalPCA`.
  By :user:`Ian Faust `.
  :pr:`29310`

:mod:`sklearn.ensemble`
-----------------------

- |Fix| :class:`ensemble.BaggingClassifier`, :class:`ensemble.BaggingRegressor`
  and :class:`ensemble.IsolationForest` now use `sample_weight` to draw
  the samples instead of forwarding them multiplied by a uniformly sampled
  mask to the underlying estimators. Furthermore, `max_samples` is now
  interpreted as a fraction of `sample_weight.sum()` instead of `X.shape[0]`
  when passed as a float.
  By :user:`Antoine Baker `. :pr:`31414`

:mod:`sklearn.feature_extraction`
---------------------------------

- |Fix| Set the tag `requires_fit=False` for the classes
  :class:`feature_extraction.FeatureHasher` and
  :class:`feature_extraction.HashingVectorizer`.
  By :user:`hakan çanakcı `. :pr:`31851`

:mod:`sklearn.gaussian_process`
-------------------------------

- |Efficiency| Made :meth:`gaussian_process.GaussianProcessRegressor.predict` faster
  when `return_cov` and `return_std` are both `False`.
  By :user:`Rafael Ayllón Gavilán `. :pr:`31431`

:mod:`sklearn.impute`
---------------------

- |Fix| Fixed a bug in :class:`impute.SimpleImputer` with `strategy="most_frequent"`
  when there is a tie in the most frequent value and the input data has mixed types.
  By :user:`Alexandre Abraham `. :pr:`31820`

:mod:`sklearn.linear_model`
---------------------------

- |Efficiency| :class:`linear_model.ElasticNet` and :class:`linear_model.Lasso` with
  `precompute=False` use less memory for dense `X` and are a bit faster.
  Previously, they used twice the memory of `X` even for Fortran-contiguous `X`.
  By :user:`Christian Lorentzen ` :pr:`31665`

- |Efficiency| :class:`linear_model.ElasticNet` and :class:`linear_model.Lasso`
  avoid double input checking and are therefore a bit faster.
  By :user:`Christian Lorentzen `.
  :pr:`31848`

- |Efficiency| :class:`linear_model.ElasticNet`, :class:`linear_model.ElasticNetCV`,
  :class:`linear_model.Lasso`, :class:`linear_model.LassoCV`,
  :class:`linear_model.MultiTaskElasticNet`,
  :class:`linear_model.MultiTaskElasticNetCV`,
  :class:`linear_model.MultiTaskLasso` and :class:`linear_model.MultiTaskLassoCV`
  are faster to fit by avoiding a BLAS level 1 (axpy) call in the innermost loop.
  The same applies to the functions :func:`linear_model.enet_path` and
  :func:`linear_model.lasso_path`.
  By :user:`Christian Lorentzen ` :pr:`31956` and :pr:`31880`

- |Efficiency| :class:`linear_model.ElasticNet`, :class:`linear_model.ElasticNetCV`,
  :class:`linear_model.Lasso`, :class:`linear_model.LassoCV` as well as
  :func:`linear_model.lasso_path` and :func:`linear_model.enet_path` now implement
  gap safe screening rules in the coordinate descent solver for dense `X` and
  `precompute=False` or `"auto"` with `n_samples < n_features`.
  The reduction in fitting time is particularly pronounced (a 10-fold speedup is
  possible) when computing regularization paths like the \*CV-variants of the above
  estimators do.
  There is now an additional check of the stopping criterion before entering the main
  loop of descent steps. As the stopping criterion requires the computation of the
  dual gap, the screening happens whenever the dual gap is computed.
  By :user:`Christian Lorentzen `. :pr:`31882`

- |Efficiency| :class:`linear_model.ElasticNetCV`, :class:`linear_model.LassoCV`,
  :class:`linear_model.MultiTaskElasticNetCV` and
  :class:`linear_model.MultiTaskLassoCV` avoid an additional copy of `X` with default
  `copy_X=True`.
  By :user:`Christian Lorentzen `.
  :pr:`31946`

- |Enhancement| :class:`linear_model.ElasticNet`, :class:`linear_model.ElasticNetCV`,
  :class:`linear_model.Lasso`, :class:`linear_model.LassoCV`,
  :class:`linear_model.MultiTaskElasticNet`,
  :class:`linear_model.MultiTaskElasticNetCV`,
  :class:`linear_model.MultiTaskLasso`, :class:`linear_model.MultiTaskLassoCV`,
  as well as :func:`linear_model.enet_path` and :func:`linear_model.lasso_path`
  now use `dual gap <= tol` instead of `dual gap < tol` as stopping criterion.
  The resulting coefficients might differ from previous versions of scikit-learn in
  rare cases.
  By :user:`Christian Lorentzen `. :pr:`31906`

- |Fix| Fixed a bug in :class:`linear_model.LogisticRegression` when used with
  `solver="newton-cholesky"` and `warm_start=True` on multi-class problems, either
  with `fit_intercept=True` or with `penalty=None` (both resulting in unpenalized
  parameters for the solver). The coefficients and intercepts of the last class as
  provided by warm start were partially wrongly overwritten by zero.
  By :user:`Christian Lorentzen ` :pr:`31866`

- |API| `PassiveAggressiveClassifier` and `PassiveAggressiveRegressor` are deprecated
  and will be removed in 1.10. Equivalent estimators are available with
  `SGDClassifier` and `SGDRegressor`, both of which expose the options
  `learning_rate="pa1"` and `"pa2"`. The parameter `eta0` can be used to specify the
  aggressiveness parameter of the Passive-Aggressive algorithms, called C in the
  reference paper.
  By :user:`Christian Lorentzen ` :pr:`31932` and :pr:`29097`

- |API| :class:`linear_model.SGDClassifier`, :class:`linear_model.SGDRegressor`, and
  :class:`linear_model.SGDOneClassSVM` now deprecate negative values for the
  `power_t` parameter. Using a negative value will raise a warning in version 1.8
  and will raise an error in version 1.10. A value in the range [0.0, inf) must be
  used instead.
  By :user:`Ritvi Alagusankar ` :pr:`31474`

:mod:`sklearn.metrics`
----------------------

- |Feature| :func:`metrics.d2_brier_score` has been added which calculates the D^2 for the Brier score.
  By :user:`Omar Salman `. :pr:`28971`

- |Enhancement| :func:`metrics.median_absolute_error` now supports Array API
  compatible inputs.
  By :user:`Lucy Liu `. :pr:`31406`

- |Fix| :func:`metrics.median_absolute_error` now uses
  `_averaged_weighted_percentile` instead of `_weighted_percentile` to calculate the
  median when `sample_weight` is not `None`. This is equivalent to using the
  "averaged_inverted_cdf" instead of the "inverted_cdf" quantile method, which gives
  results equivalent to `numpy.median` when all weights are equal.
  By :user:`Lucy Liu ` :pr:`30787`

- |Fix| `y_pred` is deprecated in favour of `y_score` in
  :func:`metrics.DetCurveDisplay.from_predictions` and
  :func:`metrics.PrecisionRecallDisplay.from_predictions`. `y_pred` will be removed
  in v1.10.
  By :user:`Luis ` :pr:`31764`

- |Fix| `repr` on a scorer which has been created with a `partial` `score_func` now
  correctly works and uses the `repr` of the given `partial` object.
  By `Adrin Jalali`_. :pr:`31891`

- |Fix| Additional `sample_weight` checking has been added to
  :func:`metrics.accuracy_score`,
  :func:`metrics.balanced_accuracy_score`,
  :func:`metrics.brier_score_loss`,
  :func:`metrics.class_likelihood_ratios`,
  :func:`metrics.classification_report`,
  :func:`metrics.cohen_kappa_score`,
  :func:`metrics.confusion_matrix`,
  :func:`metrics.f1_score`,
  :func:`metrics.fbeta_score`,
  :func:`metrics.hamming_loss`,
  :func:`metrics.jaccard_score`,
  :func:`metrics.matthews_corrcoef`,
  :func:`metrics.multilabel_confusion_matrix`,
  :func:`metrics.precision_recall_fscore_support`,
  :func:`metrics.precision_score`,
  :func:`metrics.recall_score` and
  :func:`metrics.zero_one_loss`.
  `sample_weight` can only be 1D, consistent with `y_true` and `y_pred` in length,
  and all values must be finite and not complex.
  By :user:`Lucy Liu `. :pr:`31701`

- |API| :func:`metrics.cluster.entropy` is deprecated and will be removed in v1.10.
  By :user:`Lucy Liu ` :pr:`31294`

:mod:`sklearn.multiclass`
-------------------------

- |Fix| Fix tie-breaking behavior in :class:`multiclass.OneVsRestClassifier` to match
  `np.argmax` tie-breaking behavior.
  By :user:`Lakshmi Krishnan `. :pr:`15504`

:mod:`sklearn.pipeline`
-----------------------

- |Fix| :class:`pipeline.FeatureUnion` now validates that all transformers return 2D
  outputs and raises an informative error when transformers return 1D outputs,
  preventing silent failures that previously produced meaningless concatenated
  results.
  By :user:`gguiomar `. :pr:`31559`

:mod:`sklearn.preprocessing`
----------------------------

- |Enhancement| :class:`preprocessing.SplineTransformer` can now handle missing values
  with the parameter `handle_missing`.
  By :user:`Stefanie Senger `. :pr:`28043`

- |Enhancement| :class:`preprocessing.PowerTransformer` now raises a warning
  when NaN values are encountered in `inverse_transform`, typically caused by
  extremely skewed data.
  By :user:`Roberto Mourao` :pr:`29307`

- |Enhancement| :class:`preprocessing.MaxAbsScaler` can now clip out-of-range values
  in held-out data with the parameter `clip`.
  By :user:`Hleb Levitski `. :pr:`31790`

:mod:`sklearn.tree`
-------------------

- |Fix| Make :func:`tree.export_text` thread-safe.
  By :user:`Olivier Grisel `. :pr:`30041`

:mod:`sklearn.utils`
--------------------

- |Efficiency| The function :func:`utils.extmath.safe_sparse_dot` was improved by a
  dedicated Cython routine for the case of `a @ b` with sparse 2-dimensional `a` and
  `b` and when a dense output is required, i.e., `dense_output=True`. This improves
  several algorithms in scikit-learn when dealing with sparse arrays (or matrices).
  By :user:`Christian Lorentzen `. :pr:`31952`

- |Enhancement| ``sklearn.utils.estimator_checks.parametrize_with_checks`` now lets
  you configure strict mode for xfailing checks. Tests that unexpectedly pass will
  lead to a test failure. The default behaviour is unchanged.
  By :user:`Tim Head `. :pr:`31951`

- |Enhancement| ``sklearn.utils._check_sample_weight`` now raises a clearer error
  message when the provided weights are neither a scalar nor a 1-D array-like of the
  same size as the input data.
  :issue:`31712` by :user:`Kapil Parekh `. :pr:`31873`

.. rubric:: Code and documentation contributors

Thanks to everyone who has contributed to the maintenance and improvement of
the project since version 1.7, including:

TODO: update at the time of the release.