.. include:: _contributors.rst

.. currentmodule:: sklearn

.. _release_notes_1_9:

===========
Version 1.9
===========

..
  -- UNCOMMENT WHEN 1.9.0 IS RELEASED --
  For a short description of the main highlights of the release, please refer to
  :ref:`sphx_glr_auto_examples_release_highlights_plot_release_highlights_1_9_0.py`.

.. DELETE WHEN 1.9.0 IS RELEASED
    Since October 2024, DO NOT add your changelog entry in this file.
..
    Instead, create a file named `<PR_NUMBER>.<TYPE>.rst` in the relevant sub-folder in
    `doc/whats_new/upcoming_changes/`. For full details, see:
    https://github.com/scikit-learn/scikit-learn/blob/main/doc/whats_new/upcoming_changes/README.md

.. include:: changelog_legend.inc

.. towncrier release notes start

.. _changes_1_9_dev0:

Version 1.9.dev0
================

**February 2026**

Changed models
--------------

- |Enhancement| The :meth:`transform` method of :class:`preprocessing.PowerTransformer`
  with `method="yeo-johnson"` now uses the numerically more stable function
  `scipy.stats.yeojohnson` instead of its own implementation. The results may deviate
  in numerical edge cases or within the precision of floating-point arithmetic.
  By :user:`Christian Lorentzen`. :pr:`33272`

Changes impacting many modules
------------------------------

- |Enhancement| :class:`pipeline.Pipeline`, :class:`pipeline.FeatureUnion` and
  :class:`compose.ColumnTransformer` now raise a clearer error message when an
  estimator class is passed instead of an instance.
  By :user:`Anne Beyer` :pr:`32888`

- |Fix| Raise a ValueError when `sample_weight` contains only zero values to prevent
  meaningless input data during fitting. This change applies to all estimators that
  support the parameter `sample_weight`. This change also affects metrics that
  validate sample weights.
  By :user:`Lucy Liu` and :user:`John Hendricks`. :pr:`32212`

- |Fix| Some parameter descriptions in the HTML representation of estimators were not
  properly escaped, which could lead to malformed HTML if the description contains
  characters like `<` or `>`.
  By :user:`Olivier Grisel`. :pr:`32942`

Support for Array API
---------------------

Additional estimators and functions have been updated to include support for all
`Array API <https://data-apis.org/array-api/latest/>`_ compliant inputs.

See :ref:`array_api` for more details.

- |Feature| :func:`sklearn.metrics.d2_absolute_error_score` and
  :func:`sklearn.metrics.d2_pinball_score` now support array API compatible inputs.
  By :user:`Virgil Chan`. :pr:`31671`

- |Feature| :class:`linear_model.LogisticRegression` now supports array API compatible
  inputs with `solver="lbfgs"`.
  By :user:`Omar Salman` and :user:`Olivier Grisel`. :pr:`32644`

- |Feature| :func:`sklearn.metrics.average_precision_score` now supports array API
  compliant inputs.
  By :user:`Stefanie Senger`. :pr:`32909`

- |Feature| :func:`sklearn.metrics.pairwise.paired_manhattan_distances` now supports
  array API compatible inputs.
  By :user:`Bharat Raghunathan`. :pr:`32979`

- |Feature| :func:`sklearn.metrics.pairwise.pairwise_distances_argmin` now supports
  array API compatible inputs.
  By :user:`Bharat Raghunathan`. :pr:`32985`

- |Feature| :class:`pipeline.FeatureUnion` now supports array API compliant inputs
  when all its transformers do.
  By :user:`Olivier Grisel`. :pr:`33263`

- |Enhancement| :class:`kernel_approximation.Nystroem` now supports array API
  compatible inputs.
  By :user:`Emily Chen` :pr:`29661`

- |Fix| Fixed a bug that would cause Cython-based estimators to fail when fit on NumPy
  inputs when setting `sklearn.set_config(array_api_dispatch=True)`.
  By :user:`Olivier Grisel`. :pr:`32846`

- |Fix| Fixes how `pos_label` is inferred when `pos_label` is set to `None`, in
  :func:`sklearn.metrics.brier_score_loss` and :func:`sklearn.metrics.d2_brier_score`.
  By :user:`Lucy Liu`. :pr:`32923`
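For illustration, a minimal sketch of the array API support above, using PyTorch
tensors as one possible array API compatible input (the tiny dataset is made up, and
PyTorch must be installed):

.. code-block:: python

    import sklearn
    import torch

    from sklearn.linear_model import LogisticRegression

    # Enable array API dispatch so compatible estimators accept non-NumPy inputs.
    sklearn.set_config(array_api_dispatch=True)

    # Tiny made-up dataset held as PyTorch tensors instead of NumPy arrays.
    X = torch.tensor([[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 0.0]])
    y = torch.tensor([0, 0, 1, 1])

    clf = LogisticRegression(solver="lbfgs").fit(X, y)
    pred = clf.predict(X)  # output is expected to stay in the input array namespace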
Metadata routing
----------------

Refer to the :ref:`Metadata Routing User Guide <metadata_routing>` for more details.

- |Enhancement| :class:`~preprocessing.TargetEncoder` now routes `groups` to the
  :term:`CV splitter` internally used for :term:`cross fitting` in its
  :meth:`~preprocessing.TargetEncoder.fit_transform`.
  By :user:`Samruddhi Baviskar` and :user:`Stefanie Senger`. :pr:`33089`

:mod:`sklearn.cluster`
----------------------

- |Fix| :class:`cluster.MiniBatchKMeans` now correctly handles sample weights during
  fitting. When sample weights are not None, mini-batch indices are created by
  sub-sampling with replacement using the normalized sample weights as probabilities.
  By :user:`Shruti Nath`, :user:`Olivier Grisel`, and
  :user:`Jeremie du Boisberranger`. :pr:`30751`

:mod:`sklearn.compose`
----------------------

- |Fix| The dotted line for :class:`compose.ColumnTransformer` in its HTML display now
  includes only its elements. The behaviour when a remainder is used has also been
  corrected.
  By :user:`Dea María Léon` :pr:`32713`

:mod:`sklearn.datasets`
-----------------------

- |Efficiency| Re-enabled compressed caching for :func:`datasets.fetch_kddcup99`,
  reducing on-disk cache size without changing the public API.
  By :user:`Unique Shrestha`. :pr:`33118`

:mod:`sklearn.ensemble`
-----------------------

- |Fix| :class:`ensemble.RandomForestClassifier`, :class:`ensemble.RandomForestRegressor`,
  :class:`ensemble.ExtraTreesClassifier` and :class:`ensemble.ExtraTreesRegressor` now
  use `sample_weight` to draw the samples instead of forwarding them multiplied by a
  uniformly sampled mask to the underlying estimators. Furthermore, when `max_samples`
  is a float, it is now interpreted as a fraction of `sample_weight.sum()` instead of
  `X.shape[0]`. As sampling is done with replacement, a float `max_samples` greater
  than `1.0` is now allowed, as well as an integer `max_samples` greater than
  `X.shape[0]`. The default `max_samples=None` draws `X.shape[0]` samples,
  irrespective of `sample_weight`.
  By :user:`Antoine Baker`. :pr:`31529`

- |Fix| Both :class:`ensemble.GradientBoostingRegressor` and
  :class:`ensemble.GradientBoostingClassifier` with the default `"friedman_mse"`
  criterion were computing impurity values with an incorrect scaling, leading to
  unexpected trees in some cases. The implementation now uses `"squared_error"`, which
  is exactly equivalent to `"friedman_mse"` up to floating-point error discrepancies
  but computes correct impurity values.
  By :user:`Arthur Lacote`. :pr:`32708`

- |API| The `criterion` parameter is now deprecated for classes
  :class:`ensemble.GradientBoostingRegressor` and
  :class:`ensemble.GradientBoostingClassifier`, as both options (`"friedman_mse"` and
  `"squared_error"`) were producing the same results, up to floating-point rounding
  discrepancies and a bug in `"friedman_mse"`.
  By :user:`Arthur Lacote` :pr:`32708`

:mod:`sklearn.inspection`
-------------------------

- |Fix| In :class:`inspection.DecisionBoundaryDisplay`, `multiclass_colors` is now also
  used for multiclass plotting when `response_method="predict"`.
  By :user:`Anne Beyer`. :pr:`33015`

- |Fix| In :class:`inspection.DecisionBoundaryDisplay`, `n_classes` is now inferred
  more robustly from the estimator. If it fails for custom estimators, a comprehensive
  error message is shown.
  By :user:`Anne Beyer`. :pr:`33202`
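For illustration, a minimal sketch of the first :class:`inspection.DecisionBoundaryDisplay`
fix above (the dataset and color list are only illustrative):

.. code-block:: python

    import matplotlib.pyplot as plt

    from sklearn.datasets import load_iris
    from sklearn.inspection import DecisionBoundaryDisplay
    from sklearn.linear_model import LogisticRegression

    X, y = load_iris(return_X_y=True)
    X = X[:, :2]  # keep two features so the decision boundary can be drawn in 2D

    clf = LogisticRegression().fit(X, y)

    # multiclass_colors is now also honoured when response_method="predict".
    DecisionBoundaryDisplay.from_estimator(
        clf,
        X,
        response_method="predict",
        multiclass_colors=["tab:blue", "tab:orange", "tab:green"],
    )
    plt.show()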
:mod:`sklearn.linear_model`
---------------------------

- |Efficiency| :class:`linear_model.LogisticRegression` with `solver="lbfgs"` now
  estimates the gradient of the loss at `float32` precision when fitted with `float32`
  data (`X`) to improve training speed and memory efficiency. Previously, the input
  data would be implicitly cast to `float64`. If you relied on the previous behavior
  for numerical reasons, you can explicitly cast your data to `float64` before fitting
  to reproduce it.
  By :user:`Omar Salman` and :user:`Olivier Grisel`. :pr:`32644`

- |Efficiency| The :class:`linear_model.LinearRegression`, :class:`linear_model.Ridge`,
  :class:`linear_model.Lasso`, :class:`linear_model.LassoCV`,
  :class:`linear_model.ElasticNet`, :class:`linear_model.ElasticNetCV` and
  :class:`linear_model.BayesianRidge` classes no longer make an unnecessary copy of
  dense `X, y` input during preprocessing when `copy_X=False` and `sample_weight` is
  provided.
  By :user:`Junteng Li`. :pr:`33041`

- |Enhancement| :class:`linear_model.ElasticNet`, :class:`linear_model.ElasticNetCV`
  and :func:`linear_model.enet_path` are now able to fit ridge regression, i.e.
  setting `l1_ratio=0`. Previously, the stopping criterion was a formulation of the
  dual gap that breaks down for `l1_ratio=0`. Now, an alternative dual gap formulation
  is used for this setting. This reduces the noise from raised warnings.
  By :user:`Christian Lorentzen`. :pr:`32845`

- |Enhancement| |Efficiency| :class:`linear_model.ElasticNet`,
  :class:`linear_model.ElasticNetCV`, :class:`linear_model.Lasso`,
  :class:`linear_model.LassoCV`, :class:`linear_model.MultiTaskElasticNet`,
  :class:`linear_model.MultiTaskElasticNetCV`, :class:`linear_model.MultiTaskLasso`,
  :class:`linear_model.MultiTaskLassoCV` as well as :func:`linear_model.lasso_path`
  and :func:`linear_model.enet_path` are now faster when fit with strong L1 penalty
  and many features. During gap safe screening of features, the update of the residual
  is now only performed if the coefficient is not zero.
  By :user:`Christian Lorentzen`. :pr:`33161`

- |Fix| :class:`linear_model.LassoCV` and :class:`linear_model.ElasticNetCV` now take
  the `positive` parameter into account to compute the maximum `alpha` parameter, at
  which all coefficients are zero. This impacts the search grid for the internally
  tuned `alpha` hyper-parameter stored in the attribute `alphas_`.
  By :user:`Junteng Li` :pr:`32768`

- |Fix| Correct the formulation of `alpha` within :class:`linear_model.SGDOneClassSVM`.
  The corrected value is `alpha = nu` instead of `alpha = nu / 2`. Note: this might
  result in changed values for the fitted attributes like `coef_` and `offset_` as
  well as the predictions made using this class.
  By :user:`Omar Salman`. :pr:`32778`

- |Fix| :func:`linear_model.enet_path` now correctly handles the ``precompute``
  parameter when ``check_input=False``. Previously, the value of ``precompute`` was
  not properly treated, which could lead to a ValueError. This also affects
  :class:`linear_model.ElasticNetCV`, :class:`linear_model.LassoCV`,
  :class:`linear_model.MultiTaskElasticNetCV` and
  :class:`linear_model.MultiTaskLassoCV`.
  By :user:`Albert Dorador` :pr:`33014`

- |Fix| Fixed a bug in :class:`linear_model.SGDClassifier` for multiclass settings
  where large negative values of :meth:`decision_function` could lead to NaN values.
  In this case, this fix assigns equal probability to each class.
  By :user:`Christian Lorentzen`. :pr:`33168`
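For illustration, a minimal sketch of the `l1_ratio=0` enhancement above, fitting
:class:`linear_model.ElasticNet` as a pure ridge estimator on made-up data:

.. code-block:: python

    import numpy as np

    from sklearn.linear_model import ElasticNet

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 5))
    y = X @ np.array([1.0, 0.5, 0.0, -0.5, 2.0]) + rng.normal(scale=0.1, size=50)

    # l1_ratio=0 now fits a pure L2 (ridge) penalty without the dual gap
    # stopping criterion breaking down.
    ridge_like = ElasticNet(alpha=0.1, l1_ratio=0.0).fit(X, y)
    print(ridge_like.coef_)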
:mod:`sklearn.metrics`
----------------------

- |Enhancement| :func:`~metrics.cohen_kappa_score` now has a `replace_undefined_by`
  param that can be set to define the function's return value when the metric is
  undefined (division by zero).
  By :user:`Stefanie Senger` :pr:`31172`

- |Fix| :func:`metrics.d2_pinball_score` and :func:`metrics.d2_absolute_error_score`
  now always use the `"averaged_inverted_cdf"` quantile method, both with and without
  sample weights. Previously, the `"linear"` quantile method was used only for the
  unweighted case, leading to surprising discrepancies when comparing the results with
  unit weights. Note that all quantile interpolation methods are asymptotically
  equivalent in the large sample limit, but this fix can cause score value changes on
  small evaluation sets (without weights).
  By :user:`Virgil Chan`. :pr:`31671`

:mod:`sklearn.pipeline`
-----------------------

- |Fix| Fixed :class:`pipeline.FeatureUnion` to properly handle column renaming when
  using Polars output, preventing duplicate column names.
  By :user:`Levente Csibi`. :pr:`32853`

:mod:`sklearn.svm`
------------------

- |Fix| Raise a more informative error when fitting :class:`svm.NuSVR` with all zero
  sample weights.
  By :user:`Lucy Liu` and :user:`John Hendricks`. :pr:`32212`

:mod:`sklearn.tree`
-------------------

- |Fix| Fixed feature-wise NaN detection in trees. Features could be seen as NaN-free
  for some edge-case patterns, which led to not considering splits with NaNs assigned
  to the left node for those features. This affects:

  - :class:`tree.DecisionTreeRegressor`
  - :class:`tree.ExtraTreeRegressor`
  - :class:`ensemble.RandomForestRegressor`
  - :class:`ensemble.ExtraTreesRegressor`

  By :user:`Arthur Lacote` :pr:`32193`

- |API| `criterion="friedman_mse"` is now deprecated. This criterion was intended for
  gradient boosting but was incorrectly implemented in scikit-learn's trees and was
  actually behaving identically to `criterion="squared_error"`. Use
  `criterion="squared_error"` instead. This affects:

  - :class:`tree.DecisionTreeRegressor`
  - :class:`tree.ExtraTreeRegressor`
  - :class:`ensemble.RandomForestRegressor`
  - :class:`ensemble.ExtraTreesRegressor`

  By :user:`Arthur Lacote` :pr:`32708`

:mod:`sklearn.utils`
--------------------

- |Enhancement| ``sklearn.utils._tags.get_tags`` now provides a clearer error message
  when a class is passed instead of an estimator instance.
  By :user:`Achyuthan S` and :user:`Anne Beyer`. :pr:`32565`

- |Enhancement| ``sklearn.utils._response._get_response_values`` now provides a
  clearer error message when the estimator does not implement the given
  ``response_method``.
  By :user:`Quentin Barthélemy`. :pr:`33126`

- |Fix| The parameter table in the HTML representation of all scikit-learn estimators
  inheriting from :class:`base.BaseEstimator` displays each parameter's documentation
  as a tooltip. The last tooltip of a parameter in the last table of any HTML
  representation was partially hidden. This issue has been fixed.
  By :user:`Dea María Léon` :pr:`32887`

- |Fix| Fixed ``_weighted_percentile`` with ``average=True`` so zero-weight samples
  just before the end of the array are handled correctly. This can change results when
  using ``sample_weight`` with :class:`preprocessing.KBinsDiscretizer`
  (``strategy="quantile"``, ``quantile_method="averaged_inverted_cdf"``) and in
  :func:`metrics.median_absolute_error`, :func:`metrics.d2_pinball_score`, and
  :func:`metrics.d2_absolute_error_score`.
  By :user:`Arthur Lacote`. :pr:`33127`
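For illustration, a minimal sketch of a setting affected by the last fix above, binning
a feature with weighted quantiles where one zero-weight sample sits just before the end
of the array (data and weights are made up):

.. code-block:: python

    import numpy as np

    from sklearn.preprocessing import KBinsDiscretizer

    X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
    sample_weight = np.array([1.0, 1.0, 1.0, 0.0, 1.0])  # zero weight near the end

    est = KBinsDiscretizer(
        n_bins=2,
        encode="ordinal",
        strategy="quantile",
        quantile_method="averaged_inverted_cdf",
    )
    est.fit(X, sample_weight=sample_weight)
    print(est.bin_edges_)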
.. rubric:: Code and documentation contributors

Thanks to everyone who has contributed to the maintenance and improvement of the
project since version 1.8, including:

TODO: update at the time of the release.