{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n# Release Highlights for scikit-learn 1.6\n\n.. currentmodule:: sklearn\n\nWe are pleased to announce the release of scikit-learn 1.6! Many bug fixes\nand improvements were added, as well as some key new features. Below we\ndetail the highlights of this release. **For an exhaustive list of\nall the changes**, please refer to the `release notes `.\n\nTo install the latest version (with pip)::\n\n pip install --upgrade scikit-learn\n\nor with conda::\n\n conda install -c conda-forge scikit-learn\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## FrozenEstimator: Freezing an estimator\n\nThis meta-estimator allows you to take an estimator and freeze its fit method, meaning\nthat calling `fit` does not perform any operations; also, `fit_predict` and\n`fit_transform` call `predict` and `transform` respectively without calling `fit`. The\noriginal estimator's other methods and properties are left unchanged. An interesting\nuse case for this is to use a pre-fitted model as a transformer step in a pipeline\nor to pass a pre-fitted model to some of the meta-estimators. 
Here's a short example:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import time\nfrom sklearn.datasets import make_classification\nfrom sklearn.frozen import FrozenEstimator\nfrom sklearn.linear_model import SGDClassifier\nfrom sklearn.model_selection import FixedThresholdClassifier\n\nX, y = make_classification(n_samples=1000, random_state=0)\n\nstart = time.time()\nclassifier = SGDClassifier().fit(X, y)\nprint(f\"Fitting the classifier took {(time.time() - start) * 1_000:.2f} milliseconds\")\n\nstart = time.time()\nthreshold_classifier = FixedThresholdClassifier(\n estimator=FrozenEstimator(classifier), threshold=0.9\n).fit(X, y)\nprint(\n f\"Fitting the threshold classifier took {(time.time() - start) * 1_000:.2f} \"\n \"milliseconds\"\n)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Fitting the threshold classifier skipped fitting the inner `SGDClassifier`. For more\ndetails refer to the example `sphx_glr_auto_examples_frozen_plot_frozen_examples.py`.\n\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Transforming data other than X in a Pipeline\n\nThe :class:`~pipeline.Pipeline` now supports transforming passed data other than `X`\nif necessary. This can be done by setting the new `transform_input` parameter. This\nis particularly useful when passing a validation set through the pipeline.\n\nAs an example, imagine `EstimatorWithValidationSet` is an estimator which accepts\na validation set. 
We can now have a pipeline which will transform the validation set\nand pass it to the estimator::\n\n sklearn.set_config(enable_metadata_routing=True)\n est_gs = GridSearchCV(\n Pipeline(\n (\n StandardScaler(),\n EstimatorWithValidationSet(...).set_fit_request(X_val=True, y_val=True),\n ),\n # telling pipeline to transform these inputs up to the step which is\n # requesting them.\n transform_input=[\"X_val\"],\n ),\n param_grid={\"estimatorwithvalidationset__param_to_optimize\": list(range(5))},\n cv=5,\n ).fit(X, y, X_val=X_val, y_val=y_val)\n\nIn the above code, the key parts are the call to `set_fit_request` to specify that\n`X_val` and `y_val` are required by the `EstimatorWithValidationSet.fit` method, and\nthe `transform_input` parameter to tell the pipeline to transform `X_val` before\npassing it to `EstimatorWithValidationSet.fit`.\n\nNote that at this time scikit-learn estimators have not yet been extended to accept\nuser-specified validation sets. This feature is released early to collect feedback\nfrom third-party libraries that might benefit from it.\n\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Multiclass support for `LogisticRegression(solver=\"newton-cholesky\")`\n\nThe `\"newton-cholesky\"` solver (originally introduced in scikit-learn version\n1.2) was previously limited to binary\n:class:`~linear_model.LogisticRegression` and some other generalized linear\nregression estimators (namely :class:`~linear_model.PoissonRegressor`,\n:class:`~linear_model.GammaRegressor` and\n:class:`~linear_model.TweedieRegressor`).\n\nThis new release includes support for multiclass (multinomial)\n:class:`~linear_model.LogisticRegression`.\n\nThis solver is particularly useful when the number of features is small to\nmedium. 
It has been empirically shown to converge more reliably and faster\nthan other solvers on some medium-sized datasets with one-hot encoded\ncategorical features, as can be seen in the [benchmark results of the\npull request](https://github.com/scikit-learn/scikit-learn/pull/28840#issuecomment-2065368727).\n\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Missing value support for Extra Trees\n\nThe classes :class:`ensemble.ExtraTreesClassifier` and\n:class:`ensemble.ExtraTreesRegressor` now support missing values. More details in the\n`User Guide `.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import numpy as np\nfrom sklearn.ensemble import ExtraTreesClassifier\n\nX = np.array([0, 1, 6, np.nan]).reshape(-1, 1)\ny = [0, 0, 1, 1]\n\nforest = ExtraTreesClassifier(random_state=0).fit(X, y)\nforest.predict(X)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Download any dataset from the web\n\nThe function :func:`datasets.fetch_file` allows downloading a file from any given URL.\nThis convenience function provides built-in local disk caching, a SHA256 digest\nintegrity check and an automated retry mechanism on network error.\n\nThe goal is to provide the same convenience and reliability as dataset fetchers while\ngiving the flexibility to work with data from arbitrary online sources and file\nformats.\n\nThe downloaded file can then be loaded with generic or domain-specific functions such\nas `pandas.read_csv`, `pandas.read_parquet`, etc.\n\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Array API support\n\nMany more estimators and functions have been updated to support array API compatible\ninputs since version 1.5, in particular the meta-estimators for hyperparameter tuning\nfrom the :mod:`sklearn.model_selection` module and the metrics from the\n:mod:`sklearn.metrics` module.\n\nPlease refer to the `array API support` page for instructions 
to use\nscikit-learn with array API compatible libraries such as PyTorch or CuPy.\n\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Almost complete Metadata Routing support\n\nSupport for routing metadata has been added to all remaining estimators and\nfunctions except AdaBoost. See `Metadata Routing User Guide `\nfor more details.\n\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Free-threaded CPython 3.13 support\n\nscikit-learn has preliminary support for free-threaded CPython, in particular\nfree-threaded wheels are available for all of our supported platforms.\n\nFree-threaded (also known as nogil) CPython 3.13 is an experimental version of\nCPython 3.13 which aims at enabling efficient multi-threaded use cases by\nremoving the Global Interpreter Lock (GIL).\n\nFor more details about free-threaded CPython see [py-free-threading doc](https://py-free-threading.github.io),\nin particular [how to install a free-threaded CPython](https://py-free-threading.github.io/installing_cpython/)\nand [Ecosystem compatibility tracking](https://py-free-threading.github.io/tracking/).\n\nFeel free to try free-threaded CPython on your use case and report any issues!\n\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Improvements to the developer API for third party libraries\n\nWe have been working on improving the developer API for third party libraries.\nThis is still a work in progress, but a fair amount of work has been done in this\nrelease. This release includes:\n\n- :func:`sklearn.utils.validation.validate_data` is introduced and replaces the\n previously private `BaseEstimator._validate_data` method. This function extends\n :func:`~sklearn.utils.validation.check_array` and adds support for remembering\n input feature counts and names.\n- Estimator tags are now revamped and a part of the public API via\n :class:`sklearn.utils.Tags`. 
Estimators should now override the\n :meth:`BaseEstimator.__sklearn_tags__` method instead of implementing a `_more_tags`\n method. If you'd like to support multiple scikit-learn versions, you can implement\n both methods in your class.\n- As a consequence of developing a public tag API, we've removed the `_xfail_checks`\n tag and tests which are expected to fail are directly passed to\n :func:`~sklearn.utils.estimator_checks.check_estimator` and\n :func:`~sklearn.utils.estimator_checks.parametrize_with_checks`. See their\n corresponding API docs for more details.\n- Many tests in the common test suite are updated and raise more helpful error\n messages. We've also added some new tests, which should help you more easily fix\n potential issues with your estimators.\n\nAn updated version of our `develop` is also available, which we recommend you\ncheck out.\n\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.21" } }, "nbformat": 4, "nbformat_minor": 0 }