{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n# Displaying Pipelines\n\nIn a Jupyter Notebook, a pipeline is displayed as an HTML diagram by default,\nwhich corresponds to `set_config(display='diagram')`. To deactivate the HTML\nrepresentation and show plain text instead, use `set_config(display='text')`.\n\nTo see more details about a step, click on that step in the diagram.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Authors: The scikit-learn developers\n# SPDX-License-Identifier: BSD-3-Clause" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Displaying a Pipeline with a Preprocessing Step and Classifier\nThis section constructs a `sklearn.pipeline.Pipeline` with a preprocessing\nstep, `sklearn.preprocessing.StandardScaler`, and a classifier,\n`sklearn.linear_model.LogisticRegression`, and displays its visual\nrepresentation.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from sklearn import set_config\nfrom sklearn.linear_model import LogisticRegression\nfrom sklearn.pipeline import Pipeline\nfrom sklearn.preprocessing import StandardScaler\n\nsteps = [\n    (\"preprocessing\", StandardScaler()),\n    (\"classifier\", LogisticRegression()),\n]\npipe = Pipeline(steps)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The diagram is the default representation; `display='diagram'` is set explicitly\nhere for clarity.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "set_config(display=\"diagram\")\npipe  # click on the diagram below to see the details of each step" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To view the text representation of the pipeline, switch to `display='text'`.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ 
"set_config(display=\"text\")\npipe" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Restore the default display.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "set_config(display=\"diagram\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Displaying a Pipeline Chaining Multiple Preprocessing Steps & Classifier\nThis section constructs a `sklearn.pipeline.Pipeline` with multiple\npreprocessing steps, `sklearn.preprocessing.StandardScaler` and\n`sklearn.preprocessing.PolynomialFeatures`, and a classifier step,\n`sklearn.linear_model.LogisticRegression`, and displays its visual\nrepresentation.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from sklearn.linear_model import LogisticRegression\nfrom sklearn.pipeline import Pipeline\nfrom sklearn.preprocessing import PolynomialFeatures, StandardScaler\n\nsteps = [\n    (\"standard_scaler\", StandardScaler()),\n    (\"polynomial\", PolynomialFeatures(degree=3)),\n    (\"classifier\", LogisticRegression(C=2.0)),\n]\npipe = Pipeline(steps)\npipe  # click on the diagram below to see the details of each step" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Displaying a Pipeline with Dimensionality Reduction and Classifier\nThis section constructs a `sklearn.pipeline.Pipeline` with a\ndimensionality reduction step, `sklearn.decomposition.PCA`, and\na classifier, `sklearn.svm.SVC`, and displays its visual\nrepresentation.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from sklearn.decomposition import PCA\nfrom sklearn.pipeline import Pipeline\nfrom sklearn.svm import SVC\n\nsteps = [(\"reduce_dim\", PCA(n_components=4)), (\"classifier\", SVC(kernel=\"linear\"))]\npipe = Pipeline(steps)\npipe  # click on the diagram 
below to see the details of each step" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Displaying a Complex Pipeline Chaining a Column Transformer\nThis section constructs a complex `sklearn.pipeline.Pipeline` with a\n`sklearn.compose.ColumnTransformer` and a classifier,\n`sklearn.linear_model.LogisticRegression`, and displays its visual\nrepresentation.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import numpy as np\n\nfrom sklearn.compose import ColumnTransformer\nfrom sklearn.impute import SimpleImputer\nfrom sklearn.linear_model import LogisticRegression\nfrom sklearn.pipeline import Pipeline, make_pipeline\nfrom sklearn.preprocessing import OneHotEncoder, StandardScaler\n\nnumeric_preprocessor = Pipeline(\n    steps=[\n        (\"imputation_mean\", SimpleImputer(missing_values=np.nan, strategy=\"mean\")),\n        (\"scaler\", StandardScaler()),\n    ]\n)\n\ncategorical_preprocessor = Pipeline(\n    steps=[\n        (\n            \"imputation_constant\",\n            SimpleImputer(fill_value=\"missing\", strategy=\"constant\"),\n        ),\n        (\"onehot\", OneHotEncoder(handle_unknown=\"ignore\")),\n    ]\n)\n\npreprocessor = ColumnTransformer(\n    [\n        (\"categorical\", categorical_preprocessor, [\"state\", \"gender\"]),\n        (\"numerical\", numeric_preprocessor, [\"age\", \"weight\"]),\n    ]\n)\n\npipe = make_pipeline(preprocessor, LogisticRegression(max_iter=500))\npipe  # click on the diagram below to see the details of each step" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Displaying a Grid Search over a Pipeline with a Classifier\nThis section constructs a `sklearn.model_selection.GridSearchCV`\nover a `sklearn.pipeline.Pipeline` with a\n`sklearn.ensemble.RandomForestClassifier` and displays its visual\nrepresentation.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import numpy as np\n\nfrom 
sklearn.compose import ColumnTransformer\nfrom sklearn.ensemble import RandomForestClassifier\nfrom sklearn.impute import SimpleImputer\nfrom sklearn.model_selection import GridSearchCV\nfrom sklearn.pipeline import Pipeline\nfrom sklearn.preprocessing import OneHotEncoder, StandardScaler\n\nnumeric_preprocessor = Pipeline(\n    steps=[\n        (\"imputation_mean\", SimpleImputer(missing_values=np.nan, strategy=\"mean\")),\n        (\"scaler\", StandardScaler()),\n    ]\n)\n\ncategorical_preprocessor = Pipeline(\n    steps=[\n        (\n            \"imputation_constant\",\n            SimpleImputer(fill_value=\"missing\", strategy=\"constant\"),\n        ),\n        (\"onehot\", OneHotEncoder(handle_unknown=\"ignore\")),\n    ]\n)\n\npreprocessor = ColumnTransformer(\n    [\n        (\"categorical\", categorical_preprocessor, [\"state\", \"gender\"]),\n        (\"numerical\", numeric_preprocessor, [\"age\", \"weight\"]),\n    ]\n)\n\npipe = Pipeline(\n    steps=[(\"preprocessor\", preprocessor), (\"classifier\", RandomForestClassifier())]\n)\n\n# \"auto\" is no longer accepted for max_features in recent scikit-learn releases;\n# for classifiers it was equivalent to \"sqrt\".\nparam_grid = {\n    \"classifier__n_estimators\": [200, 500],\n    \"classifier__max_features\": [\"sqrt\", \"log2\"],\n    \"classifier__max_depth\": [4, 5, 6, 7, 8],\n    \"classifier__criterion\": [\"gini\", \"entropy\"],\n}\n\ngrid_search = GridSearchCV(pipe, param_grid=param_grid, n_jobs=1)\ngrid_search  # click on the diagram below to see the details of each step" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.21" } }, "nbformat": 4, "nbformat_minor": 0 }