{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n# Regularization path of L1- Logistic Regression\n\n\nTrain l1-penalized logistic regression models on a binary classification\nproblem derived from the Iris dataset.\n\nThe models are ordered from strongest regularized to least regularized. The 4\ncoefficients of the models are collected and plotted as a \"regularization\npath\": on the left-hand side of the figure (strong regularizers), all the\ncoefficients are exactly 0. When regularization gets progressively looser,\ncoefficients can get non-zero values one after the other.\n\nHere we choose the liblinear solver because it can efficiently optimize for the\nLogistic Regression loss with a non-smooth, sparsity inducing l1 penalty.\n\nAlso note that we set a low value for the tolerance to make sure that the model\nhas converged before collecting the coefficients.\n\nWe also use warm_start=True which means that the coefficients of the models are\nreused to initialize the next model fit to speed-up the computation of the\nfull-path.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Authors: The scikit-learn developers\n# SPDX-License-Identifier: BSD-3-Clause" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load data\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from sklearn import datasets\n\niris = datasets.load_iris()\nX = iris.data\ny = iris.target\n\nX = X[y != 2]\ny = y[y != 2]\n\nX /= X.max() # Normalize X to speed-up convergence" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Compute regularization path\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import numpy as np\n\nfrom sklearn import linear_model\nfrom sklearn.svm import l1_min_c\n\ncs = l1_min_c(X, y, loss=\"log\") * np.logspace(0, 10, 16)\n\nclf = linear_model.LogisticRegression(\n penalty=\"l1\",\n solver=\"liblinear\",\n tol=1e-6,\n max_iter=int(1e6),\n warm_start=True,\n intercept_scaling=10000.0,\n)\ncoefs_ = []\nfor c in cs:\n clf.set_params(C=c)\n clf.fit(X, y)\n coefs_.append(clf.coef_.ravel().copy())\n\ncoefs_ = np.array(coefs_)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Plot regularization path\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n\nplt.plot(np.log10(cs), coefs_, marker=\"o\")\nymin, ymax = plt.ylim()\nplt.xlabel(\"log(C)\")\nplt.ylabel(\"Coefficients\")\nplt.title(\"Logistic Regression Path\")\nplt.axis(\"tight\")\nplt.show()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.21" } }, "nbformat": 4, "nbformat_minor": 0 }