{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n# Non-negative least squares\n\nIn this example, we fit a linear model with positive constraints on the\nregression coefficients and compare the estimated coefficients to a classic\nlinear regression.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Authors: The scikit-learn developers\n# SPDX-License-Identifier: BSD-3-Clause\n\nimport matplotlib.pyplot as plt\nimport numpy as np\n\nfrom sklearn.metrics import r2_score" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Generate some random data\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "np.random.seed(42)\n\nn_samples, n_features = 200, 50\nX = np.random.randn(n_samples, n_features)\ntrue_coef = 3 * np.random.randn(n_features)\n# Threshold coefficients to render them non-negative\ntrue_coef[true_coef < 0] = 0\ny = np.dot(X, true_coef)\n\n# Add some noise\ny += 5 * np.random.normal(size=(n_samples,))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Split the data in train set and test set\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from sklearn.model_selection import train_test_split\n\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Fit the Non-Negative least squares.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from sklearn.linear_model import LinearRegression\n\nreg_nnls = LinearRegression(positive=True)\ny_pred_nnls = reg_nnls.fit(X_train, y_train).predict(X_test)\nr2_score_nnls = r2_score(y_test, y_pred_nnls)\nprint(\"NNLS R2 score\", r2_score_nnls)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Fit an OLS.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "reg_ols = LinearRegression()\ny_pred_ols = reg_ols.fit(X_train, y_train).predict(X_test)\nr2_score_ols = r2_score(y_test, y_pred_ols)\nprint(\"OLS R2 score\", r2_score_ols)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Comparing the regression coefficients between OLS and NNLS, we can observe\nthey are highly correlated (the dashed line is the identity relation),\nbut the non-negative constraint shrinks some to 0.\nThe Non-Negative Least squares inherently yield sparse results.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "fig, ax = plt.subplots()\nax.plot(reg_ols.coef_, reg_nnls.coef_, linewidth=0, marker=\".\")\n\nlow_x, high_x = ax.get_xlim()\nlow_y, high_y = ax.get_ylim()\nlow = max(low_x, low_y)\nhigh = min(high_x, high_y)\nax.plot([low, high], [low, high], ls=\"--\", c=\".3\", alpha=0.5)\nax.set_xlabel(\"OLS regression coefficients\", fontweight=\"bold\")\nax.set_ylabel(\"NNLS regression coefficients\", fontweight=\"bold\")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.21" } }, "nbformat": 4, "nbformat_minor": 0 }