{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n# Joint feature selection with multi-task Lasso\n\nThe multi-task lasso allows to fit multiple regression problems\njointly enforcing the selected features to be the same across\ntasks. This example simulates sequential measurements, each task\nis a time instant, and the relevant features vary in amplitude\nover time while being the same. The multi-task lasso imposes that\nfeatures that are selected at one time point are select for all time\npoint. This makes feature selection by the Lasso more stable.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Authors: The scikit-learn developers\n# SPDX-License-Identifier: BSD-3-Clause" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Generate data\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import numpy as np\n\nrng = np.random.RandomState(42)\n\n# Generate some 2D coefficients with sine waves with random frequency and phase\nn_samples, n_features, n_tasks = 100, 30, 40\nn_relevant_features = 5\ncoef = np.zeros((n_tasks, n_features))\ntimes = np.linspace(0, 2 * np.pi, n_tasks)\nfor k in range(n_relevant_features):\n coef[:, k] = np.sin((1.0 + rng.randn(1)) * times + 3 * rng.randn(1))\n\nX = rng.randn(n_samples, n_features)\nY = np.dot(X, coef.T) + rng.randn(n_samples, n_tasks)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Fit models\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from sklearn.linear_model import Lasso, MultiTaskLasso\n\ncoef_lasso_ = np.array([Lasso(alpha=0.5).fit(X, y).coef_ for y in Y.T])\ncoef_multi_task_lasso_ = MultiTaskLasso(alpha=1.0).fit(X, Y).coef_" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Plot support and time series\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n\nfig = plt.figure(figsize=(8, 5))\nplt.subplot(1, 2, 1)\nplt.spy(coef_lasso_)\nplt.xlabel(\"Feature\")\nplt.ylabel(\"Time (or Task)\")\nplt.text(10, 5, \"Lasso\")\nplt.subplot(1, 2, 2)\nplt.spy(coef_multi_task_lasso_)\nplt.xlabel(\"Feature\")\nplt.ylabel(\"Time (or Task)\")\nplt.text(10, 5, \"MultiTaskLasso\")\nfig.suptitle(\"Coefficient non-zero location\")\n\nfeature_to_plot = 0\nplt.figure()\nlw = 2\nplt.plot(coef[:, feature_to_plot], color=\"seagreen\", linewidth=lw, label=\"Ground truth\")\nplt.plot(\n coef_lasso_[:, feature_to_plot], color=\"cornflowerblue\", linewidth=lw, label=\"Lasso\"\n)\nplt.plot(\n coef_multi_task_lasso_[:, feature_to_plot],\n color=\"gold\",\n linewidth=lw,\n label=\"MultiTaskLasso\",\n)\nplt.legend(loc=\"upper center\")\nplt.axis(\"tight\")\nplt.ylim([-1.1, 1.1])\nplt.show()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.21" } }, "nbformat": 4, "nbformat_minor": 0 }