{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n# Ordinary Least Squares and Ridge Regression\n\n1. Ordinary Least Squares:\n We illustrate how to use the ordinary least squares (OLS) model,\n :class:`~sklearn.linear_model.LinearRegression`, on a single feature of\n the diabetes dataset. We train on a subset of the data, evaluate on a\n test set, and visualize the predictions.\n\n2. Ordinary Least Squares and Ridge Regression Variance:\n We then show how OLS can have high variance when the data is sparse or\n noisy, by fitting on a very small synthetic sample repeatedly. Ridge\n regression, :class:`~sklearn.linear_model.Ridge`, reduces this variance\n by penalizing (shrinking) the coefficients, leading to more stable\n predictions.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Authors: The scikit-learn developers\n# SPDX-License-Identifier: BSD-3-Clause" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data Loading and Preparation\n\nLoad the diabetes dataset. For simplicity, we only keep a single feature in the data.\nThen, we split the data and target into training and test sets.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from sklearn.datasets import load_diabetes\nfrom sklearn.model_selection import train_test_split\n\nX, y = load_diabetes(return_X_y=True)\nX = X[:, [2]] # Use only one feature\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=20, shuffle=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Linear regression model\n\nWe create a linear regression model and fit it on the training data. Note that by\ndefault, an intercept is added to the model. 
We can control this behavior by setting\nthe `fit_intercept` parameter.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from sklearn.linear_model import LinearRegression\n\nregressor = LinearRegression().fit(X_train, y_train)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Model evaluation\n\nWe evaluate the model's performance on the test set using the mean squared error\nand the coefficient of determination.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from sklearn.metrics import mean_squared_error, r2_score\n\ny_pred = regressor.predict(X_test)\n\nprint(f\"Mean squared error: {mean_squared_error(y_test, y_pred):.2f}\")\nprint(f\"Coefficient of determination: {r2_score(y_test, y_pred):.2f}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Plotting the results\n\nFinally, we visualize the results on the train and test data.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n\nfig, ax = plt.subplots(ncols=2, figsize=(10, 5), sharex=True, sharey=True)\n\nax[0].scatter(X_train, y_train, label=\"Train data points\")\nax[0].plot(\n    X_train,\n    regressor.predict(X_train),\n    linewidth=3,\n    color=\"tab:orange\",\n    label=\"Model predictions\",\n)\nax[0].set(xlabel=\"Feature\", ylabel=\"Target\", title=\"Train set\")\nax[0].legend()\n\nax[1].scatter(X_test, y_test, label=\"Test data points\")\nax[1].plot(X_test, y_pred, linewidth=3, color=\"tab:orange\", label=\"Model predictions\")\nax[1].set(xlabel=\"Feature\", ylabel=\"Target\", title=\"Test set\")\nax[1].legend()\n\nfig.suptitle(\"Linear Regression\")\n\nplt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "OLS on this single-feature subset learns a linear function that minimizes\nthe mean squared error on the training data. We can see how well (or poorly)\nit generalizes by looking at the $R^2$ score and mean squared error on the\ntest set. In higher dimensions, pure OLS often overfits, especially if the\ndata is noisy. Regularization techniques (like Ridge or Lasso) can help\nreduce this overfitting.\n\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Ordinary Least Squares and Ridge Regression Variance\n\nNext, we illustrate the problem of high variance more clearly by using\na tiny synthetic dataset. We sample only two data points, then repeatedly\nadd small Gaussian noise to them and refit both OLS and Ridge.\n
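To make the comparison concrete, recall that ridge regression minimizes the penalized\nleast-squares objective $\\min_w \\|X w - y\\|_2^2 + \\alpha \\|w\\|_2^2$, so the penalty term\n$\\alpha \\|w\\|_2^2$ shrinks the coefficients toward zero; we use a small penalty, $\\alpha = 0.1$, below.\n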
We plot\neach new line to see how much OLS can jump around, whereas Ridge remains\nmore stable thanks to its penalty term.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\nimport numpy as np\n\nfrom sklearn import linear_model\n\nX_train = np.c_[0.5, 1].T\ny_train = [0.5, 1]\nX_test = np.c_[0, 2].T\n\nnp.random.seed(0)\n\nmodels = dict(\n    ols=linear_model.LinearRegression(), ridge=linear_model.Ridge(alpha=0.1)\n)\n\nfor name, model in models.items():\n    fig, ax = plt.subplots(figsize=(4, 3))\n\n    for _ in range(6):\n        # Perturb the two training points with Gaussian noise and refit.\n        this_X = 0.1 * np.random.normal(size=(2, 1)) + X_train\n        model.fit(this_X, y_train)\n\n        ax.plot(X_test, model.predict(X_test), color=\"gray\")\n        ax.scatter(this_X, y_train, s=3, c=\"gray\", marker=\"o\", zorder=10)\n\n    # Fit on the original, noise-free points for reference.\n    model.fit(X_train, y_train)\n    ax.plot(X_test, model.predict(X_test), linewidth=2, color=\"blue\")\n    ax.scatter(X_train, y_train, s=30, c=\"red\", marker=\"+\", zorder=10)\n\n    ax.set_title(name)\n    ax.set_xlim(0, 2)\n    ax.set_ylim((0, 1.6))\n    ax.set_xlabel(\"X\")\n    ax.set_ylabel(\"y\")\n\n    fig.tight_layout()\n\nplt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Conclusion\n\n- In the first example, we applied OLS to a real dataset, showing\n  how a plain linear model can fit the data by minimizing the squared error\n  on the training set.\n\n- In the second example, OLS lines varied drastically each time noise\n  was added, reflecting its high variance when data is sparse or noisy. By\n  contrast, **Ridge** regression introduces a regularization term that shrinks\n  the coefficients, stabilizing predictions.\n\nRegularized models such as :class:`~sklearn.linear_model.Ridge` (which applies an L2 penalty)\nand :class:`~sklearn.linear_model.Lasso` (which applies an L1 penalty) are common ways to\nimprove generalization and reduce overfitting. A well-tuned Ridge or Lasso often outperforms\npure OLS when features are correlated, data is noisy, or the sample size is small. The short\nexample below sketches how the penalty strength can be tuned with cross-validation.\n\n" ] }, 
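{ "cell_type": "markdown", "metadata": {}, "source": [ "## Tuning the penalty strength\n\nAs an optional follow-up, here is a minimal sketch of one way to choose the amount of\nshrinkage by cross-validation rather than fixing it by hand: :class:`~sklearn.linear_model.RidgeCV`\nevaluates a grid of candidate `alpha` values and keeps the best-performing one. The train/test\nsplit and the grid below are illustrative choices, not a recommended setting.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# A minimal sketch: select Ridge's alpha by cross-validation on the diabetes data.\n# The train/test split and the alpha grid below are illustrative choices.\nfrom sklearn.datasets import load_diabetes\nfrom sklearn.linear_model import RidgeCV\nfrom sklearn.model_selection import train_test_split\n\nX_full, y_full = load_diabetes(return_X_y=True)\nX_tr, X_te, y_tr, y_te = train_test_split(X_full, y_full, random_state=0)\n\n# RidgeCV evaluates each candidate alpha with (generalized) cross-validation\n# and refits on the whole training set using the best one.\nridge_cv = RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0, 100.0]).fit(X_tr, y_tr)\n\nprint(f\"Selected alpha: {ridge_cv.alpha_}\")\nprint(f\"Test R^2: {ridge_cv.score(X_te, y_te):.2f}\")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.18" } }, "nbformat": 4, "nbformat_minor": 0 }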