{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Linear Regression (From Scratch)\n", "In this notebook, my goal is to demonstrate an understanding of linear regression by building a linear regression model from scratch.\n", "\n", "## What is Linear Regression?\n", "Linear regression is a statistical method for modeling the relationship between a dependent variable $Y$ and one or more independent variables $X$. It assumes that the dependent variable can be expressed as a linear function of the independent variables plus an error term.\n", "\n", "For simple linear regression with one independent variable $X$, the model is:\n", "$Y=β_0+β_1X+ϵ$\n", "\n", "For multiple linear regression with independent variables $X_1, X_2, ..., X_p$, the model is:\n", "$Y=β_0+β_1X_1+β_2X_2+...+β_pX_p+ϵ$\n", "\n", "where:\n", "- $Y$ is the dependent variable,\n", "- $X$ (or $X_1, ..., X_p$) are the independent variables,\n", "- $β_0$ is the intercept,\n", "- $β_j$ (for $j = 1, ..., p$) is the coefficient of $X_j$, i.e. the slope of $Y$ with respect to that variable,\n", "- $ϵ$ is the error term." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "_cell_guid": "b1076dfc-b9ad-4769-8c92-a6c4dae69d19", "_uuid": "8f2839f25d086af736a60e9eeb907d3b93b6e0e5" }, "outputs": [], "source": [ "import warnings\n", "warnings.filterwarnings(\"ignore\")\n", "\n", "import numpy as np # linear algebra\n", "import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)\n", "from sklearn.datasets import load_diabetes\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "code", "execution_count": 188, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
|  | age | sex | bmi | bp | s1 | s2 | s3 | s4 | s5 | s6 | target |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 59.0 | 2.0 | 32.1 | 101.00 | 157.0 | 93.2 | 38.0 | 4.00 | 4.8598 | 87.0 | 151.0 |
| 1 | 48.0 | 1.0 | 21.6 | 87.00 | 183.0 | 103.2 | 70.0 | 3.00 | 3.8918 | 69.0 | 75.0 |
| 2 | 72.0 | 2.0 | 30.5 | 93.00 | 156.0 | 93.6 | 41.0 | 4.00 | 4.6728 | 85.0 | 141.0 |
| 3 | 24.0 | 1.0 | 25.3 | 84.00 | 198.0 | 131.4 | 40.0 | 5.00 | 4.8903 | 89.0 | 206.0 |
| 4 | 50.0 | 1.0 | 23.0 | 101.00 | 192.0 | 125.4 | 52.0 | 4.00 | 4.2905 | 80.0 | 135.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 437 | 60.0 | 2.0 | 28.2 | 112.00 | 185.0 | 113.8 | 42.0 | 4.00 | 4.9836 | 93.0 | 178.0 |
| 438 | 47.0 | 2.0 | 24.9 | 75.00 | 225.0 | 166.0 | 42.0 | 5.00 | 4.4427 | 102.0 | 104.0 |
| 439 | 60.0 | 2.0 | 24.9 | 99.67 | 162.0 | 106.6 | 43.0 | 3.77 | 4.1271 | 95.0 | 132.0 |
| 440 | 36.0 | 1.0 | 30.0 | 95.00 | 201.0 | 125.2 | 42.0 | 4.79 | 5.1299 | 85.0 | 220.0 |
| 441 | 36.0 | 1.0 | 19.6 | 71.00 | 250.0 | 133.2 | 97.0 | 3.00 | 4.5951 | 92.0 | 57.0 |
442 rows × 11 columns