{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "skip" }, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "%%html\n", "\n", "\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# Install the necessary dependencies\n", "\n", "import os\n", "import sys\n", "!{sys.executable} -m pip install --quiet pandas scikit-learn numpy matplotlib jupyterlab_myst ipython " ] }, { "cell_type": "markdown", "metadata": { "tags": [ "remove-cell" ] }, "source": [ "---\n", "license:\n", " code: MIT\n", " content: CC-BY-4.0\n", "github: https://github.com/ocademy-ai/machine-learning\n", "venue: By Ocademy\n", "open_access: true\n", "bibliography:\n", " - https://raw.githubusercontent.com/ocademy-ai/machine-learning/main/open-machine-learning-jupyter-book/references.bib\n", "---" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Linear Regression" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## What is Linear Regression?\n", "\n", "Finding a straight line of best fit through the data. This works well when the true underlying function is linear.\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "### Example\n", "\n", "We use features $\\mathbf{x}$ to predict a \"response\" $y$. For example we might want to regress `num_hours_studied` onto `exam_score` - in other words we predict exam score from number of hours studied.\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Let's generate some example data for this case and examine the relationship between $\\mathbf{x}$ and $y$.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "\n", "num_hours_studied = np.array([1, 3, 3, 4, 5, 6, 7, 7, 8, 8, 10])\n", "exam_score = np.array([18, 26, 31, 40, 55, 62, 71, 70, 75, 85, 97])\n", "plt.scatter(num_hours_studied, exam_score)\n", "plt.xlabel('num_hours_studied')\n", "plt.ylabel('exam_score')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We can see the this is nearly a straight line. We suspect with such a high linear correlation that linear regression will be a successful technique for this task." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We will now build a linear model to fit this data." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Linear Model\n", "\n", "#### Hypothesis\n", "\n", "A linear model makes a \"hypothesis\" about the true nature of the underlying function - that it is linear. We express this hypothesis in the univariate case as\n", "\n", "$$h_\\theta(x) = ax + b$$\n", "\n", "Our simple example above was an example of \"univariate regression\" - i.e. just one variable (or \"feature\") - number of hours studied. 
Below we will have more than one feature (\"multivariate regression\") which is given by\n", "\n", "$$h_\\theta(\\mathbf{x}) = \\mathbf{a}^\\top \\mathbf{x}$$\n", "\n", "Here $\\mathbf{a}$ is a vector of learned parameters, and $\\mathbf{x}$ is a single data point - one row of the \"design matrix\" $\\mathbf{X}$ that holds all the data points. In this formulation the intercept term has been added to the design matrix as the first column (of all ones), so the vector of predictions for the whole dataset can be written compactly as $\\mathbf{X}\\mathbf{a}$." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### Design Matrix\n", "\n", "In general, with $n$ data points and $p$ features, our design matrix will have $n$ rows and $p + 1$ columns - one column per feature, plus the leading bias column of ones.\n", "\n", "Returning to our exam score regression example, let's add one more feature - number of hours slept the night before the exam. If we have 4 data points and 2 features, then our matrix will be of shape $4 \\times 3$ (remember we add a bias column). It might look like\n", "\n", "$$\n", "\\begin{bmatrix}\n", " 1 & 1 & 8 \\\\\n", " 1 & 5 & 6 \\\\\n", " 1 & 7 & 6 \\\\\n", " 1 & 8 & 4 \\\\\n", "\\end{bmatrix}\n", "$$\n", "\n", "Notice we do **not** include the response (label/target) in the design matrix." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### Univariate Example\n", "\n", "Let's now see what our univariate example looks like" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "from sklearn import linear_model\n", "\n", "# Fit the model\n", "exam_model = linear_model.LinearRegression()\n", "x = np.expand_dims(num_hours_studied, 1)\n", "y = exam_score\n", "exam_model.fit(x, y)\n", "a = exam_model.coef_\n", "b = exam_model.intercept_\n", "print(exam_model.coef_)\n", "print(exam_model.intercept_)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "# Visualize the results\n", "plt.scatter(num_hours_studied, exam_score)\n", "x = np.linspace(0, 10)\n", "y = a * x + b\n", "plt.plot(x, y, 'r')\n", "plt.xlabel('num_hours_studied')\n", "plt.ylabel('exam_score')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The line fits pretty well by eye, as it should, because the true function is linear and the data has just a little noise." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "But we need a mathematical way to define a good fit in order to find the optimal parameters for our hypothesis." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### What is a Good Fit?\n", "\n", "Typically we use \"mean squared error\" (MSE) to measure the goodness of fit in a regression problem.\n", "\n", "$$\\text{MSE} = \\frac{1}{n} \\sum_{i=1}^n (y^{(i)} - h_\\theta(\\mathbf{x}^{(i)}))^2$$\n", "\n", "You can see that this measures how far each real data point is from the corresponding prediction, which makes good sense. Here is a visualization\n", "\n", "![](../images/ml-fundamentals/linear-regression/residual.webp)\n", "\n", "\n", "This function is then taken to be our \"loss\" function - a measure of how badly we are doing. In general we want to minimize this."
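, "\n", "\n", "As a quick sketch, the next cell computes this quantity for the exam-score model we fitted above (it reuses `exam_model`, `num_hours_studied` and `exam_score` from the earlier cells)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "# Illustrative sketch: compute the MSE of the univariate exam-score model fitted above\n", "y_hat = exam_model.predict(np.expand_dims(num_hours_studied, 1))\n", "mse = np.mean((exam_score - y_hat) ** 2)\n", "print(mse)"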
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### Optimization Problem\n", "\n", "The typical recipe for machine learning algorithms is to define a loss function of the parameters of a hypothesis, then to minimize the loss function. In our case we have the optimization problem\n", "\n", "$$\\min_{\\mathbf{a}} \\frac{1}{n} \\sum_{i=1}^n (y^{(i)} - \\mathbf{a}^\\top\\mathbf{X}^{(i)})^2$$\n", "\n", "Note that we have added the bias into the design matrix." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### Normal Equations\n", "\n", "Linear regression actually has a closed-form solution - the normal equation. It is beyond our scope to show the derivation, but here it is:\n", "\n", "$$\\mathbf{a}^* = (\\mathbf{X}^\\top\\mathbf{X})^{-1}\\mathbf{X}^\\top \\mathbf{y}$$\n", "\n", "We won't be implementing this equation, but you should know this is what `sklearn.linear_model.LinearRegression` is doing under the hood. We will talk more about optimization in later tutorials, where we have no closed-form solution." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Normalization\n", "\n", "It is generally a good idea to normalize all the values in the design matrix. This means all values should be in the range $(0, 1)$ and centered around zero.\n", "\n", "![](../images/ml-fundamentals/linear-regression/normalization.jpeg)\n", "\n", "Normalization usually helps the learning algorithm perform better." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Example 1\n", "\n", "Simple Linear Regression" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Importing the libraries" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "%matplotlib inline\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Importing the dataset" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "dataset = pd.read_csv('../../assets/data/Salary_Data.csv')\n", "X = dataset.iloc[:, :-1].values\n", "y = dataset.iloc[:, -1].values" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Splitting the dataset into the Training set and Test set" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "from sklearn.model_selection import train_test_split\n", "\n", "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 1/3, random_state = 0)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Training the Simple Linear Regression model on the Training set" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "from sklearn.linear_model import LinearRegression\n", "\n", "regressor = LinearRegression()\n", "regressor.fit(X_train, y_train)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Predicting the Test set 
results" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "y_pred = regressor.predict(X_test)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Visualising the Training set results" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "plt.scatter(X_train, y_train, color = 'red')\n", "plt.plot(X_train, regressor.predict(X_train), color = 'blue')\n", "plt.title('Salary vs Experience (Training set)')\n", "plt.xlabel('Years of Experience')\n", "plt.ylabel('Salary')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Visualising the Test set results" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "plt.scatter(X_test, y_test, color = 'red')\n", "plt.plot(X_train, regressor.predict(X_train), color = 'blue')\n", "plt.title('Salary vs Experience (Test set)')\n", "plt.xlabel('Years of Experience')\n", "plt.ylabel('Salary')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Example 2\n", "\n", "Multiple Linear Regression" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Importing the libraries" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "%matplotlib inline\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Importing the dataset" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "dataset = pd.read_csv('../../assets/data/50_Startups.csv')\n", "X = dataset.iloc[:, :-1].values\n", "y = dataset.iloc[:, -1].values" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "print(X)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "print(y)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Encoding categorical data" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "from sklearn.compose import ColumnTransformer\n", "from sklearn.preprocessing import OneHotEncoder\n", "ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [3])], remainder='passthrough')\n", "X = np.array(ct.fit_transform(X))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "print(X)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Splitting the dataset into the Training set and Test set" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "from sklearn.model_selection import train_test_split\n", "X_train, X_test, y_train, y_test = 
train_test_split(X, y, test_size = 0.2, random_state = 0)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Training the Multiple Linear Regression model on the Training set" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "from sklearn.linear_model import LinearRegression\n", "regressor = LinearRegression()\n", "regressor.fit(X_train, y_train)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Predicting the Test set results" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "y_pred = regressor.predict(X_test)\n", "np.set_printoptions(precision=2)\n", "print(np.concatenate((y_pred.reshape(len(y_pred),1), y_test.reshape(len(y_test),1)),1))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Example 3\n", "\n", "Polynomial Linear Regression" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Importing the libraries" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "%matplotlib inline\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Importing the dataset" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "dataset = pd.read_csv('../../assets/data/Position_Salaries.csv')\n", "X = dataset.iloc[:, 1:-1].values\n", "y = dataset.iloc[:, -1].values" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Training the Linear Regression model on the whole dataset" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "from sklearn.linear_model import LinearRegression\n", "lin_reg = LinearRegression()\n", "lin_reg.fit(X, y)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Training the Polynomial Regression model on the whole dataset" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "from sklearn.preprocessing import PolynomialFeatures\n", "poly_transformer = PolynomialFeatures(degree = 4)\n", "X_poly = poly_transformer.fit_transform(X)\n", "lin_reg_2 = LinearRegression()\n", "lin_reg_2.fit(X_poly, y)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Visualising the Linear and Polynomial Regression results" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "plt.scatter(X, y, color = 'red')\n", "plt.plot(X, lin_reg.predict(X), color = 'blue', label = 'Linear Regression')\n", "plt.plot(X, lin_reg_2.predict(poly_transformer.transform(X)), color = 'green', label = 'Polynomial Regression')\n", "plt.title('Truth or Bluff (Linear vs Polynomial Regression)')\n", "plt.xlabel('Position Level')\n", "plt.ylabel('Salary')\n", "plt.legend()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Implementation of Linear Regression from scratch" ] }, { "cell_type": "code", 
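"execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "# A minimal sketch of the closed-form normal equation from the \"Normal Equations\" section above\n", "# (illustrative only; it reuses `num_hours_studied` and `exam_score` defined at the top of this notebook).\n", "X_exam = np.column_stack([np.ones_like(num_hours_studied, dtype=float), num_hours_studied])\n", "a_star = np.linalg.inv(X_exam.T @ X_exam) @ X_exam.T @ exam_score\n", "print(a_star)  # [intercept, slope] - should match the sklearn fit from earlier" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Alternatively, we can minimize the MSE by gradient descent. Writing the prediction for each example as $\\hat{y}^{(i)} = \\mathbf{w}^\\top \\mathbf{x}^{(i)} + b$ (here the bias $b$ is kept as a separate parameter rather than a column of ones), the gradients of the MSE are\n", "\n", "$$\\frac{\\partial}{\\partial \\mathbf{w}} \\text{MSE} = \\frac{2}{n} \\mathbf{X}^\\top (\\hat{\\mathbf{y}} - \\mathbf{y}), \\qquad \\frac{\\partial}{\\partial b} \\text{MSE} = \\frac{2}{n} \\sum_{i=1}^n (\\hat{y}^{(i)} - y^{(i)})$$\n", "\n", "The class below repeatedly steps the parameters against these gradients (the constant factor 2 is dropped, since it is absorbed into the learning rate)." ] }, { "cell_type": "code", 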
"execution_count": null, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class MyOwnLinearRegression:\n", " def __init__(self, learning_rate=0.0001, n_iters=30000):\n", " self.lr = learning_rate\n", " self.n_iters = n_iters\n", " self.weights = None\n", " self.bias = None\n", "\n", " def fit(self, X, y):\n", " n_samples, n_features = X.shape\n", "\n", " # init parameters\n", " self.weights = np.zeros(n_features)\n", " self.bias = 0\n", "\n", " # gradient descent\n", " for _ in range(self.n_iters):\n", " # approximate y with linear combination of weights and x, plus bias\n", " y_predicted = np.dot(X, self.weights) + self.bias\n", "\n", " # compute gradients\n", " dw = (1 / n_samples) * np.dot(X.T, (y_predicted - y))\n", " db = (1 / n_samples) * np.sum(y_predicted - y)\n", " # update parameters\n", " self.weights = self.weights - self.lr * dw\n", " self.bias = self.bias - self.lr * db\n", "\n", " def predict(self, X):\n", " y_predicted = np.dot(X, self.weights) + self.bias\n", " return y_predicted" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Traning model with our own Linear Regression" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "dataset = pd.read_csv('../../assets/data/Salary_Data.csv')\n", "X = dataset.iloc[:, :-1].values\n", "y = dataset.iloc[:, -1].values\n", "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 1/3, random_state = 0)\n", "regressor = MyOwnLinearRegression()\n", "regressor.fit(X_train, y_train)\n", "y_pred = regressor.predict(X_test)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Visualising the Training set results" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "plt.scatter(X_train, y_train, color = 'red')\n", "plt.plot(X_train, regressor.predict(X_train), color = 'blue')\n", "plt.title('Salary vs Experience (Training set)')\n", "plt.xlabel('Years of Experience')\n", "plt.ylabel('Salary')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Acknowledgments\n", "\n", "Thanks to TIM NIVEN for creating the open-source [Kaggle jupyter notebook](https://www.kaggle.com/code/timniven/linear-regression-tutorial), licensed under Apache 2.0. It inspires the majority of the content of this assignment." ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" } }, "nbformat": 4, "nbformat_minor": 4 }