{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Course 2 week 1 lecture notebook 01\n",
    "# Create a Linear Model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Linear model using scikit-learn\n",
    "\n",
    "We'll practice using a scikit-learn model for linear regression. You will do something similar in this week's assignment (but with a logistic regression model).\n",
    "\n",
    "[sklearn.linear_model.LinearRegression()](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, import `LinearRegression`, which is a Python 'class'."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import the module 'LinearRegression' from sklearn\n",
    "from sklearn.linear_model import LinearRegression"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, use the class to create an object of type LinearRegression."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create an object of type LinearRegression\n",
    "model = LinearRegression()\n",
    "model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Generate some data by importing a module 'load_data', which is implemented for you.  The features in `X' are: \n",
    "\n",
    "- Age: (years)\n",
    "- Systolic_BP: Systolic blood pressure (mmHg)\n",
    "- Diastolic_BP: Diastolic blood pressure (mmHg)\n",
    "- Cholesterol: (mg/DL)\n",
    "\n",
    "The labels in `y` indicate whether the patient has a disease (diabetic retinopathy).\n",
    "- y = 1 : patient has retinopathy.\n",
    "- y = 0 : patient does not have retinopathy."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import the load_data function from the utils module\n",
    "from utils import load_data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Generate features and labels using the imported function\n",
    "X, y = load_data(100)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Explore the data by viewing the features and the labels"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# View the features\n",
    "X.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Plot a histogram of the Age feature\n",
    "X['Age'].hist();"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Plot a histogram of the systolic blood pressure feature\n",
    "X['Systolic_BP'].hist();"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Plot a histogram of the diastolic blood pressure feature\n",
    "X['Diastolic_BP'].hist();"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Plot a histogram of the cholesterol feature\n",
    "X['Cholesterol'].hist();"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Also take a look at the labels"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# View a few values of the labels\n",
    "y.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Plot a histogram of the labels\n",
    "y.hist();"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Fit the LinearRegression using the features in `X` and the labels in `y`.  To \"fit\" the model is another way of saying that we are training the model on the data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Fit the linear regression model\n",
    "model.fit(X, y)\n",
    "model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- View the coefficients of the trained model.\n",
    "- The coefficients are the 'weights' or $\\beta$s associated with each feature\n",
    "- You'll use the coefficients for making predictions.\n",
    "$$\\hat{y} = \\beta_1x_1 + \\beta_2x_2 + ... \\beta_N x_N$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# View the coefficients of the model\n",
    "model.coef_"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the assignment, you will do something similar, but using a logistic regression, so that the output of the prediction will be bounded between 0 and 1."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### This is the end of this practice section.\n",
    "\n",
    "Please continue on with the lecture videos!\n",
    "\n",
    "---"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}