{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "committed-protest",
   "metadata": {},
   "source": [
    "# Lecture 3.3: Logistic Regression\n",
    "\n",
    "Part of the codes and demonstrations by courtesy of [Sunil Kumar Dash](https://www.analyticsvidhya.com/blog/2022/02/implementing-logistic-regression-from-scratch-using-python/)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "universal-steering",
   "metadata": {},
   "source": [
    "##  1. Mathematical Formulation\n",
    "\n",
    "In the last lecture, we introduced the logistic regression, which tries to use $\\textbf{x}$ to predict the binary label $y \\in \\{-1, +1\\}$. The optimization problem is:\n",
    "$$(\\mathbf{w}^\\star,b^\\star) = \\mathop{\\mathrm{argmin}}_{\\mathbf{w},b} \\frac{1}{C} \\mathbf{w}^T\\mathbf{w} + \\sum_{i=1}^M \\log (1+\\exp(-y^{(i)} (\\mathbf{w}^T \\textbf{x}^{(i)}+b)))$$\n",
    "\n",
    "with the logistic loss function: \n",
    "$$L(z^{(i)}) = \\log (1+\\exp(-z^{(i)})),$$\n",
    "where $z^{(i)}=y^{(i)}(\\mathbf{w}^T \\textbf{x}^{(i)} + b)$.\n",
    "\n",
    "While $L(z^{(i)})$ is of mathematical interest for loss comparison, it is not often used in practice. Instead, the cross-entropy loss\n",
    "$$\\ell_\\mathrm{CE} = \\sum_{i=1}^M -y^{(i)}\\log\\hat{y}^{(i)} - (1-y^{(i)})\\log(1-\\hat{y}^{(i)})$$\n",
    "\n",
    "is often used, due to the convinence it brings to calculate both $\\nabla \\ell_\\mathrm{CE}$ and $\\nabla^2 \\ell_\\mathrm{CE}$, which are often needed for convergence analysis. Here $\\hat{y}^{(i)}$ is the predicted value of $y$. Below we will give a simple derivation and implementation. For a more detailed mathematical treatment, please refer to Section 3.2 of the lecture note.\n",
    "\n",
    "\n",
    "### 1.1 New (but Equivalent) Setting\n",
    "\n",
    "Firstly, we call our 2 classes the positive class $y^{(i)}=1$ and the negative class $y^{(i)} = 0$, i.e., $y^{(i)}\\in \\{0, 1\\}$ (remember we define the labels as $y^{(i)}=\\pm1$ in lecture).\n",
    "\n",
    "Recall the prediction after the sigmoid activation function $\\sigma$:\n",
    "\n",
    "$$ \\hat{y} =\\sigma(\\mathbf{w}^T \\textbf{x} + b) = \\sigma(z). $$\n",
    "\n",
    "Here, we \"abuse\" the mathematical notation $z$ to indicate $\\mathbf{w}^T \\textbf{x} + b$ not $y(\\mathbf{w}^T \\textbf{x} + b)$ as done in previous lectures. The sigmoid function $\\sigma$, sometimes called a squashing function or a *logistic* function - thus the name logistic regression - maps a real-valued input to the range 0 to 1. Indeed, the logistic function $\\sigma(z)$ is a good choice since it has the form of a probability, i.e. $\\sigma(-z)=1-\\sigma(z)$ and $\\sigma(z)\\in (0,1)$ as $z\\rightarrow \\pm \\infty$. If we pick the labels $y\\in\\{0,1\\}$, we may assign  \n",
    "\n",
    "\\begin{equation}\n",
    "\\begin{aligned}\n",
    "p(y=1|z) & =\\sigma(z)=\\frac{1}{1+e^{-z}}\\\\\n",
    "p(y=0|z) & =1-\\sigma(z)=\\frac{1}{1+e^{z}}\\\\\n",
    "\\end{aligned}\n",
    "\\end{equation}\n",
    "\n",
    "which can be written more compactly as $p(y|\\mathbf{x})  =\\sigma(f(\\mathbf{x}))^y(1-\\sigma(f(\\mathbf{x})))^{1-y}$ (equivalent to $p(y|\\textbf{x})= \\sigma(yf(\\textbf{x}))$ under $y=\\pm1$). \n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "grave-victorian",
   "metadata": {},
   "source": [
    "### 1.2 Cross-Entropy Loss\n",
    "\n",
    "Since now we're thinking about output probabilities,\n",
    "one natural objective is that we should choose the weights (or parameters $\\textbf{w}, b$ )\n",
    "that give the actual labels in the training data highest probability. So here we give a maximum likelihood estimation. For $M$ samples $\\{\\textbf{x}^{(i)},y^{(i)}\\}$ we want to maximize\n",
    "\n",
    "$$\\max_{\\textbf{w},b} p\\big( y^{(1)},\\dots,y^{(M)} \\big|\\,\\textbf{x}^{(1)},\\dots,\\textbf{x}^{(M)} \\big)$$\n",
    "\n",
    "Because each example is independent of the others, and each label depends only on the features of the corresponding examples, we can rewirte the above as\n",
    "\n",
    "$$\\max_{\\textbf{w},b} \\prod_{i=1}^{M} p(y^{(i)}| \\textbf{x}^{(i)})=\\max_{\\textbf{w},b} p\\big(y^{(1)}|\\textbf{x}^{(1)}\\big) p\\big(y^{(2)}|\\textbf{x}^{(2)}\\big)\\cdots p\\big(y^{(M)}|\\textbf{x}^{(M)}\\big)$$\n",
    "\n",
    "This function is a product over the examples, but in general, it's a lot easier to work with a loss function that breaks down as a sum over the training examples. So we use log function:\n",
    "\n",
    "$$\\max_{\\textbf{w},b} \\log\\big(\\prod_{i=1}^M p(y^{(i)}|\\textbf{x}^{(i)})\\big)= \\sum_{i=1}^M\\log\\big(p(y^{(i)}|\\textbf{x}^{(i)})\\big)=\\log\\big(p(y^{(1)}|\\textbf{x}^{(1)})\\big)+\\cdots+\\log\\big(p(y^{(M)}|\\textbf{x}^{(M)})\\big)$$\n",
    "\n",
    "Because we typically express our objective as a *loss* we can just flip the sign, giving us the *negative log probability:*\n",
    "\n",
    "$$  \\min_{\\textbf{w},b} \\Big(- \\sum_{i=1}^M\\log\\big(p(y^{(i)}|\\textbf{x}^{(i)})\\big)\\Big)$$\n",
    "\n",
    "Recall that we can write $p(y^{(i)}|z^{(i)})$ compactly as\n",
    "\n",
    "$$p(y^{(i)}|z^{(i)}) =\\sigma(z^{(i)})^{y^{(i)}}(1-\\sigma(z^{(i)}))^{1-y^{(i)}},$$\n",
    "\n",
    "where $\\hat{y}^{(i)} = \\sigma(z^{(i)}) = \\sigma(\\textbf{w}^T \\textbf{x}^{(i)} + b)$ . Let us work through this expression. We have\n",
    "\n",
    "\\begin{equation}\n",
    "\\begin{aligned}\n",
    "\\log\\big(p(y|z)\\big)&=\n",
    "\\log\\big(\\sigma(z)^{y}(1-\\sigma(z))^{1-y}\\big)\\\\\n",
    "&=y\\log\\sigma(z) + (1-y)\\log(1-\\sigma(z))\\\\\n",
    "&=y\\log\\hat{y} + (1-y)\\log(1-\\hat{y})\n",
    "\\end{aligned}\n",
    "\\end{equation}\n",
    "\n",
    "Therefore we take the negative of this expression and minimize the objective function \n",
    "\n",
    "$$\\ell_\\mathrm{CE} = \\sum_{i=1}^M -y^{(i)}\\log\\hat{y}^{(i)} - (1-y^{(i)})\\log(1-\\hat{y}^{(i)})$$\n",
    "\n",
    "We find that the loss function is depend on two terms:\n",
    "\n",
    "* $y^{(i)}\\log \\hat{y}^{(i)}$\n",
    "* $(1-y^{(i)})\\log (1-\\hat{y}^{(i)})$\n",
    "\n",
    "But recall that we are intepreting $\\hat{y}^{(i)}=\\sigma(z^{(i)})$ as a probability that $\\textbf{x}^{(i)}$ has a given label, namely $p(y^{(i)}=1|z^{(i)})=\\sigma(z^{(i)})$ and $p(y^{(i)}=0|z^{(i)})=1-\\sigma(z^{(i)})$. Because $y^{(i)}$ only takes values $0$ or $1$, for an given data point, one of these terms disappears. \n",
    "\n",
    "When $y^{(i)}$ is $1$, this loss says that we should maximize $\\log \\hat{y}^{(i)}$, giving higher probability to the *correct* answer. \n",
    "When $y^{(i)}$ is $0$, this loss function takes value $\\log (1-\\hat{y}^{(i)})$. That says that we should maximize the value $1-\\hat{y}$ which we already know is the probability assigned to $\\textbf{x}^{(i)}$ belonging to the negative class.\n",
    "\n",
    "\n",
    "Note that this loss function is commonly called **binary cross entropy**. It is a special case of [cross-entropy](https://en.wikipedia.org/wiki/Cross_entropy), which can apply to the multi-class ($>2$) setting. \n",
    "\n",
    "*If instead we use the labels $y^{(i)}=\\pm1$ like in the last lecture, the loss function has to be modified to $\\log(1+e^{-z})$. This is why there exists two versions of logistic regression. See the lecture note for more information.*\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "requested-prototype",
   "metadata": {},
   "source": [
    "# 2. Implementation"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "narrative-dealing",
   "metadata": {},
   "source": [
    "Below we will build a logistic regression model from scratch and compare it to the scikit-learn's implementation (`LogisticRegression()`). It will be helpful if you have a prior understanding of matrix algebra and Numpy. Here, we will only deal with Numpy arrays. \n",
    "\n",
    "**Initialization and Dataset.** We will first import the necessary libraries and datasets. We will use sklearn's `make_classification` dataset. We choose only four features and build a binary-classification problem."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "boring-sphere",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from numpy import log, dot, exp, shape\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "from sklearn.datasets import make_classification\n",
    "from sklearn.model_selection import train_test_split\n",
    "\n",
    "np.random.seed(1)\n",
    "\n",
    "X,y = make_classification(n_features = 4, n_classes=2)\n",
    "X_tr, X_te,y_tr,y_te = train_test_split(X,y,test_size=0.1)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "limiting-police",
   "metadata": {},
   "source": [
    "It is a good practice to standardize data before feeding it to the algorithm. This is often necessary when attributes are from different scales. Mathematically it is given as $\\textbf{x} \\leftarrow \\frac{\\textbf{x}-\\pmb{\\mu}}{\\pmb{\\gamma}}$, where $\\pmb{\\mu}$ is the mean and $\\pmb{\\gamma}$ is standard deviation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "latest-infrared",
   "metadata": {},
   "outputs": [],
   "source": [
    "def standardize(X):\n",
    "    for i in range(shape(X)[1]):\n",
    "        X[:,i] = (X[:,i] - np.mean(X_tr[:,i]))/np.std(X_tr[:,i])\n",
    "standardize(X_tr)\n",
    "standardize(X_te)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "altered-jacksonville",
   "metadata": {},
   "source": [
    "**Loss Function.** Below is the python implementation of the binary cross entropy loss $\\ell_\\mathrm{CE}$. We ignore the bias term $b$ because it can be integrated into $\\textbf{w}$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "american-anger",
   "metadata": {},
   "outputs": [],
   "source": [
    "def cost(params):\n",
    "    z = dot(X,params)\n",
    "    cost0 = y.T.dot(log(self.sigmoid(z)))\n",
    "    cost1 = (1-y).T.dot(log(1-self.sigmoid(z)))\n",
    "    cost = -((cost1 + cost0))/len(y)\n",
    "    return cost"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "lyric-horizontal",
   "metadata": {},
   "source": [
    "**Gradient Descent.** In practice, gradient descent is one of the easiest method to understand and implement. In this work, we use gradient descent to solve the above cross entropy loss. Gradient descent is an optimization algorithm that is responsible for the learning of best-fitting parameters. The gradients are the vector of the 1st order derivative of the loss function. These are the direction of the steepest ascent or maximum of a function. For gradient descent, we move in the opposite direction of the gradients. We will be updating the weights in every iteration until the convergence. \n",
    "The gradient of $\\textbf{w}, b$ are:\n",
    "$$ \\frac{\\partial \\ell_\\mathrm{CE}}{\\partial \\textbf{w}} = \\sum_{i=1}^M (\\hat{y}^{(i)} - y^{(i)})\\cdot \\textbf{x}^{(i)}, \\\\\n",
    "\\frac{\\partial \\ell_\\mathrm{CE}}{\\partial b} = \\sum_{i=1}^M (\\hat{y}^{(i)} - y^{(i)}) $$\n",
    "\n",
    "Here in our implementation of gradient descent, we ignore the bias term $b$ because it can be integrated into $\\textbf{w}$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "jewish-numbers",
   "metadata": {},
   "outputs": [],
   "source": [
    "def fit(self,X,y,alpha=0.001,iter=100):\n",
    "    params,X = self.initialize(X)  # intergrate b into w\n",
    "    cost_list = np.zeros(iter,)\n",
    "    for i in range(iter):\n",
    "        params = params - alpha * dot(X.T, self.sigmoid(dot(X,params)) - np.reshape(y,(len(y),1)))\n",
    "        cost_list[i] = cost(params)\n",
    "    self.params = params\n",
    "    return cost_list"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "interpreted-involvement",
   "metadata": {},
   "source": [
    "**Prediction.** We trained the model on a training dataset, and we will use the learned parameters to predict the unseen data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "widespread-housing",
   "metadata": {},
   "outputs": [],
   "source": [
    "def predict(self,X):\n",
    "    z = dot(self.initialize(X)[1],self.weights)\n",
    "    lis = []\n",
    "    for i in self.sigmoid(z):\n",
    "        if i>0.5:\n",
    "            lis.append(1)\n",
    "        else:\n",
    "            lis.append(0)\n",
    "    return lis"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "italian-preparation",
   "metadata": {},
   "source": [
    "**Error Analysis and Evaluation.**\n",
    "Now that we are done with the prediction, then we will measure how good our model predicts for unseen data. Firstly, we look at the different types of errors that the classifier makes:\n",
    "- _True Positive (TP)_: classifier correctly said positive\n",
    "- _True Negative (TN)_: classifier correctly said negtive\n",
    "- _False Positive (FP)_: classifier said positive, but actually negtive\n",
    "- _False Negative (FN)_: classifier said negtive, but actually positive\n",
    "\n",
    "This is summarized in the following table:\n",
    "\n",
    "<table>\n",
    "<tr><th colspan=2 rowspan=2><th colspan=2 style=\"text-align: center\">Actual</th></tr>\n",
    "<tr>  <th>Positive</th><th>Negtive</th></tr>\n",
    "<tr><th rowspan=2>Prediction</th><th>Positive</th><td>True Positive (TP)</td><td>False Positive (FP)</td></tr>\n",
    "<tr>  <th>Negtive</th><td>False Negative (FN)</td><td>True Negative (TN)</td></tr>\n",
    "</table>\n",
    "\n",
    "Then, we introduce four indices to describe our performance: the accuracy, precision, recall, and F1-score. An intuitive understanding of these indices is listed here:\n",
    "- _Accuracy (Accuracy)_: It tells us how often that our machine learning model correctly predict an outcome out of all the test samples. \n",
    "- _Precision (Precision)_: It measures the proportion of positively predicted labels that are actually correct. \n",
    "- _Recall (Recall)_: It measures the model’s ability to correctly predict the positives out of actual positives.\n",
    "- _F1 Score (F1)_:  It gives equal weight to both the Precision and Recall for measuring the performance, making it an alternative to Accuracy metric (it doesn’t require us to know the total number of samples).\n",
    "\n",
    "Mathematically, 4 metrics can be calculated by:\n",
    "\n",
    "$$ \\mathrm{Accuracy} = \\frac{TP + TN}{TP + FN + TN + FP}, \\\\ \\mathrm{Precision} = \\frac{TP}{FP + TP}, \\\\\n",
    "\\mathrm{Recall} = \\frac{TP}{FN + TP}, \\\\ F1 = \\frac{2 \\cdot \\mathrm{Precision} \\cdot \\mathrm{Recall}}{\\mathrm{Precision} + \\mathrm{Recall}}.$$\n",
    "\n",
    "In this lecture, we give the code to calculate the 4 indices. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "renewable-mirror",
   "metadata": {},
   "outputs": [],
   "source": [
    "def Evaluate(y,y_hat):\n",
    "    tp,tn,fp,fn = 0,0,0,0\n",
    "    for i in range(len(y)):\n",
    "        if y[i] == 1 and y_hat[i] == 1:\n",
    "            tp += 1\n",
    "        elif y[i] == 1 and y_hat[i] == 0:\n",
    "            fn += 1\n",
    "        elif y[i] == 0 and y_hat[i] == 1:\n",
    "            fp += 1\n",
    "        elif y[i] == 0 and y_hat[i] == 0:\n",
    "            tn += 1\n",
    "    precision = tp / (tp+fp)\n",
    "    recall = tp / (tp+fn)\n",
    "    f1_score = 2 * precision * recall / (precision + recall)\n",
    "    acc = (tp+tn)/(tp+tn+fp+fn)\n",
    "    return precision, recall, acc, f1_score"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "established-bronze",
   "metadata": {},
   "source": [
    "**Put All Together.** Now that we are done with every part, we will put everything together in a single class."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "preliminary-pharmacy",
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "\n",
    "class LogidticRegression:\n",
    "    def sigmoid(self,z):\n",
    "        sig = 1/(1+exp(-z))\n",
    "        return sig\n",
    "    def initialize(self,X):\n",
    "        weights = np.zeros((shape(X)[1]+1,1))\n",
    "        X = np.c_[np.ones((shape(X)[0],1)),X]\n",
    "        return weights,X\n",
    "    def fit(self, X, y, alpha=0.001, iter=400):\n",
    "        weights,X = self.initialize(X)\n",
    "        def cost(theta):\n",
    "            z = dot(X,theta)\n",
    "            cost0 = y.T.dot(log(self.sigmoid(z)))\n",
    "            cost1 = (1-y).T.dot(log(1-self.sigmoid(z)))\n",
    "            cost = -((cost1 + cost0))/len(y)\n",
    "            return cost\n",
    "        cost_list = np.zeros(iter,)\n",
    "        for i in range(iter):\n",
    "            weights = weights - alpha*dot(X.T,self.sigmoid(dot(X,weights))-np.reshape(y,(len(y),1)))\n",
    "            cost_list[i] = cost(weights)\n",
    "        self.weights = weights\n",
    "        return cost_list\n",
    "    def predict(self,X):\n",
    "        z = dot(self.initialize(X)[1],self.weights)\n",
    "        lis = []\n",
    "        for i in self.sigmoid(z):\n",
    "            if i>0.5:\n",
    "                lis.append(1)\n",
    "            else:\n",
    "                lis.append(0)\n",
    "        return lis\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "steady-strength",
   "metadata": {},
   "source": [
    "And then we train and test the model. For simplicity, here we only show the *Accuracy*, and other 3 indices are omitted."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "swiss-assessment",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Training ACC:  0.9777777777777777\n",
      "Testing ACC:  0.9\n",
      "Need time = 0.013265609741210938\n"
     ]
    }
   ],
   "source": [
    "import time\n",
    "\n",
    "standardize(X_tr)\n",
    "standardize(X_te)\n",
    "\n",
    "start = time.time()\n",
    "obj1 = LogidticRegression()\n",
    "cost_list = obj1.fit(X_tr,y_tr)\n",
    "y_pred = obj1.predict(X_te)\n",
    "end = time.time()\n",
    "y_train = obj1.predict(X_tr)\n",
    "#Let's see the f1-score for training and testing data\n",
    "_, _, acc_tr, _ = Evaluate(y_tr,y_train)\n",
    "_, _, acc_te, _ = Evaluate(y_te,y_pred)\n",
    "print('Training ACC:  ' + str(acc_tr))\n",
    "print('Testing ACC:  ' + str(acc_te))\n",
    "print(\"Need time =\", str(end-start))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "funky-seminar",
   "metadata": {},
   "source": [
    "Now let’s see if our loss function is descending or not:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "patient-harbor",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXQAAAEICAYAAABPgw/pAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8QVMy6AAAACXBIWXMAAAsTAAALEwEAmpwYAAAZFUlEQVR4nO3dfZQddX3H8feHzQIrtgbJas3ykKgYxVrAbvEJNYo2AbRJK1ZQq231UNrSVqtoUluPj4UeelpsC+VwqLVHWsEqjVGi0WpRq7aySARiCEYEsomaBVkV2JpN8u0fM4uTm3t35+7OfZiZz+uce7LzcO/8frt3P/vLd34zVxGBmZmV32G9boCZmRXDgW5mVhEOdDOzinCgm5lVhAPdzKwiHOhmZhXhQLd5kTQk6ZOSfiTp37t87K2SVnbzmO0cX9KNkt6Y87WeL2l7UW2bD0l/JunqXrbBiuFALzlJd0t6SQ8OfQ7weOCYiHhlpw4i6UOS3pddFxFPj4gbO3XMuWSPL+ldkq5ZwGt9OSJWzCx3+ucpaaWk8YY2/GVE5PoDZP3NgW7zdQJwZ0Ts63VDLKGEf6frLCL8KPEDuBt4SZP1RwCXAbvTx2XAEem2JcCngEngh8CXgcPSbW8HdgE/AbYDZzR57XcDe4Fp4EHgDcC7gGsy+ywDAliULt8IvBf4SvranwWWZPY/Hfhq2qadwG8D56fH2Jse55ONfZ6jnyuBceAtwB7ge8DvtPg+vgi4LbP8n8DXM8v/DazNHh9Y3fB9+GaevjYcdyUwnn79YeAAMJW+3tvS9c/OfG++CazMPP9G4P3psaaAJwO/A2xLj30X8Hvpvkel+xxIX/9BYGmTn92vAVvT490IPK3h/fZW4FbgR8B1wJFzva/86FIe9LoBfizwB9g60N8D/A/wOGA4DYT3ptsuBq4EBtPH8wEBK0jCdGm63zLgSS2O2xgCjcvLODTQvwM8BRhKly9Jtx2fhs95aXuOAU5Jt30IeF+rPs/Rz5XAvnSfQeAs4GHg6Cb9OTINuyXAIuD7JH8gfi5t7xRJeanx+Af1e66+NjnuStJAb/bzBEaA+9O2Hwa8NF0ezhzrXuDpabsHgbOBJ6U/0xemfX5ms+M19iFt80PpcQaBtwE7gMMz7fs6yR+Cx5L84bhgtvdVr39H6vTwf8+q6zXAeyJiT0RMkIyqfyvdNg08ATghIqYjqeMGsJ9kxHuSpMGIuDsivlNgm/45Iu6MiCngo8Apmbb+Z0R8JG3P/RGxJedrztZPSPr6nvR1N5GMSlc0vkhE/B8wBrwAGCUZgf438DySEfK3I+L+AvrartcCmyJiU0QciIjPpe08K7PPhyJia0TsS/t5Q0R8JxJfJPkfwvNzHu9VwA0R8bmImAb+muSP0nMz+/xdROyOiB8Cn8z0rdX7yrrEgV5dS4F7Msv3pOsALiUZdX1W0l2S1gFExA7gTSQjtj2SrpW0lOJ8P/P1w8Cj06+PIxnRzsds/QS4Pw6u82eP2+iLJCPYF6Rf30gywn1hutyOVn1t1wnAKyVNzjxIylNPyOyzM/sESWdK+h9JP0z3P4vkfx55HPT9jIgD6euPZPZp1bem7yvrHgd6de0mCYMZx6friIifRMRbIuKJwMuBP5V0Rrrt3yLi9PS5AfxVzuM9BDwqs/wLbbR1J0mJoJm5Rngt+zkPjYH+ReYO9KJHoI2vtxP4cEQszjyOiohLmj1H0hHAx0lG1o+PiMXAJpLyS572HvT9lCSSP7i75mz4LO8r6w4HejUMSjoy81gEfAT4c0nDkpYA7wSuAZD0MklPTn9Zf0xSatkvaYWkF6eh8H8kdeP9OduwBXiBpOMlPQZY30b7/xV4iaTflLRI0jGSTkm3/QB44izPbdnPefgqSTnmNJIToltJwu1ZwJdaPOcHwLICZ5c09vca4OWSVkkaSH++KyUd2+L5h5OUzSaAfZLOBH614fWPSX9GzXwUOFvSGZIGSU4o/5TkezOrVu+ruZ5nxXGgV8MmkvCdebwLeB9JrfVW4DbgG+k6gBNJZnE8CHwNuCKSedVHAJcA95H8t/pxwJ/laUBa270uPd7NJLMdcomIe0nKAm8hmR2xBTg53fxPJDX9SUkbmjx9tn62JSIeSp+/NSL2pqu/BtwTEXtaPG3moqr7JX1jPsdtcDHJH6hJSW+NiJ3AGpKfwwTJiP0iWvzuRsRPgD8mCeYHgFcDGzPb7yD5I3hXeoylDc/fTlK3/3uS98HLgZdnvh+zafW+QtKnJeV6L9n8yecszMyqwSN0M7OKcKCbmVWEA93MrCIc6GZmFbGoVwdesmRJLFu2rFeHNzMrpZtvvvm+iBhutq1ngb5s2TLGxsZ6dXgzs1KSdE+rbS65mJlVhAPdzKwiHOhmZhXhQDczqwgHuplZRfRslst8bLhlF5du3s7uySmWLh7iolUrWHvqyNxPNDOrgdIE+oZbdrH++tuYmk7uxrlrcor1198G4FA3M6NEJZdLN29/JMxnTE3v59LN23vUIjOz/lKaQN89OdXWejOzuilNoC9dPNTWejOzuilNoF+0agVDgwMHrRsaHOCiVYd8gLuZWS2V5qTozIlPz3IxM2uuNIEOSag7wM3MmitNycXMzGbnQDczq4hSlVyyfNWomdnBShnovmrUzOxQpSy5+KpRM7NDlTLQfdWomdmhShnovmrUzOxQpQx0XzVqZnaoUp4U9VWjZmaHKmWgg68aNTNrVMqSi5mZHaq0I/RGvtDIzOquEoHuC43MzCpScvGFRmZmFQl0X2hkZlaRQPeFRmZmFQl0X2hkZlaRk6K+0MjMLGegS1oNfAAYAK6OiEua7LMSuAwYBO6LiBcW1socfKGRmdXdnIEuaQC4HHgpMA7cJGljRHwrs89i4ApgdUTcK+lxHWpvLp6TbmZ1lKeGfhqwIyLuioi9wLXAmoZ9Xg1cHxH3AkTEnmKbmd/MnPRdk1MEP5uTvuGWXb1qkplZV+QJ9BFgZ2Z5PF2X9RTgaEk3SrpZ0uuavZCk8yWNSRqbmJiYX4vn4DnpZlZXeQJdTdZFw/Ii4JeBs4FVwF9IesohT4q4KiJGI2J0eHi47cbm4TnpZlZXeQJ9HDgus3wssLvJPp+JiIci4j7gS8DJxTSxPZ6TbmZ1lSfQbwJOlLRc0uHAucDGhn0+ATxf0iJJjwKeBWwrtqn5eE66mdXVnLNcImKfpAuBzSTTFj8YEVslXZBuvzIitkn6DHArcIBkauPtnWx4K56TbmZ1pYjGcnh3jI6OxtjYWE+ObWZWVpJujojRZtsqcaXobDwn3czqotKB7vukm1mdVOLmXK14TrqZ1UmlA91z0s2sTiod6J6TbmZ1UulA95x0M6uTSp8U9Zx0M6uTWs1D9xRGMyu7Ws9Dn+EpjGZWdZWuoWd5CqOZVV1tAt1TGM2s6moT6J7CaGZVV5tA9xRGM6u62pwU9RRGM6u6Wk1bzPIURjMrI09bbOApjGZWRbWpoWd5CqOZVVEtA91TGM2simoZ6J7CaGZVVMtA9xRGM6uiWgb62lNHuPg3nsHI4iEELB4a5MjBw3jzdVt43iVfYMMtu3rdRDOzttUy0CEJ9a+sezF/+6pT+Om+Azzw8DTBz2a8ONTNrGxqG+gzPOPFzKqi9oHuGS9mVhW1D3TPeDGzqqh9oHvGi5lVRe0D3TNezKwqah/o4BkvZlYNDvQMz3gxszLLFeiSVkvaLmmHpHVNtq+U9CNJW9LHO4tvaud5xouZldmct8+VNABcDrwUGAdukrQxIr7VsOuXI+JlHWhj1yxdPMSuJuHtGS9mVgZ5RuinATsi4q6I2AtcC6zpbLN6wzNezKzM8gT6CLAzszyermv0HEnflPRpSU8vpHVd5hkvZlZmeQJdTdY1fm7dN4ATIuJk4O+BDU1fSDpf0piksYmJibYa2i2e8WJmZZUn0MeB4zLLxwK7sztExI8j4sH0603AoKQljS8UEVdFxGhEjA4PDy+g2Z3nGS9mVjZ5Av0m4ERJyyUdDpwLbMzuIOkXJCn9+rT0de8vurHd5BkvZlY2c85yiYh9ki4ENgMDwAcjYqukC9LtVwLnAL8vaR8wBZwbEY1lmVLxjBczK5s5Ax0eKaNsalh3ZebrfwD+odim9dZFq1aw/vrbDiq7eMaLmfUzXynagme8mFnZONBn4RkvZlYmDvQcPOPFzMrAgZ6DZ7yYWRk40HPwpxqZWRk40HNodo8XkdTSfYLUzPpFrmmLdbf21OTWNZdu3s6uySnEz+59MHOCNLufmVkveISe08yMl5HFQ4fcyMYnSM2sHzjQ2+QTpGbWrxzobfIJUjPrVw70NvkEqZn1K58UbZNPkJpZv/IIfR58gtTM+pEDfQF8gtTM+okDfQFanQgNcD3dzLrOgb4AzU6QzvAdGc2s2xzoC5C9Z3ozrqebWTc50Bdo5gSpWmx3Pd3MusWBXhDX082s1xzoBXE93cx6zYFeENfTzazXHOgFcj3dzHrJgd4BvoGXmfWCA70DmtXTBw8TD+/dx/J1N/gkqZl1hG/O1QHZG3jtnpziMUODPLR3Hw88PA34Jl5m1hkeoXfITD39u5eczVFHLGJ6/8G38fJJUjMrmgO9C1qdDPU91M2sSA70LpjtZKjnqJtZURzoXTDbRUfg8ouZFcMnRbug8VOOmvEcdTNbqFwjdEmrJW2XtEPSuln2+xVJ+yWdU1wTqyH7KUfN+J4vZrZQcwa6pAHgcuBM4CTgPEkntdjvr4DNRTeySnzPFzPrlDwj9NOAHRFxV0TsBa4F1jTZ74+AjwN7Cmxf5fieL2bWKXkCfQTYmVkeT9c9QtII8OvAlbO9kKTzJY1JGpuYmGi3rZXhe76YWSfkCfRmudP4YfeXAW+PiP2zvVBEXBURoxExOjw8nLOJ1eV7qJtZkfIE+jhwXGb5WGB3wz6jwLWS7gbOAa6QtLaIBlaZ6+lmVqQ8gX4TcKKk5ZIOB84FNmZ3iIjlEbEsIpYBHwP+ICI2FN3YqnE93cyKNGegR8Q+4EKS2SvbgI9GxFZJF0i6oNMNrLq56um+PYCZ5ZXrwqKI2ARsaljX9ARoRPz2wptVP0sXD7W86Mh3ZzSzPHzpf5/w7QHMbKEc6H1irno6uPxiZrNzoPeRuW4PAJ79YmatOdD7kMsvZjYfDvQ+5PKLmc2HA71PufxiZu1yoPc5l1/MLC8Hep9z+cXM8nKgl4DLL2aWhwO9RFx+MbPZONBLxOUXM5uNA71kXH4xs1Yc6CXl8ouZNXKgl5TLL2bWyIFeYi6/mFmWA70C8pRf3nTdFo/WzSou1wdcWH+b+dCLSzdvb/khGeAPyjCrOo/QKyJP+QV8stSsyhzoFTNX+QV8stSsqlxyqRiXX8zqSxHRkwOPjo7G2NhYT45dFxtu2cX6629janr/rPuNLB7iolUrHOxmJSDp5ogYbbbNI/QK82jdrF48Qq+J513yhVlDHWBA4kAESz1iN+tbs43QfVK0JvKcLN0fQeCLkczKyoFeE3luFZDl6Y1m5eNAr5GZueqXveqUOUfr4OmNZmXjk6I1lD1ZuntyisMk9rc4l+ITpmbl4ZOi5umNZiXiaYs2K09vNKuGXDV0SaslbZe0Q9K6JtvXSLpV0hZJY5JOL76p1knt3AvGd240609zBrqkAeBy4EzgJOA8SSc17PZ54OSIOAX4XeDqgttpXZJneiN4aqNZP8ozQj8N2BERd0XEXuBaYE12h4h4MH5WjD8K6E1h3hasnemNHq2b9Zc8gT4C7Mwsj6frDiLp1yXdAdxAMkq3kprP9MY3X7eFZetucLib9VCeQFeTdYeMwCPiPyLiqcBa4L1NX0g6P62xj01MTLTVUOu+dkbrM28Il2LMeidPoI8Dx2WWjwV2t9o5Ir4EPEnSkibbroqI0YgYHR4ebrux1n3tjtbBpRizXskT6DcBJ0paLulw4FxgY3YHSU+WpPTrZwKHA/cX3VjrnXZvHQAerZt125yBHhH7gAuBzcA24KMRsVXSBZIuSHd7BXC7pC0kM2JeFb26Ysk6xqN1s/7mK0VtXjbcsuuRC5FEvmlNQ4MDXPwbz/BFSWYLMNuVog50W7BsuOfhWwiYzZ8D3boi7z1hgEdG9Q53s/b4Xi7WFXnvCQOHTnPMPt/M5scjdOuIdkbrMzxaN5ubSy7WE+3W1sGlGLO5+DNFrSfmM83RV5yazZ8D3Tqu8aKkZveSaMZz2M3a45KLdZ1LMWbz5xq69aX5nDgFGDxMPPrIRUw+PM1SB7zVjKctWl9qnOaY94rT6QPBAw9PA572aJblEbr1jfmUYrJcjrE6cMnFSmW+pRhwrd2qz4FupTOfm381crhbFTnQrdRmwn335BSPGRrkob37mN7f3vvW4W5V4ZOiVmprTx05KIDnU2v3vWOsDjxCt9JaSK0dPFq3cnLJxSprobV2l2KsbBzoVgsOd6sDB7rVjsPdqsqBbrW20AuWHO7WTxzoZiz8JCo43K33HOhmqSIuWJrhcLdecKCbNeFwtzJyoJvNochw9+19rZMc6GZtKDLcwaN3K5YD3WyeOhXui4cGkfAo3trmQDcrQNHhnuVRvOXlQDcrmMPdesWBbtZBRdzetxWHuzVyoJt1UadG766/GxQQ6JJWAx8ABoCrI+KShu2vAd6eLj4I/H5EfHO213SgWx10sjQzw6P4ellQoEsaAO4EXgqMAzcB50XEtzL7PBfYFhEPSDoTeFdEPGu213WgW900lmYkeODhaY/irS0LDfTnkAT0qnR5PUBEXNxi/6OB2yNi1nePA90s4VG8tWOhgX4OsDoi3pgu/xbwrIi4sMX+bwWeOrN/w7bzgfMBjj/++F++55572uqIWdV1O9xf9NRh/uuOCXZPTnkkXxILDfRXAqsaAv20iPijJvu+CLgCOD0i7p/tdT1CN5tdN8K9kUfy/a8rJRdJvwT8B3BmRNw5V6Mc6Gb5daP+3sj1+P600EBfRHJS9AxgF8lJ0VdHxNbMPscDXwBeFxFfzdMoB7rZwnkUXz9FTFs8C7iMZNriByPi/ZIuAIiIKyVdDbwCmCmK72t1wBkOdLNieRRfD76wyKzGejmKd9AXz4FuZsDBo/ilmVku3Qp7B/3COdDNbE69GMnPcNDn50A3s7b0oh7fTPbTnx7jsAcc6GZWkF6O4hvVdVTvQDezwvXLKL5R1YPegW5mXVOmoC9jGceBbmY9169Bn1WG0b0D3cz6VrOgnxk9F/npTwvRT6N7B7qZlVIZRvUzujUjx4FuZpVSpqCHYks5swX6ogLaambWVWtPHWkahP0a9DPHnpyafmTdrskp1l9/G0Bho3cHuplVRjtBny2N9Cr0p6b3c+nm7Q50M7O8WgV9Vq9G97snpwp7LQe6mRnzH90vdEbO0sVDRTQfcKCbmc0qz+ge5jfCHxoc4KJVKwprqwPdzKwA7Y7wOzF33YFuZtZBeUf4RTisK0cxM7OOc6CbmVWEA93MrCIc6GZmFeFANzOriJ7dnEvSBHDPPJ++BLivwOaURR37Xcc+Qz37Xcc+Q/v9PiEihptt6FmgL4SksVZ3G6uyOva7jn2Geva7jn2GYvvtkouZWUU40M3MKqKsgX5VrxvQI3Xsdx37DPXsdx37DAX2u5Q1dDMzO1RZR+hmZtbAgW5mVhGlC3RJqyVtl7RD0rpet6cTJB0n6b8kbZO0VdKfpOsfK+lzkr6d/nt0r9taNEkDkm6R9Kl0uQ59XizpY5LuSH/mz6lJv9+cvr9vl/QRSUdWrd+SPihpj6TbM+ta9lHS+jTbtkta1e7xShXokgaAy4EzgZOA8ySd1NtWdcQ+4C0R8TTg2cAfpv1cB3w+Ik4EPp8uV82fANsyy3Xo8weAz0TEU4GTSfpf6X5LGgH+GBiNiF8EBoBzqV6/PwSsbljXtI/p7/i5wNPT51yRZl5upQp04DRgR0TcFRF7gWuBNT1uU+Ei4nsR8Y3065+Q/IKPkPT1X9Ld/gVY25MGdoikY4Gzgaszq6ve558HXgD8E0BE7I2ISSre79QiYEjSIuBRwG4q1u+I+BLww4bVrfq4Brg2In4aEd8FdpBkXm5lC/QRYGdmeTxdV1mSlgGnAv8LPD4ivgdJ6AOP62HTOuEy4G3Agcy6qvf5icAE8M9pqelqSUdR8X5HxC7gr4F7ge8BP4qIz1Lxfqda9XHB+Va2QFeTdZWddynp0cDHgTdFxI973Z5OkvQyYE9E3NzrtnTZIuCZwD9GxKnAQ5S/zDCntG68BlgOLAWOkvTa3raq5xacb2UL9HHguMzysST/TascSYMkYf6vEXF9uvoHkp6Qbn8CsKdX7euA5wG/JuluklLaiyVdQ7X7DMl7ejwi/jdd/hhJwFe93y8BvhsRExExDVwPPJfq9xta93HB+Va2QL8JOFHSckmHk5xA2NjjNhVOkkhqqtsi4m8ymzYCr0+/fj3wiW63rVMiYn1EHBsRy0h+rl+IiNdS4T4DRMT3gZ2SZj76/QzgW1S83ySllmdLelT6fj+D5FxR1fsNrfu4EThX0hGSlgMnAl9v65UjolQP4CzgTuA7wDt63Z4O9fF0kv9q3QpsSR9nAceQnBX/dvrvY3vd1g71fyXwqfTryvcZOAUYS3/eG4Cja9LvdwN3ALcDHwaOqFq/gY+QnCOYJhmBv2G2PgLvSLNtO3Bmu8fzpf9mZhVRtpKLmZm14EA3M6sIB7qZWUU40M3MKsKBbmZWEQ50M7OKcKCbmVXE/wPbedRugVygfAAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "plt.scatter(range(0,100), cost_list[0:100])\n",
    "plt.title(\"Loss function with iterations.\")\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "smoking-vermont",
   "metadata": {},
   "source": [
    "In the above plot, we can see that the cost function decreases with every iteration and almost gets flattened as we move towards 100. You can fiddle around with hyper-parameters and see the behaviour of cost function.\n",
    "\n",
    "Now, let’s see how our logistic regression fares in comparison to sklearn’s logistic regression."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "federal-administrator",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Testing ACC:  0.9\n",
      "Need time = 0.0033266544342041016\n"
     ]
    }
   ],
   "source": [
    "from sklearn.linear_model import LogisticRegression\n",
    "from sklearn.metrics import f1_score\n",
    "start = time.time()\n",
    "model = LogisticRegression().fit(X_tr,y_tr)\n",
    "y_pred = model.predict(X_te)\n",
    "end = time.time()\n",
    "_, _, sk_acc, _ = Evaluate(y_te, y_pred)\n",
    "print('Testing ACC:  ' + str(sk_acc))\n",
    "print(\"Need time =\", str(end-start))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "wired-korean",
   "metadata": {},
   "source": [
    "At the end, we achieve the same accuracy to scikit-learn's implementation, but they are faster than ours."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "proud-nirvana",
   "metadata": {},
   "source": [
    "## 3. Conclusion\n",
    "\n",
    "We have successfully implemented a gradient descent algorithm for the logistic regression problem and compared with the sklearn version. We achieve the same accuracy to scikit-learn's implementations, but they are faster than ours."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "spectacular-monitoring",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "continuing-rebecca",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}