{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    },
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "%%html\n",
    "<!-- The customized css for the slides -->\n",
    "<link rel=\"stylesheet\" type=\"text/css\" href=\"../styles/python-programming-introduction.css\"/>\n",
    "<link rel=\"stylesheet\" type=\"text/css\" href=\"../styles/basic.css\"/>\n",
    "<link rel=\"stylesheet\" type=\"text/css\" href=\"../../assets/styles/basic.css\" />"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# Install the necessary dependencies\n",
    "\n",
    "import os\n",
    "import sys\n",
    "!{sys.executable} -m pip install --quiet pandas scikit-learn numpy matplotlib jupyterlab_myst ipython  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": [
     "remove-cell"
    ]
   },
   "source": [
    "---\n",
    "license:\n",
    "    code: MIT\n",
    "    content: CC-BY-4.0\n",
    "github: https://github.com/ocademy-ai/machine-learning\n",
    "venue: By Ocademy\n",
    "open_access: true\n",
    "bibliography:\n",
    "  - https://raw.githubusercontent.com/ocademy-ai/machine-learning/main/open-machine-learning-jupyter-book/references.bib\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "0MRC0e0KhQ0S",
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Logistic Regression"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Introduction\n",
    "\n",
    "\n",
    "*  In fact, logistic regression is a  <u>classification</u> algorithm, unlike other regression models.\n",
    "* Logistic Regression is very important for entering deep learning. \n",
    "* After understanding this topic, you will be able to easily learning to Artificial Neural Network."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "LWd1UlMnhT2s",
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Importing the libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "YvGPUQaHhXfL",
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Sigmoid function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "def sigmoid(x):\n",
    "    return 1.0 / (1.0 + np.exp(-x))\n",
    "\n",
    "values = np.arange(-10, 10, 0.1)\n",
    "\n",
    "plt.plot(values, sigmoid(values))\n",
    "plt.xlabel('x')\n",
    "plt.ylabel('sigmoid(x)')\n",
    "plt.title('Sigmoid Function in Matplotlib')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "K1VMqkGvhc3-",
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Importing the dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "M52QDmyzhh9s",
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "dataset = pd.read_csv('../../assets/data/Social_Network_Ads.csv')\n",
    "X = dataset.iloc[:, :-1].values\n",
    "y = dataset.iloc[:, -1].values\n",
    "\n",
    "dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "YvxIPVyMhmKp",
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Splitting the dataset into the Training set and Test set"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "AVzJWAXIhxoC",
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "from sklearn.model_selection import train_test_split\n",
    "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "kW3c7UYih0hT",
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Feature Scaling"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "9fQlDPKCh8sc",
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "from sklearn.preprocessing import StandardScaler\n",
    "sc = StandardScaler()\n",
    "X_train = sc.fit_transform(X_train)\n",
    "X_test = sc.transform(X_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "bb6jCOCQiAmP",
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Training the Logistic Regression model on the Training set"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 103
    },
    "colab_type": "code",
    "executionInfo": {
     "elapsed": 2125,
     "status": "ok",
     "timestamp": 1588265315505,
     "user": {
      "displayName": "Hadelin de Ponteves",
      "photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GhEuXdT7eQweUmRPW8_laJuPggSK6hfvpl5a6WBaA=s64",
      "userId": "15047218817161520419"
     },
     "user_tz": -240
    },
    "id": "e0pFVAmciHQs",
    "outputId": "67f64468-abdb-4fe7-cce9-de0037119610",
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "from sklearn.linear_model import LogisticRegression\n",
    "classifier = LogisticRegression(random_state = 0)\n",
    "classifier.fit(X_train, y_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "yyxW5b395mR2",
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Predicting a new result"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 34
    },
    "colab_type": "code",
    "executionInfo": {
     "elapsed": 2118,
     "status": "ok",
     "timestamp": 1588265315505,
     "user": {
      "displayName": "Hadelin de Ponteves",
      "photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GhEuXdT7eQweUmRPW8_laJuPggSK6hfvpl5a6WBaA=s64",
      "userId": "15047218817161520419"
     },
     "user_tz": -240
    },
    "id": "f8YOXsQy58rP",
    "outputId": "2e1b0063-548e-4924-cf3a-93a79d97e35e",
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "print(classifier.predict(sc.transform([[30, 87000], [65, 990000]])))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "vKYVQH-l5NpE",
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Predicting the Test set results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 1000
    },
    "colab_type": "code",
    "executionInfo": {
     "elapsed": 2112,
     "status": "ok",
     "timestamp": 1588265315506,
     "user": {
      "displayName": "Hadelin de Ponteves",
      "photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GhEuXdT7eQweUmRPW8_laJuPggSK6hfvpl5a6WBaA=s64",
      "userId": "15047218817161520419"
     },
     "user_tz": -240
    },
    "id": "p6VMTb2O4hwM",
    "outputId": "a4f03a97-2942-45cd-f735-f4063277a96c",
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "y_pred = classifier.predict(X_test)\n",
    "print(np.concatenate((y_pred.reshape(len(y_pred),1), y_test.reshape(len(y_test),1)), 1))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "h4Hwj34ziWQW",
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Making the Confusion Matrix"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 68
    },
    "colab_type": "code",
    "executionInfo": {
     "elapsed": 2107,
     "status": "ok",
     "timestamp": 1588265315506,
     "user": {
      "displayName": "Hadelin de Ponteves",
      "photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GhEuXdT7eQweUmRPW8_laJuPggSK6hfvpl5a6WBaA=s64",
      "userId": "15047218817161520419"
     },
     "user_tz": -240
    },
    "id": "D6bpZwUiiXic",
    "outputId": "f202fcb3-5882-4d93-e5df-50791185067e",
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "from sklearn.metrics import confusion_matrix, accuracy_score\n",
    "cm = confusion_matrix(y_test, y_pred)\n",
    "print(cm)\n",
    "accuracy_score(y_test, y_pred)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "6OMC_P0diaoD",
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Visualising the Training set results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 349
    },
    "colab_type": "code",
    "executionInfo": {
     "elapsed": 23189,
     "status": "ok",
     "timestamp": 1588265336596,
     "user": {
      "displayName": "Hadelin de Ponteves",
      "photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GhEuXdT7eQweUmRPW8_laJuPggSK6hfvpl5a6WBaA=s64",
      "userId": "15047218817161520419"
     },
     "user_tz": -240
    },
    "id": "_NOjKvZRid5l",
    "outputId": "6fa60701-9aa4-46f2-a6aa-0f9b0aad62b3",
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "from matplotlib.colors import ListedColormap\n",
    "X_set, y_set = sc.inverse_transform(X_train), y_train\n",
    "X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 10, stop = X_set[:, 0].max() + 10, step = 0.5),\n",
    "                     np.arange(start = X_set[:, 1].min() - 1000, stop = X_set[:, 1].max() + 1000, step = 0.5))\n",
    "plt.contourf(X1, X2, classifier.predict(sc.transform(np.array([X1.ravel(), X2.ravel()]).T)).reshape(X1.shape),\n",
    "             alpha = 0.75, cmap = ListedColormap(('red', 'green')))\n",
    "plt.xlim(X1.min(), X1.max())\n",
    "plt.ylim(X2.min(), X2.max())\n",
    "for i, j in enumerate(np.unique(y_set)):\n",
    "    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1], c = ListedColormap(('red', 'green'))(i), label = j)\n",
    "plt.title('Logistic Regression (Training set)')\n",
    "plt.xlabel('Age')\n",
    "plt.ylabel('Estimated Salary')\n",
    "plt.legend()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "SZ-j28aPihZx",
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Visualising the Test set results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 349
    },
    "colab_type": "code",
    "executionInfo": {
     "elapsed": 43807,
     "status": "ok",
     "timestamp": 1588265357223,
     "user": {
      "displayName": "Hadelin de Ponteves",
      "photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GhEuXdT7eQweUmRPW8_laJuPggSK6hfvpl5a6WBaA=s64",
      "userId": "15047218817161520419"
     },
     "user_tz": -240
    },
    "id": "qeTjz2vDilAC",
    "outputId": "00fb10bc-c726-46b8-8eaa-c5c6b584aa54",
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "from matplotlib.colors import ListedColormap\n",
    "X_set, y_set = sc.inverse_transform(X_test), y_test\n",
    "X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 10, stop = X_set[:, 0].max() + 10, step = 0.5),\n",
    "                     np.arange(start = X_set[:, 1].min() - 1000, stop = X_set[:, 1].max() + 1000, step = 0.5))\n",
    "plt.contourf(X1, X2, classifier.predict(sc.transform(np.array([X1.ravel(), X2.ravel()]).T)).reshape(X1.shape),\n",
    "             alpha = 0.75, cmap = ListedColormap(('red', 'green')))\n",
    "plt.xlim(X1.min(), X1.max())\n",
    "plt.ylim(X2.min(), X2.max())\n",
    "for i, j in enumerate(np.unique(y_set)):\n",
    "    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1], c = ListedColormap(('red', 'green'))(i), label = j)\n",
    "plt.title('Logistic Regression (Test set)')\n",
    "plt.xlabel('Age')\n",
    "plt.ylabel('Estimated Salary')\n",
    "plt.legend()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Linear Regression v.s. Logistic Regression"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [],
   "source": [
    "from sklearn.datasets import make_classification\n",
    "\n",
    "X, y = make_classification(\n",
    "    n_features=2, n_redundant=0, n_informative=2, n_clusters_per_class=1, random_state=12\n",
    ")\n",
    "\n",
    "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)\n",
    "\n",
    "plt.scatter(X[:, 0], X[:, 1], c=y)\n",
    "\n",
    "plt.plot([-2.0, 0], [1.2, -1.3])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [],
   "source": [
    "from sklearn.linear_model import LogisticRegression\n",
    "\n",
    "classifier = LogisticRegression(random_state = 0)\n",
    "classifier.fit(X_train, y_train)\n",
    "\n",
    "classifier.__dict__\n",
    "\n",
    "print(1.4/2.4)\n",
    "\n",
    "print(1.3/2.4)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "from sklearn.metrics import confusion_matrix, accuracy_score\n",
    "\n",
    "y_pred = classifier.predict(X_test)\n",
    "\n",
    "cm = confusion_matrix(y_test, y_pred)\n",
    "print(cm)\n",
    "accuracy_score(y_test, y_pred)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "classifier.coef_"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Assignment - 1\n",
    "\n",
    "- Build classification models：Predict the price range"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Assignment - 2\n",
    "\n",
    "- Logistic Regression from scratch\n"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Slideshow",
  "colab": {
   "authorship_tag": "ABX9TyOsvB/iqEjYj3VN6C/JbvkE",
   "collapsed_sections": [],
   "machine_shape": "hm",
   "name": "logistic_regression.ipynb",
   "provenance": [],
   "toc_visible": true
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}