{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "skip" }, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "%%html\n", "\n", "\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# Install the necessary dependencies\n", "\n", "import os\n", "import sys\n", "!{sys.executable} -m pip install --quiet pandas scikit-learn numpy matplotlib jupyterlab_myst ipython " ] }, { "cell_type": "markdown", "metadata": { "tags": [ "remove-cell" ] }, "source": [ "---\n", "license:\n", " code: MIT\n", " content: CC-BY-4.0\n", "github: https://github.com/ocademy-ai/machine-learning\n", "venue: By Ocademy\n", "open_access: true\n", "bibliography:\n", " - https://raw.githubusercontent.com/ocademy-ai/machine-learning/main/open-machine-learning-jupyter-book/references.bib\n", "---" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "0MRC0e0KhQ0S", "slideshow": { "slide_type": "slide" } }, "source": [ "# Logistic Regression" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Introduction\n", "\n", "\n", "* Despite its name, logistic regression is a classification algorithm, unlike other regression models, which predict continuous values.\n", "* Logistic regression is an important stepping stone toward deep learning.\n", "* Once you understand this topic, Artificial Neural Networks will be much easier to learn."
] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "LWd1UlMnhT2s", "slideshow": { "slide_type": "subslide" } }, "source": [ "## Importing the libraries" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "YvGPUQaHhXfL", "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "%matplotlib inline\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Sigmoid function" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def sigmoid(x):\n", " return 1.0 / (1.0 + np.exp(-x))\n", "\n", "values = np.arange(-10, 10, 0.1)\n", "\n", "plt.plot(values, sigmoid(values))\n", "plt.xlabel('x')\n", "plt.ylabel('sigmoid(x)')\n", "plt.title('Sigmoid Function in Matplotlib')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "K1VMqkGvhc3-", "slideshow": { "slide_type": "subslide" } }, "source": [ "## Importing the dataset" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "M52QDmyzhh9s", "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "dataset = pd.read_csv('../../assets/data/Social_Network_Ads.csv')\n", "X = dataset.iloc[:, :-1].values\n", "y = dataset.iloc[:, -1].values\n", "\n", "dataset" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "YvxIPVyMhmKp", "slideshow": { "slide_type": "subslide" } }, "source": [ "## Splitting the dataset into the Training set and Test set" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "AVzJWAXIhxoC", "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "from sklearn.model_selection import train_test_split\n", 
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "kW3c7UYih0hT", "slideshow": { "slide_type": "subslide" } }, "source": [ "## Feature Scaling" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "9fQlDPKCh8sc", "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "from sklearn.preprocessing import StandardScaler\n", "sc = StandardScaler()\n", "X_train = sc.fit_transform(X_train)\n", "X_test = sc.transform(X_test)" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "bb6jCOCQiAmP", "slideshow": { "slide_type": "subslide" } }, "source": [ "## Training the Logistic Regression model on the Training set" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 103 }, "colab_type": "code", "executionInfo": { "elapsed": 2125, "status": "ok", "timestamp": 1588265315505, "user": { "displayName": "Hadelin de Ponteves", "photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GhEuXdT7eQweUmRPW8_laJuPggSK6hfvpl5a6WBaA=s64", "userId": "15047218817161520419" }, "user_tz": -240 }, "id": "e0pFVAmciHQs", "outputId": "67f64468-abdb-4fe7-cce9-de0037119610", "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "from sklearn.linear_model import LogisticRegression\n", "classifier = LogisticRegression(random_state = 0)\n", "classifier.fit(X_train, y_train)" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "yyxW5b395mR2", "slideshow": { "slide_type": "subslide" } }, "source": [ "## Predicting a new result" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 34 }, "colab_type": "code", "executionInfo": { "elapsed": 2118, "status": "ok", "timestamp": 1588265315505, "user": { "displayName": 
"Hadelin de Ponteves", "photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GhEuXdT7eQweUmRPW8_laJuPggSK6hfvpl5a6WBaA=s64", "userId": "15047218817161520419" }, "user_tz": -240 }, "id": "f8YOXsQy58rP", "outputId": "2e1b0063-548e-4924-cf3a-93a79d97e35e", "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "print(classifier.predict(sc.transform([[30, 87000], [65, 990000]])))" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "vKYVQH-l5NpE", "slideshow": { "slide_type": "subslide" } }, "source": [ "## Predicting the Test set results" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "colab_type": "code", "executionInfo": { "elapsed": 2112, "status": "ok", "timestamp": 1588265315506, "user": { "displayName": "Hadelin de Ponteves", "photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GhEuXdT7eQweUmRPW8_laJuPggSK6hfvpl5a6WBaA=s64", "userId": "15047218817161520419" }, "user_tz": -240 }, "id": "p6VMTb2O4hwM", "outputId": "a4f03a97-2942-45cd-f735-f4063277a96c", "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "y_pred = classifier.predict(X_test)\n", "print(np.concatenate((y_pred.reshape(len(y_pred),1), y_test.reshape(len(y_test),1)), 1))" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "h4Hwj34ziWQW", "slideshow": { "slide_type": "subslide" } }, "source": [ "## Making the Confusion Matrix" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 68 }, "colab_type": "code", "executionInfo": { "elapsed": 2107, "status": "ok", "timestamp": 1588265315506, "user": { "displayName": "Hadelin de Ponteves", "photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GhEuXdT7eQweUmRPW8_laJuPggSK6hfvpl5a6WBaA=s64", "userId": "15047218817161520419" }, "user_tz": -240 }, "id": "D6bpZwUiiXic", "outputId": "f202fcb3-5882-4d93-e5df-50791185067e", 
"slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "from sklearn.metrics import confusion_matrix, accuracy_score\n", "cm = confusion_matrix(y_test, y_pred)\n", "print(cm)\n", "accuracy_score(y_test, y_pred)" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "6OMC_P0diaoD", "slideshow": { "slide_type": "subslide" } }, "source": [ "## Visualising the Training set results" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 349 }, "colab_type": "code", "executionInfo": { "elapsed": 23189, "status": "ok", "timestamp": 1588265336596, "user": { "displayName": "Hadelin de Ponteves", "photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GhEuXdT7eQweUmRPW8_laJuPggSK6hfvpl5a6WBaA=s64", "userId": "15047218817161520419" }, "user_tz": -240 }, "id": "_NOjKvZRid5l", "outputId": "6fa60701-9aa4-46f2-a6aa-0f9b0aad62b3", "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "from matplotlib.colors import ListedColormap\n", "X_set, y_set = sc.inverse_transform(X_train), y_train\n", "X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 10, stop = X_set[:, 0].max() + 10, step = 0.5),\n", " np.arange(start = X_set[:, 1].min() - 1000, stop = X_set[:, 1].max() + 1000, step = 0.5))\n", "plt.contourf(X1, X2, classifier.predict(sc.transform(np.array([X1.ravel(), X2.ravel()]).T)).reshape(X1.shape),\n", " alpha = 0.75, cmap = ListedColormap(('red', 'green')))\n", "plt.xlim(X1.min(), X1.max())\n", "plt.ylim(X2.min(), X2.max())\n", "for i, j in enumerate(np.unique(y_set)):\n", " plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1], c = ListedColormap(('red', 'green'))(i), label = j)\n", "plt.title('Logistic Regression (Training set)')\n", "plt.xlabel('Age')\n", "plt.ylabel('Estimated Salary')\n", "plt.legend()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "SZ-j28aPihZx", "slideshow": { "slide_type": 
"subslide" } }, "source": [ "## Visualising the Test set results" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 349 }, "colab_type": "code", "executionInfo": { "elapsed": 43807, "status": "ok", "timestamp": 1588265357223, "user": { "displayName": "Hadelin de Ponteves", "photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GhEuXdT7eQweUmRPW8_laJuPggSK6hfvpl5a6WBaA=s64", "userId": "15047218817161520419" }, "user_tz": -240 }, "id": "qeTjz2vDilAC", "outputId": "00fb10bc-c726-46b8-8eaa-c5c6b584aa54", "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "from matplotlib.colors import ListedColormap\n", "X_set, y_set = sc.inverse_transform(X_test), y_test\n", "X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 10, stop = X_set[:, 0].max() + 10, step = 0.5),\n", " np.arange(start = X_set[:, 1].min() - 1000, stop = X_set[:, 1].max() + 1000, step = 0.5))\n", "plt.contourf(X1, X2, classifier.predict(sc.transform(np.array([X1.ravel(), X2.ravel()]).T)).reshape(X1.shape),\n", " alpha = 0.75, cmap = ListedColormap(('red', 'green')))\n", "plt.xlim(X1.min(), X1.max())\n", "plt.ylim(X2.min(), X2.max())\n", "for i, j in enumerate(np.unique(y_set)):\n", " plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1], c = ListedColormap(('red', 'green'))(i), label = j)\n", "plt.title('Logistic Regression (Test set)')\n", "plt.xlabel('Age')\n", "plt.ylabel('Estimated Salary')\n", "plt.legend()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Linear Regression v.s. 
Logistic Regression" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "from sklearn.datasets import make_classification\n", "\n", "X, y = make_classification(\n", " n_features=2, n_redundant=0, n_informative=2, n_clusters_per_class=1, random_state=12\n", ")\n", "\n", "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)\n", "\n", "plt.scatter(X[:, 0], X[:, 1], c=y)\n", "\n", "plt.plot([-2.0, 0], [1.2, -1.3])" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "from sklearn.linear_model import LogisticRegression\n", "\n", "classifier = LogisticRegression(random_state = 0)\n", "classifier.fit(X_train, y_train)\n", "\n", "classifier.__dict__\n", "\n", "print(1.4/2.4)\n", "\n", "print(1.3/2.4)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "from sklearn.metrics import confusion_matrix, accuracy_score\n", "\n", "y_pred = classifier.predict(X_test)\n", "\n", "cm = confusion_matrix(y_test, y_pred)\n", "print(cm)\n", "accuracy_score(y_test, y_pred)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "classifier.coef_" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Assignment - 1\n", "\n", "- Build classification models: predict the price range" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Assignment - 2\n", "\n", "- Logistic Regression from scratch\n" ] } ], "metadata": { "celltoolbar": "Slideshow", "colab": { "authorship_tag": "ABX9TyOsvB/iqEjYj3VN6C/JbvkE", "collapsed_sections": [], "machine_shape": "hm", "name": "logistic_regression.ipynb", "provenance": [], "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "language": "python",
"name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" } }, "nbformat": 4, "nbformat_minor": 1 }