{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# Install the necessary dependencies\n",
"\n",
"import sys\n",
"!{sys.executable} -m pip install --quiet pandas scikit-learn numpy matplotlib jupyterlab_myst ipython"
]
},
{
"cell_type": "markdown",
"metadata": {
"tags": [
"remove-cell"
]
},
"source": [
"---\n",
"license:\n",
" code: MIT\n",
" content: CC-BY-4.0\n",
"github: https://github.com/ocademy-ai/machine-learning\n",
"venue: By Ocademy\n",
"open_access: true\n",
"bibliography:\n",
" - https://raw.githubusercontent.com/ocademy-ai/machine-learning/main/open-machine-learning-jupyter-book/references.bib\n",
"---"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "0MRC0e0KhQ0S",
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Logistic Regression"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## Introduction\n",
"\n",
"\n",
"* Despite its name, logistic regression is a classification algorithm, not a regression model.\n",
"* Logistic regression is an important stepping stone toward deep learning.\n",
"* Once you understand this topic, artificial neural networks will be much easier to learn."
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "LWd1UlMnhT2s",
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## Importing the libraries"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "YvGPUQaHhXfL",
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"%matplotlib inline\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## Sigmoid function"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"def sigmoid(x):\n",
" return 1.0 / (1.0 + np.exp(-x))\n",
"\n",
"values = np.arange(-10, 10, 0.1)\n",
"\n",
"plt.plot(values, sigmoid(values))\n",
"plt.xlabel('x')\n",
"plt.ylabel('sigmoid(x)')\n",
"plt.title('Sigmoid Function in Matplotlib')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "K1VMqkGvhc3-",
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## Importing the dataset"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "M52QDmyzhh9s",
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"dataset = pd.read_csv('../../assets/data/Social_Network_Ads.csv')\n",
"X = dataset.iloc[:, :-1].values\n",
"y = dataset.iloc[:, -1].values\n",
"\n",
"dataset"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "YvxIPVyMhmKp",
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## Splitting the dataset into the Training set and Test set"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "AVzJWAXIhxoC",
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"from sklearn.model_selection import train_test_split\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "kW3c7UYih0hT",
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## Feature Scaling"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "9fQlDPKCh8sc",
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"from sklearn.preprocessing import StandardScaler\n",
"sc = StandardScaler()\n",
"X_train = sc.fit_transform(X_train)\n",
"X_test = sc.transform(X_test)"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "bb6jCOCQiAmP",
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## Training the Logistic Regression model on the Training set"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 103
},
"colab_type": "code",
"executionInfo": {
"elapsed": 2125,
"status": "ok",
"timestamp": 1588265315505,
"user": {
"displayName": "Hadelin de Ponteves",
"photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GhEuXdT7eQweUmRPW8_laJuPggSK6hfvpl5a6WBaA=s64",
"userId": "15047218817161520419"
},
"user_tz": -240
},
"id": "e0pFVAmciHQs",
"outputId": "67f64468-abdb-4fe7-cce9-de0037119610",
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"from sklearn.linear_model import LogisticRegression\n",
"classifier = LogisticRegression(random_state = 0)\n",
"classifier.fit(X_train, y_train)"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "yyxW5b395mR2",
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## Predicting a new result"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
},
"colab_type": "code",
"executionInfo": {
"elapsed": 2118,
"status": "ok",
"timestamp": 1588265315505,
"user": {
"displayName": "Hadelin de Ponteves",
"photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GhEuXdT7eQweUmRPW8_laJuPggSK6hfvpl5a6WBaA=s64",
"userId": "15047218817161520419"
},
"user_tz": -240
},
"id": "f8YOXsQy58rP",
"outputId": "2e1b0063-548e-4924-cf3a-93a79d97e35e",
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"print(classifier.predict(sc.transform([[30, 87000], [65, 990000]])))"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "vKYVQH-l5NpE",
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## Predicting the Test set results"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"colab_type": "code",
"executionInfo": {
"elapsed": 2112,
"status": "ok",
"timestamp": 1588265315506,
"user": {
"displayName": "Hadelin de Ponteves",
"photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GhEuXdT7eQweUmRPW8_laJuPggSK6hfvpl5a6WBaA=s64",
"userId": "15047218817161520419"
},
"user_tz": -240
},
"id": "p6VMTb2O4hwM",
"outputId": "a4f03a97-2942-45cd-f735-f4063277a96c",
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"y_pred = classifier.predict(X_test)\n",
"print(np.concatenate((y_pred.reshape(len(y_pred),1), y_test.reshape(len(y_test),1)), 1))"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "h4Hwj34ziWQW",
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## Making the Confusion Matrix"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 68
},
"colab_type": "code",
"executionInfo": {
"elapsed": 2107,
"status": "ok",
"timestamp": 1588265315506,
"user": {
"displayName": "Hadelin de Ponteves",
"photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GhEuXdT7eQweUmRPW8_laJuPggSK6hfvpl5a6WBaA=s64",
"userId": "15047218817161520419"
},
"user_tz": -240
},
"id": "D6bpZwUiiXic",
"outputId": "f202fcb3-5882-4d93-e5df-50791185067e",
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"from sklearn.metrics import confusion_matrix, accuracy_score\n",
"cm = confusion_matrix(y_test, y_pred)\n",
"print(cm)\n",
"accuracy_score(y_test, y_pred)"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "6OMC_P0diaoD",
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## Visualising the Training set results"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 349
},
"colab_type": "code",
"executionInfo": {
"elapsed": 23189,
"status": "ok",
"timestamp": 1588265336596,
"user": {
"displayName": "Hadelin de Ponteves",
"photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GhEuXdT7eQweUmRPW8_laJuPggSK6hfvpl5a6WBaA=s64",
"userId": "15047218817161520419"
},
"user_tz": -240
},
"id": "_NOjKvZRid5l",
"outputId": "6fa60701-9aa4-46f2-a6aa-0f9b0aad62b3",
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"from matplotlib.colors import ListedColormap\n",
"X_set, y_set = sc.inverse_transform(X_train), y_train\n",
"X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 10, stop = X_set[:, 0].max() + 10, step = 0.5),\n",
" np.arange(start = X_set[:, 1].min() - 1000, stop = X_set[:, 1].max() + 1000, step = 0.5))\n",
"plt.contourf(X1, X2, classifier.predict(sc.transform(np.array([X1.ravel(), X2.ravel()]).T)).reshape(X1.shape),\n",
" alpha = 0.75, cmap = ListedColormap(('red', 'green')))\n",
"plt.xlim(X1.min(), X1.max())\n",
"plt.ylim(X2.min(), X2.max())\n",
"for i, j in enumerate(np.unique(y_set)):\n",
"    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1], color = ListedColormap(('red', 'green'))(i), label = j)\n",
"plt.title('Logistic Regression (Training set)')\n",
"plt.xlabel('Age')\n",
"plt.ylabel('Estimated Salary')\n",
"plt.legend()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "SZ-j28aPihZx",
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## Visualising the Test set results"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 349
},
"colab_type": "code",
"executionInfo": {
"elapsed": 43807,
"status": "ok",
"timestamp": 1588265357223,
"user": {
"displayName": "Hadelin de Ponteves",
"photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GhEuXdT7eQweUmRPW8_laJuPggSK6hfvpl5a6WBaA=s64",
"userId": "15047218817161520419"
},
"user_tz": -240
},
"id": "qeTjz2vDilAC",
"outputId": "00fb10bc-c726-46b8-8eaa-c5c6b584aa54",
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"from matplotlib.colors import ListedColormap\n",
"X_set, y_set = sc.inverse_transform(X_test), y_test\n",
"X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 10, stop = X_set[:, 0].max() + 10, step = 0.5),\n",
" np.arange(start = X_set[:, 1].min() - 1000, stop = X_set[:, 1].max() + 1000, step = 0.5))\n",
"plt.contourf(X1, X2, classifier.predict(sc.transform(np.array([X1.ravel(), X2.ravel()]).T)).reshape(X1.shape),\n",
" alpha = 0.75, cmap = ListedColormap(('red', 'green')))\n",
"plt.xlim(X1.min(), X1.max())\n",
"plt.ylim(X2.min(), X2.max())\n",
"for i, j in enumerate(np.unique(y_set)):\n",
"    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1], color = ListedColormap(('red', 'green'))(i), label = j)\n",
"plt.title('Logistic Regression (Test set)')\n",
"plt.xlabel('Age')\n",
"plt.ylabel('Estimated Salary')\n",
"plt.legend()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Linear Regression vs. Logistic Regression"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"from sklearn.datasets import make_classification\n",
"\n",
"X, y = make_classification(\n",
" n_features=2, n_redundant=0, n_informative=2, n_clusters_per_class=1, random_state=12\n",
")\n",
"\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)\n",
"\n",
"plt.scatter(X[:, 0], X[:, 1], c=y)\n",
"\n",
"plt.plot([-2.0, 0], [1.2, -1.3])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"from sklearn.linear_model import LogisticRegression\n",
"\n",
"classifier = LogisticRegression(random_state = 0)\n",
"classifier.fit(X_train, y_train)\n",
"\n",
"print(classifier.__dict__)  # inspect the fitted attributes (coef_, intercept_, ...)\n",
"\n",
"print(1.4/2.4)\n",
"\n",
"print(1.3/2.4)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"from sklearn.metrics import confusion_matrix, accuracy_score\n",
"\n",
"y_pred = classifier.predict(X_test)\n",
"\n",
"cm = confusion_matrix(y_test, y_pred)\n",
"print(cm)\n",
"accuracy_score(y_test, y_pred)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"classifier.coef_"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Assignment - 1\n",
"\n",
"- Build classification models: predict the price range"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Assignment - 2\n",
"\n",
"- Logistic Regression from scratch\n"
]
}
],
"metadata": {
"celltoolbar": "Slideshow",
"colab": {
"authorship_tag": "ABX9TyOsvB/iqEjYj3VN6C/JbvkE",
"collapsed_sections": [],
"machine_shape": "hm",
"name": "logistic_regression.ipynb",
"provenance": [],
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.4"
}
},
"nbformat": 4,
"nbformat_minor": 1
}