{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Logistic Regression With Non-Linear Boundary Demo\n", "\n", "_Source: 🤖[Homemade Machine Learning](https://github.com/trekhleb/homemade-machine-learning) repository_\n", "\n", "> ☝Before moving on with this demo you might want to take a look at:\n", "> - 📗[Math behind the Logistic Regression](https://github.com/trekhleb/homemade-machine-learning/tree/master/homemade/logistic_regression)\n", "> - ⚙️[Logistic Regression Source Code](https://github.com/trekhleb/homemade-machine-learning/blob/master/homemade/logistic_regression/logistic_regression.py)\n", "\n", "**Logistic regression** is the appropriate regression analysis to conduct when the dependent variable is dichotomous (binary). Like all regression analyses, logistic regression is a predictive analysis. It is used to describe data and to explain the relationship between one binary dependent variable and one or more nominal, ordinal, interval or ratio-level independent variables.\n", "\n", "Logistic regression is used when the dependent variable (target) is categorical.\n", "\n", "For example, to predict:\n", "\n", "- Whether an email is spam (`1`) or not (`0`).\n", "- Whether an online transaction is fraudulent (`1`) or not (`0`).\n", "- Whether a tumor is malignant (`1`) or not (`0`).\n", "\n", "> **Demo Project:** In this example we will try to classify microchips into two categories (`valid` and `invalid`) based on two artificial parameters `param_1` and `param_2`." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Enable autoreloading of imported modules to make debugging the logistic_regression module easier.\n", "# With autoreload on, any change to the logistic_regression library code is picked up here automatically.\n", "%load_ext autoreload\n", "%autoreload 2\n", "\n", "# Add the project root folder to the module search path.\n", "import sys\n", "sys.path.append('../..')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Import Dependencies\n", "\n", "- [pandas](https://pandas.pydata.org/) - library that we will use for loading and displaying the data in a table\n", "- [numpy](http://www.numpy.org/) - library that we will use for linear algebra operations\n", "- [matplotlib](https://matplotlib.org/) - library that we will use for plotting the data\n", "- [logistic_regression](https://github.com/trekhleb/homemade-machine-learning/blob/master/homemade/logistic_regression/logistic_regression.py) - custom implementation of logistic regression" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Import 3rd party dependencies.\n", "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "\n", "# Import custom logistic regression implementation.\n", "from homemade.logistic_regression import LogisticRegression" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load the Data\n", "\n", "In this demo we will use an artificial dataset in which `param_1` and `param_2` produce a non-linear decision boundary (see the plot below)." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
|   | param_1 | param_2 | validity |
|---|---|---|---|
| 0 | 0.051267 | 0.699560 | 1 |
| 1 | -0.092742 | 0.684940 | 1 |
| 2 | -0.213710 | 0.692250 | 1 |
| 3 | -0.375000 | 0.502190 | 1 |
| 4 | -0.513250 | 0.465640 | 1 |
| 5 | -0.524770 | 0.209800 | 1 |
| 6 | -0.398040 | 0.034357 | 1 |
| 7 | -0.305880 | -0.192250 | 1 |
| 8 | 0.016705 | -0.404240 | 1 |
| 9 | 0.131910 | -0.513890 | 1 |

|   | Theta 0 | Theta 1 | Theta 2 | Theta 3 | Theta 4 | Theta 5 | Theta 6 | Theta 7 | Theta 8 | Theta 9 | ... | Theta 13 | Theta 14 | Theta 15 | Theta 16 | Theta 17 | Theta 18 | Theta 19 | Theta 20 | Theta 21 | Theta 22 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| VALID | -2.024052 | 1.660125 | -1.042190 | 1.660125 | -1.042190 | -3.345081 | 11.635574 | -10.209539 | -19.784433 | -29.263600 | ... | -0.443705 | 37.384232 | 1.548052 | 40.476004 | 16.795521 | 24.197483 | -0.966907 | 33.948141 | 46.700557 | -22.072182 |
| INVALID | 2.028452 | -1.614655 | 1.083132 | -1.614655 | 1.083132 | 3.439305 | -11.411377 | 10.258364 | 19.310634 | 28.477962 | ... | -0.423851 | -38.205306 | -2.021708 | -40.536997 | -16.190337 | -22.698874 | 2.868809 | -32.068645 | -45.901514 | 22.308492 |

2 rows × 23 columns