{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Last updated as at 19 July 2017" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import pandas as pd\n", "import numpy as np\n", "import plotly\n", "import plotly.graph_objs as go\n", "plotly.offline.init_notebook_mode(connected=True)\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Logistic Regression\n", "\n", "In this first part, we develop an unregularized logistic regression model that predicts whether a student gets admitted into a university." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Exam 1 ScoreExam 2 ScoreAdmission Result
034.62366078.0246930
130.28671143.8949980
235.84740972.9021980
360.18259986.3085521
479.03273675.3443761
\n", "
" ], "text/plain": [ " Exam 1 Score Exam 2 Score Admission Result\n", "0 34.623660 78.024693 0\n", "1 30.286711 43.894998 0\n", "2 35.847409 72.902198 0\n", "3 60.182599 86.308552 1\n", "4 79.032736 75.344376 1" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "admit_data = pd.read_csv(\"ex2data1.txt\", header = None, \n", " names = [\"Exam 1 Score\", \"Exam 2 Score\", \"Admission Result\"])\n", "admit_data.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Filter the two groups of students, admitted and not admitted, and create two separate objects in the same figure." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "admitted = admit_data[admit_data[\"Admission Result\"] == 1]\n", "notadmitted = admit_data[admit_data[\"Admission Result\"] == 0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Visualizing the data" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "admits = go.Scatter(x = admitted[\"Exam 1 Score\"], \n", " y= admitted[\"Exam 2 Score\"],\n", " mode = \"markers\", name= \"Admitted\")\n", "notadmits = go.Scatter(x = notadmitted[\"Exam 1 Score\"],\n", " y = notadmitted[\"Exam 2 Score\"],\n", " mode = \"markers\", name= \"Not admitted\")\n", "plotly.offline.iplot({\"data\": [admits, notadmits],\n", " \"layout\": go.Layout(title = \"Scatter plot of training data\",\n", " xaxis = dict(title=\"Exam 1 score\"),\n", " yaxis = dict(title=\"Exam 2 score\"))})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Sigmoid, Cost, and Gradient Functions" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def sigmoid(z):\n", " return 1 / (1 + np.exp(-z))" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def compute_cost(theta, X, y):\n", " return 1/len(y) * (-y @ np.log(sigmoid(X@theta)) - (1-y) @ np.log(1-sigmoid(X@theta)) )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Implementation notes for `compute_gradient()`\n", "\n", "Gradient function is vectorized according to the form given in the subsequent programming exercise 3 (I skipped ahead), making for an elegant implementation that does not use any loops." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def compute_gradient(theta, X, y):\n", " gradient = np.zeros(len(theta))\n", " XT = X.T\n", " beta = sigmoid(X@theta) - y\n", " gradient = 1/len(y) * XT@beta\n", " return gradient" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Prepare the input matrices and vectors.\n", "\n", "Matrix `X` is formed from a transpose in order to make it the design matrix. Parameter vector `theta_init` is initialized to 0 by convention." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 1. , 34.62365962, 78.02469282],\n", " [ 1. , 30.28671077, 43.89499752],\n", " [ 1. , 35.84740877, 72.90219803],\n", " [ 1. , 60.18259939, 86.3085521 ],\n", " [ 1. , 79.03273605, 75.34437644]])" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X = np.array([np.ones(len(admit_data.index)), \n", " admit_data[\"Exam 1 Score\"], admit_data[\"Exam 2 Score\"]]).T\n", "theta_init = np.zeros(3)\n", "# Do not need to reshape y to be a column vector, keep y as row vector \n", "# in order to complete the multiplication with y on the left hand side\n", "# otherwise y needs to be on the right\n", "y = np.asarray(admit_data[\"Admission Result\"])\n", "X[0:5]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Learning the parameters using conjugate gradient\n", "\n", "Use of the `minimize` method in the `scipy.optimize` module requires some special care noted below.\n", "\n", "#### Implementation notes\n", "\n", "At first, also couldn't get the optimization to work, and it was because the arguments to my functions were in the order `(X,y,theta)`. Based on some simple Googling, I came across this [stackoverflow post](https://stackoverflow.com/questions/18801002/fminunc-alternate-in-numpy) and noticed that this person got his cost function to work correctly, and had his arguments in the order of `(theta, X, y)`. It made me wonder if this order mattered. 
So I applied this change to my function and immediately the optimization worked!\n", "\n", "In retrospect, I understand why `theta` must be the first argument that is passed into the cost and gradient functions. This is because the interface of the `minimize` function in `scipy.optimize` expects its `x0` argument to be the __initial guess__, i.e. the initialized parameter values. In our case here $\\theta$ is the parameter for which we set our initial guess to be $\\theta = [\\theta_0,\\theta_1,\\theta_2] = [0,0,0]$. `minimize` then performs an optimization routine on my cost function as described in `compute_cost()`, iteratively updating $\\theta$ until the cost value converges. It then returns the $\\theta$ at which this minimized cost value was attained.\n", "\n", "That is why the `x0` argument of `minimize` is special, and must always be the array of parameters for the cost function! Ordering my arguments as `(X,y,theta)` was thus wrong, since it made `X` the argument passed to `x0` when it should have been `theta`." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", "         Current function value: 0.203498\n", "         Iterations: 62\n", "         Function evaluations: 140\n", "         Gradient evaluations: 140\n", "Vector of learnt parameters: [-25.16120038 0.20623067 0.20147049]\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/home/kohaugustine/miniconda2/envs/intro_machine_learning_python/lib/python3.5/site-packages/ipykernel_launcher.py:2: RuntimeWarning:\n", "\n", "divide by zero encountered in log\n", "\n" ] } ], "source": [ "from scipy.optimize import minimize\n", "\n", "results = minimize(compute_cost, theta_init, args = (X,y),\n", "                   method = 'CG', jac = compute_gradient, \n", "                   options = {\"maxiter\": 400, \"disp\" : True})\n", "# optimized parameters are accessible through the x attribute\n", "theta_optimized = results.x\n", "print(\"Vector of learnt parameters:\",theta_optimized)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Plot the Decision Boundary\n", "\n", "The __decision boundary__ is the line that separates the regions where y = 0 (student is not admitted) and where y = 1 (student is admitted). \n", "\n", "Our optimized parameters as computed via conjugate gradient are $\\theta = [\\theta_0,\\theta_1,\\theta_2] = [-25.16120038, 0.20623067, 0.20147049]$. Our input to the sigmoid function $g(z)$ is $\\theta^Tx = \\theta_0x_0+\\theta_1x_1+\\theta_2x_2$, where $x_0 = 1$. The property of the sigmoid function $g(z)$ is such that $g(z)\\geq0.5 \\space when \\space z\\geq0$ and $g(z)<0.5 \\space when \\space z<0$, where $z=\\theta^Tx$.\n", "\n", "Since we have defined our hypothesis as $h_\\theta(x) = g(\\theta^Tx)$ and fixed the threshold so that the predicted class value $y$ is $y=1 \\space when \\space h_\\theta(x)\\geq0.5$ and $y=0 \\space when \\space h_\\theta(x)<0.5$, this means that $y=1\\space when \\space \\theta^Tx\\geq0 \\space$ and $\\space y=0 \\space when \\space \\theta^Tx<0$. Geometrically, any data points above the decision boundary are predicted to be case 1, and points below are predicted as case 0.\n", "\n", "Therefore, the __decision boundary__ is defined by the line $\\theta^Tx=0 \\Leftrightarrow -25.16120038+0.20623067x_1+0.20147049x_2=0$, which can be plotted in the same space as the training dataset. 
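\n", "\n", "Rearranging $\\theta_0+\\theta_1x_1+\\theta_2x_2=0$ for $x_2$ gives $x_2 = -\\frac{\\theta_1}{\\theta_2}x_1 - \\frac{\\theta_0}{\\theta_2}$, which is exactly what the next cell computes over a range of Exam 1 scores.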
" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": true }, "outputs": [], "source": [ "x1 = np.linspace(start = admit_data[\"Exam 1 Score\"].min(), \n", " stop = admit_data[\"Exam 1 Score\"].max(), num = 100)\n", "x0 = np.ones(100)\n", "x2 = -theta_optimized[1]/theta_optimized[2] * x1 - (theta_optimized[0]/theta_optimized[2] * x0)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "decisionboundary = go.Scatter(x = x1, y = x2, mode=\"lines\", \n", " name = \"Decision Boundary\" )\n", "plotly.offline.iplot({\"data\": [admits, notadmits, decisionboundary],\n", " \"layout\": go.Layout(title=\"Training data with decision boundary\",\n", " xaxis = dict(title=\"Exam 1 score\"),\n", " yaxis = dict(title=\"Exam 2 score\"))})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Evaluating logistic regression\n", "\n", "We test the predictions that our logistic regression gives on:\n", "\n", "- a particular case of a student with Exam 1 score = 45 and Exam 2 score = 85\n", "\n", "- the training dataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Probability of a particular student" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The admission probability of a student with Exam 1 score = 45 and Exam 2 score = 85 is 0.7762893321212395.\n" ] } ], "source": [ "p = sigmoid(z = (np.array([1,45,85]) @ theta_optimized))\n", "print(\"The admission probability of a student with Exam 1 score = 45 and Exam 2 score = 85 is {}.\".format(p))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Accuracy on the entire training dataset" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The accuracy of the logistic regression classifier as tested on the training dataset is 89.0 percent.\n" ] } ], "source": [ "hypothesis_out = sigmoid(X@theta_optimized)\n", "# By definition, whenever the hypothesis is greater than or equal to 0.5, \n", "# the predicted class should be set to 1, else 0.\n", "predicted_training = np.where(hypothesis_out >= 0.5, 1, 0) \n", "acc = np.mean(predicted_training == admit_data[\"Admission Result\"])*100\n", "print(\"The accuracy of the logistic regression classifier as tested on the training dataset is {} percent.\".format(acc))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Regularized logistic regression\n", "\n", "In this second part, we develop a regularized logistic regresion model that predicts whether microchips from a fabrication plant passes quality assurance." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Microchip Test 1Microchip Test 2Result
00.0512670.699561
1-0.0927420.684941
2-0.2137100.692251
3-0.3750000.502191
4-0.5132500.465641
\n", "
" ], "text/plain": [ " Microchip Test 1 Microchip Test 2 Result\n", "0 0.051267 0.69956 1\n", "1 -0.092742 0.68494 1\n", "2 -0.213710 0.69225 1\n", "3 -0.375000 0.50219 1\n", "4 -0.513250 0.46564 1" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "chip_data = pd.read_csv(\"./ex2data2.txt\", header = None,\n", " names = [\"Microchip Test 1\", \"Microchip Test 2\",\n", " \"Result\"])\n", "chip_data.head()" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": true }, "outputs": [], "source": [ "accept = chip_data[chip_data[\"Result\"] == 1]\n", "reject = chip_data[chip_data[\"Result\"] == 0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Visualizing the data\n", "\n", "We see that it is impossible to separate the two classes of accepted and rejected microchips without a non-linear decision boundary. We need to build our classifier with **polynomial terms** generated from the given features in order to obtain a non-linear boundary that can separate our data." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "accepted = go.Scatter(x = accept[\"Microchip Test 1\"], \n", " y= accept[\"Microchip Test 2\"],\n", " mode = \"markers\", name= \"Accepted\")\n", "rejected = go.Scatter(x = reject[\"Microchip Test 1\"],\n", " y = reject[\"Microchip Test 2\"],\n", " mode = \"markers\", name= \"Rejected\")\n", "plotly.offline.iplot({\"data\": [accepted, rejected],\n", " \"layout\": go.Layout(title = \"Scatter plot of training data\",\n", " xaxis = dict(title=\"Microchip Test 1\"),\n", " yaxis = dict(title=\"Microchip Test 2\"))})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Feature Mapping\n", "\n", "Polynomial terms are generated iteratively, up to the specified degree (6 in this case), from vectors of the two features, $x_1, x_2$, that we have in our training dataset. \n", "\n", "TODO: Explore more efficient way...perhaps by using dynamic programming." ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def map_features(x1, x2, deg):\n", " # Initialize array with first column of all ones, ie. the 0th degree term\n", " featuremap = np.ones((len(x1), 1))\n", " # Begin iteratively generating polynomial features from \n", " # the 2nd column onwards, degree 1 terms and above\n", " for d in range(1,deg+1): # add 1 to the deg in order to match the upper bound to the degree value\n", " for j in range(0,d+1):\n", " newfeature = x1**(d-j) * x2**j\n", " featuremap = np.append(featuremap, newfeature.reshape(len(x1),1), axis = 1)\n", " return featuremap " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Regularized Cost and Gradient Functions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Unregularized cost function** was of the form $$J(\\theta) = \\frac{1}{m} \\sum_{i=0}^m \\left[-y^{(i)} \\log( h_\\theta(x^{(i)}) ) - (1 - y^{(i)}) \\log( 1 - h_\\theta(x^{(i)}))\\right]$$\n", "and now we add a regularization term at the end to get\n", "$$J(\\theta) = \\frac{1}{m} \\sum_{i=0}^m \\left[-y^{(i)} \\log( h_\\theta(x^{(i)}) ) - (1 - y^{(i)}) \\log( 1 - h_\\theta(x^{(i)}) ) \\right]+\\frac{\\lambda}{2m}\\sum_{j=0}^n \\theta^2$$ where $m$ represents the number of rows in the training data and $n$ the number of features.\n", "\n", "**Implementation Note:** Do not include the bias term $\\theta_0$ in computing the regularized cost. So we index all the parameters in the parameter vector `theta` starting from parameter `1` which is $\\theta_1$, all the way till the last parameter." 
] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def compute_cost_regularized(theta, X, y, lda):\n", " reg = lda/(2*len(y)) * np.sum(theta[1:]**2)\n", " return 1/len(y) * np.sum(-y @ np.log(sigmoid(X@theta)) - (1-y) @ np.log(1-sigmoid(X@theta))) + reg" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Unregularized gradient function** was\n", "$$\\frac{\\partial J(\\theta)}{\\partial \\theta_j} = \\frac{1}{m}\\sum_{i=0}^m(h_\\theta(x^{(i)}) - y^{(i)}) x^{(i)}_j $$\n", "and with the regularization term added to the cost function above, the gradient now becomes\n", "$$\\frac{\\partial J(\\theta)}{\\partial \\theta_j} = (\\frac{1}{m}\\sum_{i=0}^m (h_\\theta(x^{(i)}) - y^{(i)}) x^{(i)}_j ) + \\frac{\\lambda}{m}\\theta_j$$ for $j\\geq1$.\n", "\n", "\n", "#### Implementation notes for adding regularization to gradient function in a vectorized form\n", "\n", "With the gradient function already vectorized from the earlier unregularized implementation, adding the regularization in a vectorized manner is also easy. We just form the regularization term as a vector, `regterm`, where all the hypothesis parameters except the first, $\\theta_1 \\dots \\theta_n$ get multiplied by the regularization parameter. In the implementation below, the first hypothesis parameter, $\\theta_0$, is left unregularized by setting the value at that position in the array to $0$ before `regterm` is added to the partial derivative term." ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def compute_gradient_regularized(theta, X, y, lda):\n", " gradient = np.zeros(len(theta))\n", " XT = X.T\n", " beta = sigmoid(X@theta) - y\n", " regterm = lda/len(y) * theta\n", " # theta_0 does not get regularized, so a 0 is substituted in its place\n", " regterm[0] = 0 \n", " gradient = 1/len(y) * XT@beta + regterm\n", " return gradient" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Initialize the input variables. The design matrix $X$ is 28 columns long due to the polynomial features." 
] }, { "cell_type": "code", "execution_count": 20, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "array([[ 1.00000000e+00, 5.12670000e-02, 6.99560000e-01,\n", " 2.62830529e-03, 3.58643425e-02, 4.89384194e-01,\n", " 1.34745327e-04, 1.83865725e-03, 2.50892595e-02,\n", " 3.42353606e-01, 6.90798869e-06, 9.42624411e-05,\n", " 1.28625106e-03, 1.75514423e-02, 2.39496889e-01,\n", " 3.54151856e-07, 4.83255257e-06, 6.59422333e-05,\n", " 8.99809795e-04, 1.22782870e-02, 1.67542444e-01,\n", " 1.81563032e-08, 2.47750473e-07, 3.38066048e-06,\n", " 4.61305487e-05, 6.29470940e-04, 8.58939846e-03,\n", " 1.17205992e-01],\n", " [ 1.00000000e+00, -9.27420000e-02, 6.84940000e-01,\n", " 8.60107856e-03, -6.35227055e-02, 4.69142804e-01,\n", " -7.97681228e-04, 5.89122275e-03, -4.35092419e-02,\n", " 3.21334672e-01, 7.39785525e-05, -5.46363780e-04,\n", " 4.03513411e-03, -2.98012201e-02, 2.20094970e-01,\n", " -6.86091891e-06, 5.06708697e-05, -3.74226408e-04,\n", " 2.76382476e-03, -2.04120477e-02, 1.50751849e-01,\n", " 6.36295342e-07, -4.69931780e-06, 3.47065055e-05,\n", " -2.56322636e-04, 1.89305413e-03, -1.39810280e-02,\n", " 1.03255971e-01]])" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X = map_features(np.array(chip_data[\"Microchip Test 1\"]), np.array(chip_data[\"Microchip Test 2\"]),6)\n", "theta_init = np.zeros(X.shape[1])\n", "y = np.asarray(chip_data[\"Result\"])\n", "lda = 1 # Regularization parameter, lambda, value \n", "X[0:2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Compute some test values..." ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2.13484831467\n", "[ 0.34604507 0.08508073 0.11852457 0.1505916 0.01591449]\n" ] } ], "source": [ "print(compute_cost_regularized(np.ones(X.shape[1]), X,y, lda))\n", "print(compute_gradient_regularized(np.ones(X.shape[1]), X,y,lda)[0:5])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Learning the parameters with regularization using conjugate gradient" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.529003\n", " Iterations: 19\n", " Function evaluations: 55\n", " Gradient evaluations: 55\n", "Vector of learnt parameters: [ 1.2726322 0.62526851 1.18110054 -2.01977776 -0.91750969 -1.43140105\n", " 0.12396937 -0.36542088 -0.35720348 -0.17516018 -1.45817405 -0.05109747\n", " -0.61556197 -0.2747374 -1.19275643 -0.24225813 -0.20594332 -0.04478629\n", " -0.27777193 -0.29534671 -0.45647333 -1.04328679 0.02771439 -0.29243876\n", " 0.01551392 -0.3273817 -0.14391016 -0.92473298]\n" ] } ], "source": [ "results = minimize(compute_cost_regularized, theta_init, args = (X,y,lda),\n", " method = 'CG', jac = compute_gradient_regularized, \n", " options = {\"maxiter\": 400, \"disp\" : True})\n", "# optimized parameters are accessible through the x attribute\n", "theta_optimized = results.x\n", "print(\"Vector of learnt parameters:\", theta_optimized)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Checking the accuracy\n", "\n", "The accuracy of the prediction results from the training is $83.1\\%$, which closely matches the expected accuracy at $\\lambda=1$!" 
] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "83.0508474576 %\n" ] } ], "source": [ "print(np.mean(y == np.where(sigmoid(X@theta_optimized)>=0.5,1,0))*100,\"%\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Plotting the decision boundary" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The decision boundary is visualized as a contour plot. The classifier's prediction was computed for each point in an evenly spaced grid of feature values $x_1, x_2$ and stored in a 2D array which was then used to generate the plot.\n", "\n", "#### Implementation notes\n", "When I try to vectorize the grid values computation by constructing the 2D grid for each feature using \n", "`featuremap = map_features(x1,x2,6)` and `np.meshgrid(*[featuremap[i] for i in range(0, featuremap.shape[1])])`, the memory space requirement explodes with the size of the two arrays `x1` and `x2`. When these arrays were just the size of 2 element, the calculation proceeded but ate up a lot of memory. Any larger sizes and `numpy` refuses to execute the computation. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Generate evenly spaced feature values." ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": true }, "outputs": [], "source": [ "x1 = np.linspace(start = chip_data[\"Microchip Test 1\"].min(), \n", " stop = chip_data[\"Microchip Test 1\"].max(), num = 100)\n", "x2 = np.linspace(start = chip_data[\"Microchip Test 2\"].min(), \n", " stop = chip_data[\"Microchip Test 2\"].max(), num = 100)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Define a simpler version of the feature mapping function that accepts two scalars of feature values and returns a vector of polynomial features." ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Returns a vector of new polynomial features given two scalars\n", "def map_features_from_scalar(x1,x2,deg):\n", " featvec = np.ones(1)\n", " for d in range(1,deg+1): # add 1 to the deg in order to match the upper bound to the degree value\n", " for j in range(0,d+1):\n", " newfeature = x1**(d-j) * x2**j\n", " featvec = np.append(featvec, newfeature)\n", " return featvec \n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Compute classifier predictions for each point on the evenly spaced grid\n", "\n", "As noted above, initially I attempted to vectorize the grid generation for all 28 features using `np.meshgrid()` but found out that this was not possible because it would require a lot of memory.\n", "\n", "Eventually I realized that there is no escape from using a double loop to generate the grid, with the need to compute all the features each time. To vectorize this will require more memory than my puny machine can handle. It is a trade-off between saving on computation time or saving on memory." 
] }, { "cell_type": "code", "execution_count": 26, "metadata": { "collapsed": true }, "outputs": [], "source": [ "z = np.zeros((len(x1),len(x1)))\n", "for i in range(0, len(x1)):\n", " for j in range(0,len(x2)):\n", " # Multiplying the vector of polynomial features for each pair of x1 and x2 feature values to the \n", " # vector of optimized parameters in order to obtain the \n", " z[i,j] = map_features_from_scalar(x1[i],x2[j],6) @ theta_optimized\n", "# Convert z to prediction\n", "z=np.where(sigmoid(z)>=0.5,1,0)" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "contmap = go.Contour(z = z, x=x1, y=x2, showlegend = True, \n", " showscale = True, ncontours=1, \n", " text =np.asarray(list(map(lambda x: \"Accepted\" if x == 1 else \"Rejected\",\n", " z.flatten()))).reshape(len(x1),len(x1)),\n", " hoverinfo = \"text\" ,\n", " colorbar=dict(ticktext=[\"Rejected\", \"Accepted\"],\n", " tickmode = \"array\", \n", " title = \"Prediction\"),\n", " contours = dict(start=0,end=1))\n", "\n", "plotly.offline.iplot({\n", " \"data\": [contmap],\n", " \"layout\" : go.Layout(title =\"Decision Boundary as Contour Plot\",\n", " xaxis = dict(title=\"Microchip Test 1\"),\n", " yaxis = dict(title=\"Microchip Test 2\"))\n", " })" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Optional exercises\n", "\n", "In this part, we explore different regularization parameters for the dataset **to understand how regularization prevents overfitting**. Before we begin, it will be helpful to wrap up our above code into functions that can be called with different regularization parameters multiple times easily." ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def compute_optimal_theta(X,y,lda):\n", " results = minimize(compute_cost_regularized, theta_init, args = (X,y,lda),\n", " method = 'CG', jac = compute_gradient_regularized, \n", " options = {\"maxiter\": 400})\n", " theta_optimized = results.x\n", " return theta_optimized" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def generate_classification_grid(optimal_theta):\n", " x1 = np.linspace(start = chip_data[\"Microchip Test 1\"].min(), \n", " stop = chip_data[\"Microchip Test 1\"].max(), num = 200)\n", " x2 = np.linspace(start = chip_data[\"Microchip Test 2\"].min(), \n", " stop = chip_data[\"Microchip Test 2\"].max(), num = 200)\n", " z = np.zeros((len(x1),len(x1)))\n", " for i in range(0, len(x1)):\n", " for j in range(0,len(x2)):\n", " z[i,j] = map_features_from_scalar(x1[i],x2[j],6) @ optimal_theta\n", " \n", " z=np.where(sigmoid(z)>=0.5,1,0)\n", " return x1,x2,z" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def generate_decision_boundary_for_given_lambda(X,y,lda,_title):\n", " optimal_theta = compute_optimal_theta(X,y,lda)\n", " a,b,c = generate_classification_grid(optimal_theta)\n", " \n", " # We only show lines on the contour plot in order to display the decision boundary clearly\n", " contmap = go.Contour(z = c, x=a, y=b, showlegend = True, \n", " name = \"Prediction\", \n", " text =np.asarray(list(map(lambda x: \"Accepted\" if x == 1 else \"Rejected\",\n", " c.flatten()))).reshape(len(a),len(a)),\n", " hoverinfo = \"name+x+y+text\" ,\n", " ncontours = 2,\n", " line = dict(smoothing=1.3, width = 2),\n", " contours = dict(coloring=\"lines\"), \n", " showscale = False)\n", "\n", " plotly.offline.iplot({\n", " \"data\": [accepted,rejected,contmap],\n", " \"layout\" : go.Layout(title = _title,\n", " xaxis = dict(title=\"Microchip Test 1\"),\n", " yaxis = dict(title=\"Microchip Test 2\"))\n", " })" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Examples of good regularization\n", "\n", "Let us consider the decision boundaries when $\\lambda = 0.2$ and $\\lambda = 1$. Click on the \"Show closest data on hover\" button in the top-right hand corner of the plot for less clutter during hovering." 
] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "generate_decision_boundary_for_given_lambda(X,y,0.2,\"Good Regularization (When lambda = 0.2)\")" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "generate_decision_boundary_for_given_lambda(X,y,1,\"Good Regularization (When lambda = 1)\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Examples of poor regularization (overfitting)\n", "\n", "Next, we consider the case when no regularization $(\\lambda=0)$ is imposed, leading to overfitting." ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "generate_decision_boundary_for_given_lambda(X,y,0,\"No Regularization (Overfitting when lambda = 0)\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Examples of poor regularization (underfitting)\n", "\n", "Lastly, we examine the case when too much regularization is imposed, such as when $\\lambda = 60$ and $\\lambda = 100$." ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "generate_decision_boundary_for_given_lambda(X,y,60,\"Too Much Regularization (Underfitting when lambda = 60)\")" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "generate_decision_boundary_for_given_lambda(X,y,100,\"Too Much Regularization (Even More Underfitting when lambda = 100)\")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.3" } }, "nbformat": 4, "nbformat_minor": 2 }