{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Gradient Descent Algorithms\n",
"\n",
"This notebook is an exploraton of gradient descent algorithms. We explore two dimensions of variation here. The first pertains to the calculation of the magnitude of the descent (the learning rate, $\\alpha$) along the gradient at each iteration of the algorithm. The second pertains to variations in the number of training samples utilized in the calculation of the gradient $\\triangledown$ of the cost $J$ with respect to $\\theta_j$.\n",
"\n",
"### Basic Gradient Descent Algorithm\n",
"\n",
"Gradient descent is essentially a method for calculating parameters that when applied to each of the observations (also called dependent variables or features) in a data set $X$ results in the lowest total error, or cost.\n",
"\n",
"\n",
"$$f(X,\\theta)=H$$\n",
"\n",
"\n",
"where $H$ is the estimated value of the output (also called dependent variables or labels) for each set of inputs and\n",
"\n",
"$$J(\\theta) = \\sum_{i=1}^m\\mid{h_i(\\theta,x_i) - y_i}\\mid$$\n",
"\n",
"where $J$ represents the total cost. Different methods of computing the cost are used depending on the application, but the formula above expresses the essence of it, which is that we are trying to predict the actual outcomes as closely as possible.\n",
"\n",
"In order to minimze $J$ with respect to $\\theta$ we start with an abitrary value of $\\theta$ and then calculate the direction for each our input variables that would result in the fastest decrease in cost (gradient). Then we calculate a new theta based on the gradient and a model input for the magnitude of the change (the learning rate or $\\alpha$). Then we repeat that process until $\\theta$ and $J$ stop changing and that should result in a value of $\\theta$ that minimizes $J$.\n",
"\n",
"\n",
"$$\\theta_j := \\theta_j - \\alpha\\triangledown\\theta_jJ(\\theta)$$\n",
"\n",
"or\n",
"\n",
"$$\\theta_j := \\theta_j - \\alpha\\frac\\partial{\\partial\\theta_j}J(\\theta)$$\n",
"\n",
"#### Normal Equation\n",
"\n",
"$\\theta$ can also be calculated directly using the equation below.\n",
"\n",
"$$\\theta = (X^TX)^{-1}X^Ty$$\n",
"\n",
"Complexity for the normal equation, $O(n^3)$, is greater than it is it for the basic gradient descent algorithm, $O(n^2)$, but has acceptable performance up to approximately $n=10^5$ features.\n",
"\n",
"### Methods for Determining Learning Rate\n",
"\n",
"We explore three methods for determining the learning rate.\n",
"\n",
"1. Constant\n",
"2. Adaptive Gradient Evaluation\n",
"3. Adapative Momentum Estimation\n",
"\n",
"In order to compare the performance of the algorithms and ensure that they are functioning correctly, we will use a known function as our cost function. This allows us to compare the value of $\\theta$ produced by each algorithm and its associated $J$ with values we can calculate directly from our known function. Using a known function with two dimensional inputs allows us to plot as a surface all possible values of $J$ for a given range of values for $\\theta$ and then $J_\\theta$ for each iteration of the algorithm. We will use the [Stablinsky-Tang function](https://en.wikipedia.org/wiki/Test_functions_for_optimization), which results in a non-convex cost surface suitable for testing with straightforward gradient and cost computations.\n",
"\n",
"Our gradient descent algorithm has been generalized to accept the following arguments:\n",
"\n",
"|Argument |Definition |\n",
"|----------------|------------------------------------------------------------------------|\n",
"|`theta_hist` |Starting value of $\\theta_0$ and calculated value of $J_{\\theta_0}$ |\n",
"|`alpha` |Learning rate, $\\alpha$ |\n",
"|`get_gradient` |Function that returns gradient, $f(\\theta_j)=\\triangledown_{\\theta_j}$ |\n",
"|`get_cost` |Function that returns cost, $f(\\theta_j)=J_{\\theta_j}$ |\n",
"|`delta_min` |Minimum change in cost to establish convergence |\n",
"\n",
"Throughout the variations we consider, the core algorithm remains the same. Both changes to the calculation of the learning rate, $\\alpha$, and later, the number of training samples used in calculating the gradient, are effected by modifying the function that gets passed in with respect to the `get_gradient` argument.\n",
"\n",
"##### Cost Function\n",
"\n",
"The Styblinski–Tang function with respect to $\\theta$ is:\n",
"\n",
"$$J(\\theta) = \\dfrac{\\sum_{i=1}^n\\theta_i^4-16\\theta_i^2+5\\theta_i}{n}$$\n",
"\n",
"where $n$ is the number of dimensions in the data. For two dimensions, we can also express our cost function as:\n",
"\n",
"$$J(\\theta) = \\dfrac{\\theta_1^4-16\\theta_1^2+5\\theta_1+\\theta_2^4-16\\theta_2^2+5\\theta_2}{2}$$\n",
"\n",
"The global minimum of this function is $-78.33233$ at $\\theta = (-2.903534, -2.903534)$\n",
"\n",
"##### Gradient Function\n",
"\n",
"$$\\frac\\partial{\\partial\\theta_n}J(\\theta) = 2\\theta_n^3-16\\theta_n+2.5$$\n",
"\n",
"##### Plots\n",
"\n",
"The color scale of the surface corresponds to the z-axis value, which represents total cost, $J$, for all values of theta in the displayed range. The color scale of the points on the surface, which represent the total cost, $J_{\\theta_j}$, as function of $\\theta_j$ at each iteration of the model, corresponds to the iteration.\n",
"\n",
"The plots are also interactive, so you can spin them around and zoom in and out for more detail.\n",
"\n",
"### Constant Learning Rate"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Define general gradient descent algorithm\n",
"def gd(theta0, alpha, grad_obj, grad_adapt, delta_min, iters=1000):\n",
"\n",
" # Initialize theta and cost history for convergence testing and plot\n",
" theta_hist = np.zeros((iters, theta0.shape[0]+1))\n",
" theta_hist[0] = grad_obj.cost(theta0)\n",
" \n",
" # Create theta generator\n",
" theta_gen = grad_adapt(alpha, grad_obj.grad)(theta0)\n",
" \n",
" # Initialize iteration variables\n",
" delta = float(\"inf\")\n",
" i = 1\n",
" \n",
" # Run algorithm\n",
" while delta > delta_min:\n",
" # Get next theta\n",
" theta = next(theta_gen)\n",
" \n",
" # Test for convergence, store cost for plotting\n",
" try:\n",
" theta_hist[i] = grad_obj.cost(theta)\n",
" except:\n",
" print('{} minimum change in cost not achieved in {} iterations.'.format(delta_min, theta_hist.shape[0]))\n",
" break\n",
" delta = abs(theta_hist[i-1,-1] - theta_hist[i,-1])\n",
" i += 1\n",
" \n",
" # Trim zeros\n",
" theta_hist = theta_hist[np.nonzero(theta_hist[:,-1])]\n",
" return theta_hist\n",
"\n",
"# Define Gradient class used to pass gradient and cost generator functions to gd algorithm\n",
"class Gradient(object):\n",
" \n",
" def __init__(self, grad_fun, cost_fun, data=np.array([]), size=50):\n",
"\n",
" # Should return an array with the same dimensions as theta\n",
" self.grad_fun = grad_fun\n",
" \n",
" # Should return an array with cost appended to theta\n",
" self.cost_fun = cost_fun\n",
" \n",
" # Should have ones in first column if required\n",
" self.data = data\n",
" \n",
" self.size = size\n",
" \n",
" def batches_gen(data):\n",
" data = self.data\n",
" size = self.size\n",
" i = 0\n",
" while True:\n",
" index = slice(i*size, min((i+1)*size, data.shape[0]), 1)\n",
" if data.shape[0] - i * size > 0: \n",
" yield (data[index,:-1], data[index,-1])\n",
" i += 1\n",
" else:\n",
" np.random.shuffle(data)\n",
" i = 0\n",
"\n",
" self.batches = batches_gen(self.data)\n",
"\n",
" def grad_from_data(theta):\n",
" return self.grad_fun(theta, next(self.batches))\n",
" \n",
" def cost_from_data(theta):\n",
" return self.cost_fun(theta, self.data)\n",
"\n",
" if self.data.size > 1:\n",
" self.grad = grad_from_data\n",
" self.cost = cost_from_data\n",
" else:\n",
" self.grad = self.grad_fun\n",
" self.cost = self.cost_fun\n",
" \n",
"# Define generator function for constant alpha\n",
"# Not terribly useful here, but allows other gradient adaptations\n",
"def const_alpha(alpha, grad_fun):\n",
" \n",
" def theta_gen_const(theta):\n",
" while True:\n",
" theta = theta - alpha * grad_fun(theta)\n",
" yield theta\n",
" \n",
" return theta_gen_const\n",
"\n",
"# Define gradient and cost functions for testing (Stablyinski-Tang function)\n",
"def grad_fun_st(theta):\n",
" return np.apply_along_axis(lambda o: 2.5 - 16*o + 2*o**3, 0, theta)\n",
"\n",
"def cost_fun_st(theta):\n",
" return np.append(theta, np.sum(5*theta - 16*theta**2 + theta**4) / theta.shape[0])\n",
"\n",
"# Create Gradient instance for test function\n",
"stab_tang = Gradient(grad_fun_st, cost_fun_st)\n",
"\n",
"# Initialize model hyperparameters\n",
"delta_min = 10**-6\n",
"alpha = 0.01\n",
"\n",
"# Initialize starting value of theta\n",
"theta0 = np.array([-0.2, -4.4])\n",
"\n",
"# Run algorithm\n",
"theta_hist = gd(theta0, alpha, stab_tang, const_alpha, delta_min)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
""
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
""
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import plotly.offline as py\n",
"py.init_notebook_mode()\n",
"import plotly.graph_objs as go\n",
"\n",
"# Prepare plot\n",
"# Prepare surface\n",
"x = np.arange(-4.6, 4.6, 0.1)\n",
"y = np.arange(-4.6, 4.6, 0.1)\n",
"X, Y = np.meshgrid(x, y)\n",
"Z = 1/2.0 * (X**4 - 16*X**2 + 5*X + Y**4 - 16*Y**2 + 5*Y)\n",
"\n",
"# Prepare surface contours\n",
"contour = dict(\n",
" show = True,\n",
" color = 'DodgerBlue', #'#0066FF',\n",
" highlightcolor = 'DeepSkyBlue',\n",
" highlightwidth = 1.5,\n",
" width = 1\n",
")\n",
"\n",
"# Add surface to plot\n",
"surface = go.Surface(\n",
" name = 'J surface',\n",
" x = X,\n",
" y = Y,\n",
" z = Z,\n",
" colorscale = 'Rainbow',\n",
" showlegend = False,\n",
" contours = dict(\n",
" y = contour,\n",
" x = contour,\n",
" z = dict(\n",
" show = False,\n",
" color = contour['color'],\n",
" highlightcolor = contour['highlightcolor'],\n",
" highlightwidth = contour['highlightwidth'],\n",
" width = contour['width']\n",
" )\n",
" )\n",
")\n",
"\n",
"# Add theta_hist to plot - set up as a function for future plots\n",
"def spec_theta_hist_trace(theta_hist):\n",
" theta_hist_trace = go.Scatter3d(\n",
" name = 'theta_hist',\n",
" x = theta_hist[:,0],\n",
" y = theta_hist[:,1],\n",
" z = theta_hist[:,2],\n",
" mode = 'markers',\n",
" showlegend = False,\n",
" marker = dict(\n",
" color = np.arange(theta_hist.shape[0]),\n",
" colorscale = 'Blackbody',\n",
" showscale = False,\n",
" size = \"5\"\n",
" )\n",
" )\n",
" return theta_hist_trace\n",
"\n",
"# Specify layout options\n",
"layout = go.Layout(\n",
" title='Constant Alpha',\n",
" autosize=False,\n",
" width=700,\n",
" height=700,\n",
" scene=dict(\n",
" xaxis=dict(\n",
" title = 'theta1',\n",
" ticks = \"outside\",\n",
" dtick = 0.25,\n",
" showticklabels = False\n",
" ),\n",
" yaxis=dict(\n",
" title = 'theta2',\n",
" ticks = \"\",\n",
" dtick = 0.25,\n",
" showticklabels = False\n",
" ),\n",
" zaxis=dict(\n",
" title = 'J',\n",
" ),\n",
" camera=dict(\n",
" up=dict(x=0, y=0, z=1),\n",
" center=dict(x=0, y=0, z=0),\n",
" eye=dict(x=0.25, y=1.25, z=1.15)\n",
" )\n",
" )\n",
")\n",
"\n",
"# Execute plot\n",
"fig = go.Figure(data=[surface, spec_theta_hist_trace(theta_hist)], layout=layout)\n",
"py.iplot(fig, filename='constant_alpha_gradient_descent')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can see from this graph that the core gradient descent algorithm is functioning as expected. With a starting value of $\\theta=(-0.3,4.6)$, it produces a final $\\theta$ that minimizes total cost, $J$. If you spin the graph around (it's actually easiest to see looking at the bottom of the surface), you can see the steps in $\\theta$ the algorithm produced. Visually they appear to be in the direction of the greatest reduction in cost. The steps are also bigger where the gradient is steeper and smaller where it is flatter. The rate of change looks like it changed four times on the way to the global minimum.\n",
"\n",
"This algorithm's ability to solve for the global minimum of $J$ depends on $\\alpha$ and the starting value of $\\theta_0$. You can try different values of $\\alpha$ and $\\theta_0$ to see how they affect the algorithm's ability to successfully solve for $\\theta$ such that the global minimum cost, $J_{min}$, is achieved. Even in the scenarios where the algorithm fails to minimize cost, it appears to be functioning correctly."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Adaptive Gradient Algorithm (Adagrad)\n",
"\n",
"[Duchi, J., Hazan, E., and Singer, Y. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization](http://stanford.edu/~jduchi/projects/DuchiHaSi10_colt.pdf)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Define gradient generator function for Adagrad\n",
"def adagrad(alpha, grad_fun):\n",
"\n",
" def theta_gen_adagrad(theta):\n",
" # Initialize gradient history and hyperparameter epsilon\n",
" grad_hist = 0\n",
" epsilon = 10**-8\n",
" \n",
" # Generate gradient\n",
" while True: \n",
" # Get gradient to adapt from gradient function\n",
" gradient = grad_fun(theta)\n",
"\n",
" # Perform gradient adaptation\n",
" grad_hist += np.square(gradient)\n",
" theta = theta - alpha * gradient / (epsilon + np.sqrt(grad_hist))\n",
" yield theta\n",
"\n",
" return theta_gen_adagrad\n",
"\n",
"alpha = 0.1\n",
"\n",
"# Run algorithm\n",
"theta_hist = gd(theta0, alpha, stab_tang, adagrad, delta_min)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
""
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Execute plot\n",
"layout.title = 'Adaptive Gradient Descent'\n",
"fig = go.Figure(data=[surface, spec_theta_hist_trace(theta_hist)], layout=layout)\n",
"py.iplot(fig, filename='adagrad_gradient_descent')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Adaptive Moment Estimation (Adam)\n",
"\n",
"[Kingma, D. P., & Ba, J. L. (2015). Adam: A Method for Stochastic Optimization. International Conference on Learning Representations.](https://arxiv.org/pdf/1412.6980v8.pdf)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Define theta generator function for Adam\n",
"def adam(alpha, grad_fun):\n",
"\n",
" # Create theta generator\n",
" def theta_gen_adam(theta):\n",
" # Initialize moment variables and hyperparameters\n",
" moment1 = np.zeros(theta.shape[0])\n",
" moment2 = np.zeros(theta.shape[0])\n",
" beta1 = 0.9\n",
" beta2 = 0.999\n",
" epsilon = 10**-8\n",
" t = 1\n",
"\n",
" # Generate new theta\n",
" while True: \n",
" # Get gradient to adapt\n",
" gradient = grad_fun(theta)\n",
"\n",
" # Update moment estimates\n",
" moment1 = beta1 * moment1 + (1 - beta1) * gradient\n",
" moment2 = beta2 * moment2 + (1 - beta2) * np.square(gradient)\n",
" moment1_hat = moment1 / (1 - beta1**t)\n",
" moment2_hat = moment2 / (1 - beta2**t) \n",
"\n",
" # Yield adapted gradient\n",
" theta_new = theta - alpha * moment1_hat / (epsilon + np.sqrt(moment2_hat))\n",
" yield theta_new\n",
" t += 1\n",
" theta = theta_new\n",
" \n",
" return theta_gen_adam\n",
"\n",
"# Run algorithm\n",
"theta_hist = gd(theta0, alpha, stab_tang, adam, delta_min)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
""
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Execute plot\n",
"layout.title = 'Adaptive Moment Estimation'\n",
"fig = go.Figure(data=[surface, spec_theta_hist_trace(theta_hist)], layout=layout)\n",
"py.iplot(fig, filename='adam_gradient_descent')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Linear Regression\n",
"\n",
"We pass gradient_gen and cost_gen to Gradient object.\n",
"\n",
"We pass Gradient object to gd().\n",
"\n",
"gd() calls Gradient.grad(theta) to get next gradient.\n",
"\n",
"We want to pass data to grad_gen once and then be able to pass theta to it subsequently and have it return a gradient based on batches of the data we originally passed.\n",
"\n",
"Plan would be to set up grad_gen(theta, data) and then have a grad_decorator "
]
},
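  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For reference, a sketch of what `grad_fun_linear` and `cost_fun_linear` below compute, where $m$ is the number of samples in the batch (for the gradient) or in the full data set (for the cost) and $X$ includes a column of ones:\n",
    "\n",
    "$$\\triangledown_{\\theta}J(\\theta) = \\frac{1}{m}X^T(X\\theta - y)$$\n",
    "\n",
    "$$J(\\theta) = \\frac{1}{2m}\\sum_{i=1}^m\\left(x_i\\theta - y_i\\right)^2$$"
   ]
  },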
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.001 minimum change in cost not achieved in 1000 iterations.\n",
"[ 614.85791208 264.89480007 1647.98649304]\n"
]
},
{
"data": {
"text/html": [
""
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Define gradient generator function\n",
"# batch is tuple of (X values, y values)\n",
"def grad_fun_linear(theta, batch):\n",
" X, y = batch\n",
" return X.T.dot(X.dot(theta) - y) / X.shape[0]\n",
"\n",
"def cost_fun_linear(theta, data):\n",
" X = data[:,:-1]\n",
" y = data[:,-1]\n",
" return np.append(theta, np.sum(np.square(X.dot(theta) - y)) / (2 * X.shape[0]))\n",
"\n",
"data = np.ones((100,3))\n",
"obj_theta = np.array([100, 10])\n",
"np.random.seed(4)\n",
"data[:,1:-1] = np.random.randint(101, size=(100,1))\n",
"data[:,-1] = data[:,:-1].dot(obj_theta) * np.random.uniform(0.85, 1.15, 100)\n",
"\n",
"traces = []\n",
"\n",
"plot_data = go.Scatter(\n",
" name = 'Training data',\n",
" x = data[:,1],\n",
" y = data[:,-1],\n",
" mode=\"markers\"\n",
")\n",
"\n",
"traces.append(plot_data)\n",
"\n",
"# data_raw = np.loadtxt(open(\"data/test_multi.txt\",\"rb\"),delimiter=\",\")\n",
"mu = np.mean(data[:,1:-1], axis=0)\n",
"sigma = np.std(data[:,1:-1], axis=0)\n",
"data[:,1:-1] = (data[:,1:-1] - mu) / sigma\n",
"\n",
"linear = Gradient(grad_fun_linear, cost_fun_linear, data, 5)\n",
"theta0 = np.zeros(2)\n",
"alpha = .05\n",
"delta_min = 10**-3\n",
"\n",
"# Run algorithm\n",
"theta_hist = gd(theta0, alpha, linear, const_alpha, delta_min)\n",
"print(theta_hist[-1])\n",
"\n",
"x = np.array([0, 100])\n",
"x_norm = np.ones((2, 2))\n",
"x_norm[:,1] = (np.array([0, 100]) - mu) / sigma\n",
"y = x_norm.dot(theta_hist[-1,:-1])\n",
"\n",
"plot_theta = go.Scatter(\n",
" name = 'Model',\n",
" x = x_norm[:,1],\n",
" y = y\n",
")\n",
"\n",
"layout = go.Layout(\n",
" title='Linear Regression'\n",
")\n",
"\n",
"traces.append(plot_theta)\n",
"fig = go.Figure(data=traces, layout=layout)\n",
"py.iplot(fig, filename='linear-data')\n"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"test_sequence (__main__.TestGradient) ... ok\n",
"test_sequence_100 (__main__.TestGradient) ... ok\n",
"test_shape (__main__.TestGradient) ... ok\n",
"test_with_data (__main__.TestGradient) ... ok\n",
"test_without_data (__main__.TestGradient) ... ok\n",
"\n",
"----------------------------------------------------------------------\n",
"Ran 5 tests in 0.005s\n",
"\n",
"OK\n"
]
},
{
"data": {
"text/plain": [
""
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import unittest\n",
"\n",
"class TestGradient(unittest.TestCase):\n",
" \n",
" def setUp(self):\n",
" # Set up test data and theta\n",
" \n",
" self.test_data = np.ones((100,3))\n",
" for i in range(self.test_data.shape[0]):\n",
" self.test_data[i,1:] = np.array([i+1, i+1])\n",
" \n",
" self.theta0 = np.zeros(2)\n",
" \n",
" self.with_data = Gradient(grad_fun_linear, cost_fun_linear, self.test_data, 40)\n",
" \n",
" self.with_data_100 = Gradient(grad_fun_linear, cost_fun_linear, self.test_data, 100)\n",
" \n",
" self.with_data_110 = Gradient(grad_fun_linear, cost_fun_linear, self.test_data, 110)\n",
" \n",
" self.test_batches = self.with_data.batches\n",
" \n",
" self.test_batches_100 = self.with_data_100.batches\n",
" \n",
" self.test_batches_110 = self.with_data_110.batches\n",
" \n",
" self.without_data = Gradient(grad_fun_st, cost_fun_st)\n",
" \n",
" def tearDown(self):\n",
" self.test_data = None\n",
" self.theta0 = None\n",
" self.with_data = None\n",
" self.with_data_100 = None\n",
" self.with_data_110 = None\n",
" self.without_data = None\n",
" \n",
" # Test that batches return tuple of (m x n, m)\n",
" def test_shape(self):\n",
" batch = next(self.test_batches)\n",
" self.assertEqual(batch[0].shape[0], 40)\n",
" self.assertEqual(batch[0].shape[1], 2)\n",
" self.assertEqual(batch[1].shape[0], 40)\n",
" with self.assertRaises(IndexError):\n",
" batch[1].shape[1]\n",
"\n",
" # Test that batches returns correctly sequenced batches with size > m\n",
" def test_sequence(self):\n",
" self.assertEqual(next(self.test_batches)[0][-1,-1], 40)\n",
" self.assertEqual(next(self.test_batches)[0][-1,-1], 80)\n",
" self.assertEqual(next(self.test_batches)[0][-1,-1], 100)\n",
" self.assertEqual(next(self.test_batches)[0].shape[0], 40)\n",
" self.assertNotEqual(next(self.test_batches)[0][-1, -1], 80)\n",
"\n",
" # Test that batches returns correctly sequenced batches with size = m\n",
" def test_sequence_100(self):\n",
" self.assertEqual(next(self.test_batches_100)[0].shape[0], 100)\n",
" self.assertEqual(next(self.test_batches_100)[0].shape[0], 100)\n",
"\n",
" # Test that batches returns correctly sequenced batches with size > m\n",
" def test_sequence_100(self):\n",
" self.assertEqual(next(self.test_batches_100)[0].shape[0], 100)\n",
" self.assertEqual(next(self.test_batches_100)[0].shape[0], 100)\n",
"\n",
" # Test to make sure Gradient returns correct gradient from data\n",
" def test_with_data(self):\n",
" self.assertEqual(self.with_data.grad(self.theta0)[0],-20.5)\n",
" self.assertEqual(self.with_data.grad(self.theta0)[0],-60.5)\n",
" self.assertEqual(self.with_data.grad(self.theta0).size,2)\n",
" self.assertEqual(self.with_data.cost(self.theta0).size,3)\n",
" self.assertEqual(self.with_data.cost(self.theta0)[-1],1691.75)\n",
" \n",
" # Test to make sure Gradient returns correct gradient without\n",
" def test_without_data(self):\n",
" theta1 = self.without_data.grad(self.theta0)\n",
" self.assertEqual(theta1[0], 2.5)\n",
" self.assertEqual(self.without_data.grad(theta1)[0], -6.25)\n",
" \n",
"suite = unittest.TestLoader().loadTestsFromTestCase(TestGradient)\n",
"unittest.TextTestRunner(verbosity=2).run(suite)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Logistic Regression\n",
"\n",
"With logistic regression, we have a depedent variable that is classified, meaning it can only assume a limited number of finite values. In order to account for this we modify our cost function such that our predictions will always range between $0$ and $1$.\n",
"\n",
"$$\\begin{align*}& h_\\theta (x) = g ( \\theta^T x ) \\newline \\newline& z = \\theta^T x \\newline& g(z) = \\dfrac{1}{1 + e^{-z}}\\end{align*}$$"
]
},
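  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The corresponding cross-entropy cost and gradient, as a sketch of what `cost_fun_logistic` and `grad_fun_logistic` below compute ($m$ is the number of samples):\n",
    "\n",
    "$$J(\\theta) = -\\frac{1}{m}\\sum_{i=1}^m\\left[y_i\\log h_\\theta(x_i) + (1-y_i)\\log\\left(1-h_\\theta(x_i)\\right)\\right]$$\n",
    "\n",
    "$$\\triangledown_{\\theta}J(\\theta) = \\frac{1}{m}X^T\\left(g(X\\theta) - y\\right)$$"
   ]
  },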
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[ 0.33398824 0.93569619 0.83717276]\n"
]
},
{
"data": {
"text/html": [
""
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"def sigmoid(z):\n",
" return 1 / (1 + np.exp(-z))\n",
"\n",
"def grad_fun_logistic(theta, batch):\n",
" X, y = batch\n",
" return (sigmoid(X.dot(theta)) - y).T.dot(X) / X.shape[0]\n",
"\n",
"def cost_fun_logistic(theta, data):\n",
" X = data[:,:-1]\n",
" y = data[:,-1]\n",
" sigm = sigmoid(X.dot(theta))\n",
" return np.append(theta, np.sum(-y.dot(np.log(sigm)) - (1 - y).dot(np.log(1 - sigm))) / data.shape[0])\n",
"\n",
"data = np.ones((100,4))\n",
"data[:,1:] = np.loadtxt(open(\"data/test_logistic.txt\",\"rb\"),delimiter=\",\")\n",
"\n",
"mu = np.mean(data[:,1:-1], axis=0)\n",
"sigma = np.std(data[:,1:-1], axis=0)\n",
"data[:,1:-1] = (data[:,1:-1] - mu) / sigma\n",
"\n",
"\n",
"logistic = Gradient(grad_fun_logistic, cost_fun_logistic, data, 100)\n",
"theta0 = np.zeros(3)\n",
"alpha = .05\n",
"delta_min = 10**-3\n",
"\n",
"# Run algorithm\n",
"theta_hist = gd(theta0, alpha, logistic, const_alpha, delta_min)\n",
"print(theta_hist[-1,:-1])\n",
"\n",
"traces = []\n",
"\n",
"plot_data = go.Scatter(\n",
" name = 'Training data',\n",
" x = data[:,1],\n",
" y = data[:,2],\n",
" mode=\"markers\",\n",
" marker=dict(\n",
" color=data[:,-1],\n",
" colorscale='Bluered',\n",
" )\n",
")\n",
"\n",
"traces.append(plot_data)\n",
"\n",
"# Plot decision boundary\n",
"x1 = np.ones((2,2))\n",
"x1[:,-1] = np.array([-1.8, 1.8])\n",
"x2 = x1.dot((theta_hist[-1,:-2] / -theta_hist[-1,-2]))\n",
"\n",
"plot_db = go.Scatter(\n",
" name = 'Decision boundary',\n",
" x = x1[:,-1],\n",
" y = x2,\n",
" mode=\"lines\"\n",
")\n",
"\n",
"traces.append(plot_db)\n",
"\n",
"layout = go.Layout(\n",
" title='Logistic Regression'\n",
")\n",
"\n",
"fig = go.Figure(data=traces, layout=layout)\n",
"py.iplot(fig, filename='logistic-regression')"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"test_cost_fun (__main__.TestLogistic) ... ok\n",
"test_grad_fun (__main__.TestLogistic) ... ok\n",
"test_sigmoid (__main__.TestLogistic) ... ok\n",
"\n",
"----------------------------------------------------------------------\n",
"Ran 3 tests in 0.005s\n",
"\n",
"OK\n"
]
},
{
"data": {
"text/plain": [
""
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"class TestLogistic(unittest.TestCase):\n",
" \n",
" def setUp(self):\n",
" # Set up test data and theta\n",
" \n",
" self.data = np.ones((100,4))\n",
" self.data[:,1:] = np.loadtxt(open(\"data/test_logistic.txt\",\"rb\"),delimiter=\",\")\n",
"\n",
" mu = np.mean(self.data[:,1:-1], axis=0)\n",
" sigma = np.std(self.data[:,1:-1], axis=0)\n",
" data[:,1:-1] = (self.data[:,1:-1] - mu) / sigma\n",
"\n",
" self.logistic = Gradient(grad_fun_logistic, cost_fun_logistic, self.data, 5)\n",
" \n",
" self.batches = self.logistic.batches\n",
" \n",
" self.theta0 = np.zeros(3)\n",
" \n",
" def tearDown(self):\n",
" self.data = None\n",
" self.logistic = None\n",
" self.batches = None\n",
" self.theta0 = None\n",
" \n",
" def test_sigmoid(self):\n",
" sigm = sigmoid(next(self.batches)[0].dot(theta0))\n",
" self.assertEqual(sigm.shape[0], 5)\n",
" \n",
" def test_grad_fun(self):\n",
" self.assertEqual(self.logistic.grad(self.theta0).shape[0], 3)\n",
" \n",
" def test_cost_fun(self):\n",
" self.assertEqual(self.logistic.cost(self.theta0)[-1], 0.69314718055994518)\n",
" \n",
"suite = unittest.TestLoader().loadTestsFromTestCase(TestLogistic)\n",
"unittest.TextTestRunner(verbosity=2).run(suite)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [conda root]",
"language": "python",
"name": "conda-root-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}