{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# pysgd\n", "\n", "The `pysgd` package is structured as a general gradient descent algorithm that accepts data, an objective function, a gradient descent adaptation and hyperparameters as its arguments. Below is the file structure of the package:\n", "\n", "```\n", "pysgd/\n", "|--__init__.py\n", "|--adaptations/\n", "| |--__init__.py\n", "| |--adagrad.py\n", "| |--adam.py\n", "| |--constant.py\n", "|--objectives/\n", "| |--__init__.py\n", "| |--linear.py\n", "| |--logistic.py\n", "| |--stab-tang.py\n", "|--tests/\n", "\n", "```\n", "The intention of this package is to present reasonbly efficient, working algorithms that are easy to understand.\n", "\n", "The package is structured to make it easy to add additional objective functions and gradient descent adaptations, by following the basic form of the existing ones and adding them into their respective folders.\n", "\n", "### Gradient Descent\n", "\n", "Gradient descent is a method for minimizing an objective function. In machine learning applications the objective function to be minimized is the error, $J$, (or cost) of a predictive model. A predictive model consists of a parameters, $\\theta$, that are applied to inputs, $X$, (also called training samples, features, observations or independent variables) in order to estimate an output, $\\hat{y}$ (also called a label or dependent variable). Gradient descent attempts to determine the parameters that when applied to a set of inputs result in the lowest total error (the difference between the actual outcome and the one predicted by the model). Below is the basic predictive formula.\n", "\n", "$$H(X,\\theta)=\\hat{y}$$\n", "\n", "Below is an illustrative formula for determining the cost of a model.\n", "\n", "$$J(\\theta) = \\sum_{i=1}^m\\mid{h_i(\\theta,x_i) - y_i}\\mid$$\n", "\n", "There are different formulas for computing cost depending on the application, but the formula above expresses the essence of predicting actual outcomes as closely as possible.\n", "\n", "In order to minimze $J$ with respect to $\\theta$, the algorithm starts with an abitrary value of $\\theta$, determines the \"direction\" that would result in the fastest decrease in cost (called the gradient), updates $\\theta$ in that direction by a small amount (called the learning rate or $\\alpha$) and then repeats until cost $J$ has been minimized.\n", "\n", "\n", "$$\\theta_j := \\theta_j - \\alpha\\triangledown\\theta_jJ(\\theta)$$\n", "\n", "or\n", "\n", "$$\\theta_j := \\theta_j - \\alpha\\frac\\partial{\\partial\\theta_j}J(\\theta)$$\n", "\n", "### API\n", "\n", "The package has one main function, `sgd`, that returns a $j$ x $(n + 2)$ array, where $j$ is the number of iterations and $n$ is the number of features in the data. $\\theta_j$ is in the first $n + 1$ columns and the cost $J_j$ in the last column.\n", "\n", "|Argument |Definition |\n", "|------------------|----------------------------------------------------------------------------------------------|\n", "|`theta0` |Starting value of $\\theta$ ($\\theta_0$) in the form of an $1$ x$(n + 1)$ array. |\n", "|`obj='stab_tang'` |Objective function to be minimized in the form of a string with a value of `stab_tang`, `linear` or `logistic`. `stab_tang` is for the [Stablinsky-Tang function](https://en.wikipedia.org/wiki/Test_functions_for_optimization), included for testing and illustrative purposes. 
|\n", "|`adapt='constant'`|Gradient descent adaptation in the form of a string with a value of `constant`, `adagrad` or `adam`. |\n", "|`data=np.array([])`|Data in the form of an $m$ x $(n+1)$ array, including `ones` in the first column, if necessary, where $m$ is the number of training observations. |\n", "|`size=50` |Batch size in the form of an integer between $1$ and $m$. Batches are generated contiguously over the data. Data is shuffled between cycles. |\n", "|`alpha=.01` |Learning rate $\\alpha$ in the form of a floating point integer. |\n", "|`epsilon=10**-8` |Hyperparameter used by `adagrad` and `adam` for smoothing. |\n", "|`beta1=0.9` |Hyperparamter used by `adam` that controls the decay rates of the moving gradient averages. |\n", "|`beta2=0.999` |Hyperparamter used by `adam` that controls the decay rates of the moving gradient averages. |\n", "|`delta_min=10**-6`|Maximum change in $\\theta_n$ to establish convergence in the form of a floating point integer.|\n", "|`iters=1000` |Maximum number of batches evaluated unless convergence is achieved in fewer iterations. |\n", "\n", "#### Testing\n", "\n", "Tests are in the `tests` folder using [pytest](http://doc.pytest.org/en/latest/index.html), with 100% coverage.\n", "\n", "In addition to sample data sets, we also use the [Stablinsky-Tang function](https://en.wikipedia.org/wiki/Test_functions_for_optimization) for testing, which is non-convex, suitable for testing, with straightforward gradient computation. This allows us to compare the value of $\\theta$ produced by each algorithm and its associated $J$ with values we can calculate directly. By using a known function with two dimensional inputs we can plot $J$ as a surface for a given range of $\\theta$ values and then $J_\\theta$ for each iteration of the algorithm to visualize the progression of the algorithms.\n", "\n", "The Styblinski–Tang function with respect to $\\theta$ is:\n", "\n", "$$J(\\theta) = \\dfrac{\\sum_{i=1}^n\\theta_i^4-16\\theta_i^2+5\\theta_i}{n}$$\n", "\n", "where $n$ is the number of dimensions in the data. For two dimensions, we can also express our cost function as:\n", "\n", "$$J(\\theta) = \\dfrac{\\theta_1^4-16\\theta_1^2+5\\theta_1+\\theta_2^4-16\\theta_2^2+5\\theta_2}{2}$$\n", "\n", "The global minimum of this function is $-78.33233$ at $\\theta = (-2.903534, -2.903534)$\n", "\n", "The Styblinski–Tang gradient function is:\n", "\n", "$$\\frac\\partial{\\partial\\theta_n}J(\\theta) = 2\\theta_n^3-16\\theta_n+2.5$$\n", "\n", "The color scale of the surface plots corresponds to the z-axis value, which represents cost $J$ for all values of $\\theta$ in the displayed range. 
"The color scale of the surface plots corresponds to the z-axis value, which represents cost $J$ for all values of $\\theta$ in the displayed range. The color scale of the points on the surface, which represent the cost $J_{\\theta_j}$ as a function of $\\theta_j$ at each iteration of the model, corresponds to the iteration.\n", "\n",
"### pysgd" ] },
{ "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/markdown": [ "``` python\n",
"import numpy as np\n",
"import importlib\n",
"from pysgd.objectives import Objective\n",
"\n",
"# Define general gradient descent algorithm\n",
"def sgd(\n",
"        theta0,\n",
"        obj='stab_tang',\n",
"        adapt='constant',\n",
"        data=np.array([]),\n",
"        size=50,\n",
"        alpha=.01,\n",
"        epsilon=10**-8,\n",
"        beta1=0.9,\n",
"        beta2=0.999,\n",
"        delta_min=10**-6,\n",
"        iters=1000):\n",
"\n",
"    # Initialize gradient adaptation parameters\n",
"    params = dict(\n",
"        alpha=alpha,\n",
"        epsilon=epsilon,\n",
"        beta1=beta1,\n",
"        beta2=beta2\n",
"    )\n",
"\n",
"    # Initialize cost and gradient functions\n",
"    obj_fun = Objective(obj, data, size)\n",
"\n",
"    # Initialize gradient adaptation.\n",
"    grad_adapt = importlib.import_module('pysgd.adaptations.' + adapt).adapt\n",
"\n",
"    # Initialize theta and cost history for convergence testing and plot\n",
"    theta_hist = np.zeros((iters, theta0.shape[0]+1))\n",
"    theta_hist[0] = obj_fun.cost(theta0)\n",
"\n",
"    # Initialize theta generator\n",
"    theta_gen = grad_adapt(params, obj_fun.grad)(theta0)\n",
"\n",
"    # Initialize iteration variables\n",
"    delta = float(\"inf\")\n",
"    i = 1\n",
"\n",
"    # Run algorithm\n",
"    while delta > delta_min:\n",
"        # Get next theta\n",
"        theta = next(theta_gen)\n",
"\n",
"        # Store cost for plotting, test for convergence\n",
"        try:\n",
"            theta_hist[i] = obj_fun.cost(theta)\n",
"        except IndexError:\n",
"            print('{} minimum change in theta not achieved in {} iterations.'\n",
"                  .format(delta_min, theta_hist.shape[0]))\n",
"            break\n",
"        delta = np.max(np.square(theta - theta_hist[i-1,:-1]))**0.5\n",
"\n",
"        i += 1\n",
"    # Trim zeros and return\n",
"    theta_hist = theta_hist[:i]\n",
"    return theta_hist\n",
"\n",
"```" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [
"import pysgd\n",
"import inspect\n",
"\n",
"# Define function to display code.\n",
"def disp_mod(mod):\n",
"    code, line_no = inspect.getsourcelines(mod)\n",
"    from IPython.display import Markdown, display\n",
"    def printmd(string):\n",
"        display(Markdown('``` python\\n' + string + '```'))\n",
"    printmd(''.join(code))\n",
"\n",
"disp_mod(pysgd)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### pysgd.objectives" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "The `Objective` class in `pysgd/objectives/__init__.py` is where the objective functions, data and gradient adaptations are handled." ] },
{ "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/markdown": [ "``` python\n",
"import importlib\n",
"import numpy as np\n",
"from os.path import dirname, basename, isfile\n",
"import glob\n",
"modules = glob.glob(dirname(__file__)+\"/*.py\")\n",
"__all__ = [ basename(f)[:-3] for f in modules if isfile(f)]\n",
"\n",
"class Objective(object):\n",
"\n",
"    def __init__(self, obj, data, size):\n",
"\n",
"        obj = importlib.import_module('pysgd.objectives.' + obj)\n",
"\n",
"        def batches_gen(data=data, size=size):\n",
"            i = 0\n",
"            while True:\n",
"                index = slice(i*size, min((i+1)*size, data.shape[0]), 1)\n",
"                if data.shape[0] - i * size > 0:\n",
"                    yield (data[index,:-1], data[index,-1])\n",
"                    i += 1\n",
"                else:\n",
"                    np.random.shuffle(data)\n",
"                    i = 0\n",
"\n",
"        self.batches = batches_gen()\n",
"\n",
"        def grad_from_data(theta):\n",
"            return obj.grad_fun(theta, next(self.batches))\n",
"\n",
"        def cost_from_data(theta):\n",
"            return obj.cost_fun(theta, data)\n",
"\n",
"        if data.size > 1:\n",
"            self.grad = grad_from_data\n",
"            self.cost = cost_from_data\n",
"        else:\n",
"            self.grad = obj.grad_fun\n",
"            self.cost = obj.cost_fun\n",
"```" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "disp_mod(pysgd.objectives)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### Constant Alpha" ] },
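{ "cell_type": "markdown", "metadata": {}, "source": [ "The `constant` adaptation module itself is not displayed in this notebook. A minimal sketch of `pysgd/adaptations/constant.py`, consistent with the generator interface that `sgd` expects (and with the `adagrad` and `adam` modules shown later), might look like this; the actual file may differ:\n", "\n",
"``` python\n",
"# Hypothetical sketch of pysgd/adaptations/constant.py -- not the actual file.\n",
"def adapt(params, grad_fun):\n",
"\n",
"    # Define theta generator with a fixed learning rate\n",
"    def theta_gen_constant(theta):\n",
"        alpha = params['alpha']\n",
"\n",
"        # Generate new theta\n",
"        while True:\n",
"            theta = theta - alpha * grad_fun(theta)\n",
"            yield theta\n",
"\n",
"    return theta_gen_constant\n",
"```" ] },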
{ "cell_type": "markdown", "metadata": {}, "source": [ "Run gradient descent algorithm." ] },
{ "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import numpy as np\n", "\n", "theta_hist = pysgd.sgd(theta0=np.array([-0.2, -4.4]))" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Plot $J_j$ for each $\\theta_j$." ] },
{ "cell_type": "code", "execution_count": 26, "metadata": { "collapsed": false, "hide_input": true }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [
"import plotly.offline as py\n",
"py.init_notebook_mode()\n",
"import plotly.graph_objs as go\n",
"\n",
"# Prepare plot\n",
"x = np.arange(-4.6, 4.6, 0.1)\n",
"y = np.arange(-4.6, 4.6, 0.1)\n",
"X, Y = np.meshgrid(x, y)\n",
"Z = 1/2.0 * (X**4 - 16*X**2 + 5*X + Y**4 - 16*Y**2 + 5*Y)\n",
"\n",
"# Prepare surface contours\n",
"contour = dict(\n",
"    show = True,\n",
"    color = 'DodgerBlue', #'#0066FF',\n",
"    highlightcolor = 'DeepSkyBlue',\n",
"    highlightwidth = 1.5,\n",
"    width = 1\n",
")\n",
"\n",
"# Add surface to plot\n",
"surface = go.Surface(\n",
"    name = 'J surface',\n",
"    x = X,\n",
"    y = Y,\n",
"    z = Z,\n",
"    colorscale = 'Rainbow',\n",
"    showlegend = False,\n",
"    contours = dict(\n",
"        y = contour,\n",
"        x = contour,\n",
"        z = dict(\n",
"            show = False,\n",
"            color = contour['color'],\n",
"            highlightcolor = contour['highlightcolor'],\n",
"            highlightwidth = contour['highlightwidth'],\n",
"            width = contour['width']\n",
"        )\n",
"    )\n",
")\n",
"\n",
"# Add theta_hist to plot - set up as a function for future plots\n",
"def spec_theta_hist_trace(theta_hist):\n",
"    theta_hist_trace = go.Scatter3d(\n",
"        name = 'theta_hist',\n",
"        x = theta_hist[:,0],\n",
"        y = theta_hist[:,1],\n",
"        z = theta_hist[:,2],\n",
"        mode = 'markers',\n",
"        showlegend = False,\n",
"        marker = dict(\n",
"            color = np.arange(theta_hist.shape[0]),\n",
"            colorscale = 'Blackbody',\n",
"            showscale = False,\n",
"            size = 5\n",
"        )\n",
"    )\n",
"    return theta_hist_trace\n",
"\n",
"# Specify layout options\n",
"layout = go.Layout(\n",
"    title='Constant Alpha',\n",
"    autosize=False,\n",
"    width=700,\n",
"    height=700,\n",
"    scene=dict(\n",
"        xaxis=dict(\n",
"            title = 'theta1',\n",
"            ticks = \"outside\",\n",
"            dtick = 0.25,\n",
"            showticklabels = False\n",
"        ),\n",
"        yaxis=dict(\n",
"            title = 'theta2',\n",
"            ticks = \"\",\n",
"            dtick = 0.25,\n",
"            showticklabels = False\n",
"        ),\n",
"        zaxis=dict(\n",
"            title = 'J',\n",
"        ),\n",
"        camera=dict(\n",
"            up=dict(x=0, y=0, z=1),\n",
"            center=dict(x=0, y=0, z=0),\n",
"            eye=dict(x=0.25, y=1.25, z=1.15)\n",
"        )\n",
"    )\n",
")\n",
"\n",
"# Execute plot\n",
"fig = go.Figure(data=[surface, spec_theta_hist_trace(theta_hist)], layout=layout)\n",
"py.iplot(fig, filename='constant_alpha_gradient_descent')" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "The core gradient descent algorithm functions as expected. With a starting value of $\\theta=(-0.2,-4.4)$, it produces a final $\\theta$ that minimizes total cost, $J$. If you spin the graph around (it's actually easiest to see looking at the bottom of the surface), you can see the steps in $\\theta$ the algorithm produced.\n", "\n",
"Gradient descent's ability to solve for the global minimum of $J$ depends on the learning rate, $\\alpha$, and the starting value, $\\theta_0$. You can try different values of $\\alpha$ and $\\theta_0$ to see how they affect the algorithm's ability to successfully solve for the $\\theta$ that achieves the global minimum cost." ] },
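{ "cell_type": "markdown", "metadata": {}, "source": [ "For example, with hypothetical values (not results shown in this notebook):\n", "\n",
"``` python\n",
"# Try a different starting point and learning rate\n",
"theta_hist_alt = pysgd.sgd(theta0=np.array([4.0, 4.0]), alpha=0.05)\n",
"```" ] },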
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import plotly.offline as py\n", "py.init_notebook_mode()\n", "import plotly.graph_objs as go\n", "\n", "# Prepare plot\n", "x = np.arange(-4.6, 4.6, 0.1)\n", "y = np.arange(-4.6, 4.6, 0.1)\n", "X, Y = np.meshgrid(x, y)\n", "Z = 1/2.0 * (X**4 - 16*X**2 + 5*X + Y**4 - 16*Y**2 + 5*Y)\n", "\n", "# Prepare surface contours\n", "contour = dict(\n", " show = True,\n", " color = 'DodgerBlue', #'#0066FF',\n", " highlightcolor = 'DeepSkyBlue',\n", " highlightwidth = 1.5,\n", " width = 1\n", ")\n", "\n", "# Add surface to plot\n", "surface = go.Surface(\n", " name = 'J surface',\n", " x = X,\n", " y = Y,\n", " z = Z,\n", " colorscale = 'Rainbow',\n", " showlegend = False,\n", " contours = dict(\n", " y = contour,\n", " x = contour,\n", " z = dict(\n", " show = False,\n", " color = contour['color'],\n", " highlightcolor = contour['highlightcolor'],\n", " highlightwidth = contour['highlightwidth'],\n", " width = contour['width']\n", " )\n", " )\n", ")\n", "\n", "# Add theta_hist to plot - set up as a function for future plots\n", "def spec_theta_hist_trace(theta_hist):\n", " theta_hist_trace = go.Scatter3d(\n", " name = 'theta_hist',\n", " x = theta_hist[:,0],\n", " y = theta_hist[:,1],\n", " z = theta_hist[:,2],\n", " mode = 'markers',\n", " showlegend = False,\n", " marker = dict(\n", " color = np.arange(theta_hist.shape[0]),\n", " colorscale = 'Blackbody',\n", " showscale = False,\n", " size = \"5\"\n", " )\n", " )\n", " return theta_hist_trace\n", "\n", "# Specify layout options\n", "layout = go.Layout(\n", " title='Constant Alpha',\n", " autosize=False,\n", " width=700,\n", " height=700,\n", " scene=dict(\n", " xaxis=dict(\n", " title = 'theta1',\n", " ticks = \"outside\",\n", " dtick = 0.25,\n", " showticklabels = False\n", " ),\n", " yaxis=dict(\n", " title = 'theta2',\n", " ticks = \"\",\n", " dtick = 0.25,\n", " showticklabels = False\n", " ),\n", " zaxis=dict(\n", " title = 'J',\n", " ),\n", " camera=dict(\n", " up=dict(x=0, y=0, z=1),\n", " center=dict(x=0, y=0, z=0),\n", " eye=dict(x=0.25, y=1.25, z=1.15)\n", " )\n", " )\n", ")\n", "\n", "# Execute plot\n", "fig = go.Figure(data=[surface, spec_theta_hist_trace(theta_hist)], layout=layout)\n", "py.iplot(fig, filename='constant_alpha_gradient_descent')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The core gradient descent algorithm functions as expected. With a starting value of $\\theta=(-0.2,4.4)$, it produces a final $\\theta$ that minimizes total cost, $J$. If you spin the graph around (it's actually easiest to see looking at the bottom of the surface), you can see the steps in $\\theta$ the algorithm produced.\n", "\n", "Gradient descent's ability to solve for the global minimum of $J$ depends on the learning rate, $\\alpha$ and the starting value of $\\theta_0$. You can try different values of $\\alpha$ and $\\theta_0$ to see how they affect the algorithm's ability to successfully solve for the $\\theta$ that achieves the global minimum cost." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Adaptive Gradient Algorithm (Adagrad)\n", "\n", "[Duchi, J., Hazan, E., and Singer, Y. 
Adaptive Subgradient Methods for Online Learning and Stochastic Optimization](http://stanford.edu/~jduchi/projects/DuchiHaSi10_colt.pdf)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### pysgd.adaptations.adagrad" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/markdown": [ "``` python\n", "import numpy as np\n", "\n", "# Define theta generator function for Adagrad\n", "# [Duchi, J., Hazan, E., and Singer, Y. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization](http://stanford.edu/~jduchi/projects/DuchiHaSi10_colt.pdf)\n", "def adapt(params, grad_fun):\n", "\n", " def theta_gen_adagrad(theta):\n", " # Initialize hyperparameters and gradient history\n", " alpha = params['alpha']\n", " epsilon = params['epsilon']\n", " grad_hist = 0\n", "\n", " # Generate adapted theta\n", " while True:\n", " # Get gradient\n", " gradient = grad_fun(theta)\n", "\n", " # Perform gradient adaptation\n", " grad_hist += np.square(gradient)\n", " theta = theta - alpha * gradient / (epsilon + np.sqrt(grad_hist))\n", " yield theta\n", "\n", " return theta_gen_adagrad\n", "```" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from pysgd.adaptations import adagrad\n", "disp_mod(adagrad)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Run gradient descent algorithm." ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "273 [ -2.90353403 -2.90353404 -78.33233141]\n" ] } ], "source": [ "theta_hist = pysgd.sgd(\n", " theta0=np.array([-0.2, -4.4]),\n", " alpha=0.3,\n", " adapt='adagrad',\n", " delta_min=10**-9\n", ")\n", "print(theta_hist.shape[0], theta_hist[-1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Plot $J_j$ for each $\\theta_j$." ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "layout.title = 'Adaptive Gradient Descent'\n", "fig = go.Figure(data=[surface, spec_theta_hist_trace(theta_hist)], layout=layout)\n", "py.iplot(fig, filename='adagrad_gradient_descent')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Adaptive Moment Estimation (Adam)\n", "\n", "[Kingma, D. P., & Ba, J. L. (2015). Adam: A Method for Stochastic Optimization. International Conference on Learning Representations.](https://arxiv.org/pdf/1412.6980v8.pdf)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### pysgd.adaptations.adam" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/markdown": [ "``` python\n", "import numpy as np\n", "\n", "# Define theta generator function for Adam\n", "# [Kingma, D. P., & Ba, J. L. (2015). Adam: A Method for Stochastic Optimization. International Conference on Learning Representations.](https://arxiv.org/pdf/1412.6980v8.pdf)\n", "def adapt(params, grad_fun):\n", "\n", " # Define theta generator\n", " def theta_gen_adam(theta):\n", " # Initialize hyperparameters, moment and iteration variables\n", " alpha = params['alpha']\n", " beta1 = params['beta1']\n", " beta2 = params['beta2']\n", " epsilon = params['epsilon']\n", " moment1 = np.zeros(theta.shape[0])\n", " moment2 = np.zeros(theta.shape[0])\n", " t = 1\n", "\n", " # Generate new theta\n", " while True:\n", " # Get gradient\n", " gradient = grad_fun(theta)\n", "\n", " # Update moment estimates\n", " moment1 = beta1 * moment1 + (1 - beta1) * gradient\n", " moment2 = beta2 * moment2 + (1 - beta2) * np.square(gradient)\n", " moment1_hat = moment1 / (1 - beta1**t)\n", " moment2_hat = moment2 / (1 - beta2**t)\n", "\n", " # Yield adapted gradient\n", " theta = (theta - alpha * (1 - beta2**t)**0.5 / (1 - beta1**t)\n", " * moment1 / (epsilon + np.sqrt(moment2)))\n", " # theta = theta - alpha * moment1_hat / (epsilon + np.sqrt(moment2_hat))\n", " yield theta\n", " t += 1\n", "\n", " return theta_gen_adam\n", "```" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from pysgd.adaptations import adam\n", "disp_mod(adam)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Run gradient descent." ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "241 [ -2.90357641 -2.90354339 -78.33233137]\n" ] } ], "source": [ "theta_hist = pysgd.sgd(theta0=np.array([-0.2, -4.4]), alpha=0.03, adapt='adam')\n", "print(theta_hist.shape[0], theta_hist[-1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Plot $J_j$ for each $\\theta_j$." ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "layout.title = 'Adaptive Moment Estimation'\n", "fig = go.Figure(data=[surface, spec_theta_hist_trace(theta_hist)], layout=layout)\n", "py.iplot(fig, filename='adam_gradient_descent')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Linear Regression" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### pysgd.objectives.linear" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/markdown": [ "``` python\n", "import numpy as np\n", "\n", "def grad_fun(theta, batch):\n", " X, y = batch\n", " return X.T.dot(X.dot(theta) - y) / X.shape[0]\n", "\n", "def cost_fun(theta, data):\n", " X = data[:,:-1]\n", " y = data[:,-1]\n", " return np.append(theta, np.sum(np.square(X.dot(theta) - y)) / (2 * X.shape[0]))\n", "```" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from pysgd.objectives import linear\n", "disp_mod(linear)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Generate sample linear regression data." ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "collapsed": false }, "outputs": [], "source": [ "data = np.ones((100,3))\n", "obj_theta = np.array([100, 10])\n", "np.random.seed(4)\n", "data[:,1:-1] = np.random.randint(101, size=(100,1))\n", "data[:,-1] = data[:,:-1].dot(obj_theta) * np.random.uniform(0.85, 1.15, 100)\n", "mu = np.mean(data[:,1:-1], axis=0)\n", "sigma = np.std(data[:,1:-1], axis=0)\n", "data[:,1:-1] = (data[:,1:-1] - mu) / sigma" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Run gradient descent algorithm." ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.001 minimum change in theta not achieved in 1000 iterations.\n", "1000 [ 616.21128784 265.36208775 1646.94767019]\n" ] } ], "source": [ "theta_hist = pysgd.sgd(\n", " theta0=np.array([-0.2, -4.4]),\n", " alpha=0.05,\n", " obj='linear',\n", " data=data, delta_min=10**-3\n", ")\n", "print(theta_hist.shape[0], theta_hist[-1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Plot sample regression data with model parameters." ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plot_data = go.Scatter(\n", " name = 'Training data',\n", " x = data[:,1],\n", " y = data[:,-1],\n", " mode=\"markers\"\n", ")\n", "\n", "x_norm = np.ones((2, 2))\n", "x_norm[:,1] = (np.array([0, 100]) - mu) / sigma\n", "y = x_norm.dot(theta_hist[-1,:-1])\n", "\n", "plot_theta = go.Scatter(\n", " name = 'Model',\n", " x = x_norm[:,1],\n", " y = y,\n", " mode='lines'\n", ")\n", "\n", "layout = go.Layout(\n", " title='Linear Regression'\n", ")\n", "\n", "fig = go.Figure(data=[plot_data, plot_theta], layout=layout)\n", "py.iplot(fig, filename='linear-data')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Logistic Regression\n", "\n", "With logistic regression, the dependent variable is classified, meaning it can only assume a limited number of finite values. To account for this, the cost function is modified such that predictions are always between 0 and 1.\n", "\n", "$$z = \\theta^T x$$\n", "\n", "$$g(z) = \\dfrac{1}{1 + e^{-z}}$$\n", "\n", "$$0< h_\\theta (x) = g (z) < 1$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### pysgd.objectives.logistic" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/markdown": [ "``` python\n", "import numpy as np\n", "\n", "def sigmoid(z):\n", " return 1 / (1 + np.exp(-z))\n", "\n", "def grad_fun(theta, batch):\n", " X, y = batch\n", " return (sigmoid(X.dot(theta)) - y).T.dot(X) / X.shape[0]\n", "\n", "def cost_fun(theta, data):\n", " X = data[:,:-1]\n", " y = data[:,-1]\n", " sigm = sigmoid(X.dot(theta))\n", " return np.append(theta, np.sum(-y.dot(np.log(sigm)) - (1 - y).dot(np.log(1 - sigm))) / data.shape[0])\n", "```" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from pysgd.objectives import logistic\n", "disp_mod(logistic)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Generate sample logistic regression data." ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "collapsed": false }, "outputs": [], "source": [ "data = np.ones((100,4))\n", "data[:,1:] = np.loadtxt(open(\"pysgd/tests/logistic.txt\",\"rb\"),delimiter=\",\")\n", "\n", "mu = np.mean(data[:,1:-1], axis=0)\n", "sigma = np.std(data[:,1:-1], axis=0)\n", "data[:,1:-1] = (data[:,1:-1] - mu) / sigma" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Run gradient descent algorithm." ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1e-06 minimum change in theta not achieved in 1000 iterations.\n", "1000 [ 1.00304487 2.4903719 2.27781043 0.22446381]\n" ] } ], "source": [ "theta_hist = pysgd.sgd(theta0=np.zeros(3), alpha=0.05, obj='logistic', data=data, delta_min=10**-6)\n", "print(theta_hist.shape[0], theta_hist[-1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Plot training data and decision boundary." ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plot_data = go.Scatter(\n", " name = 'Training data',\n", " x = data[:,1],\n", " y = data[:,2],\n", " mode=\"markers\",\n", " marker=dict(\n", " color=data[:,-1],\n", " colorscale='Bluered',\n", " )\n", ")\n", "\n", "# Plot decision boundary\n", "x1 = np.ones((2,2))\n", "x1[:,-1] = np.array([-1.8, 1.8])\n", "x2 = x1.dot((theta_hist[-1,:-2] / -theta_hist[-1,-2]))\n", "\n", "plot_db = go.Scatter(\n", " name = 'Decision boundary',\n", " x = x1[:,-1],\n", " y = x2,\n", " mode=\"lines\"\n", ")\n", "\n", "layout = go.Layout(\n", " title='Logistic Regression'\n", ")\n", "\n", "fig = go.Figure(data=[plot_data, plot_db], layout=layout)\n", "py.iplot(fig, filename='logistic-regression')" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python [conda root]", "language": "python", "name": "conda-root-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 2 }