{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Consider a ridge regression problem as follows:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$\\text{minimize } \\ \\frac{1}{2} \\sum_{i=1}^n (w^T x_i - y_i)^2 + \\frac{\\lambda}{2} \\|w\\|^2$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This problem is a special case of a more general family of problems called *regularized empirical risk minimization*, where the objective function is usually comprised of two parts: a set of *loss terms* and a *regularization term*." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, we show how to use the package *SGDOptim* to solve such a problem. First, we have to prepare some simulation data:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "3-element Array{Float64,1}:\n", " 3.0\n", " -4.0\n", " 5.0" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "w = [3.0, -4.0, 5.0]; # the underlying model coefficients" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "3x10000 Array{Float64,2}:\n", " 1.20754 -0.106163 1.2656 -0.195827 … 0.926012 1.33045 0.24129 \n", " 1.36097 -0.86756 1.05529 0.609302 1.11368 -1.57341 -1.165 \n", " -1.05901 0.648764 0.67563 0.276843 -0.262188 0.425759 -0.052067" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "n = 10000; X = randn(3, n); # generate 10000 sample features" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "10000-element Array{Float64,1}:\n", " -7.16593\n", " 6.56201\n", " 2.89612\n", " -1.69577\n", " 4.10703\n", " 11.8532 \n", " 14.9927 \n", " 3.83893\n", " -22.1367 \n", " -10.1685 \n", " -6.24833\n", " -3.17315\n", " 5.49029\n", " ⋮ \n", " 10.8841 \n", " 11.0071 \n", " -7.784 \n", " 12.7698 \n", " -1.27293\n", " -2.37607\n", " -5.78557\n", " -8.26047\n", " -9.313 \n", " -3.02949\n", " 12.331 \n", " 5.12746" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sig = 0.1; y = vec(w'X) + sig * randn(n); # generate the responses, adding some noise" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, let's try to estimate $w$ from the data. This can be done by the following statement:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true }, "outputs": [], "source": [ "using SGDOptim" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first step is to construct a risk model, which comprises a prediction model and a loss function." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "EmpiricalRisks.SupervisedRiskModel{EmpiricalRisks.LinearPred,EmpiricalRisks.SqrLoss}(EmpiricalRisks.LinearPred(3),EmpiricalRisks.SqrLoss())" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rmodel = riskmodel(LinearPred(3), # use linear prediction x -> w'x, 3 is the input dimension\n", " SqrLoss()) # use squared loss: loss(u, y) = (u - y)^2/2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, we are ready to solve the problem:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Iter 100: avg.loss = 6.5156e-03\n", "Iter 200: avg.loss = 4.5992e-03\n", "Iter 300: avg.loss = 4.8039e-03\n", "Iter 400: avg.loss = 8.1380e-03\n", "Iter 500: avg.loss = 7.6406e-03\n", "Iter 600: avg.loss = 1.0267e-02\n", "Iter 700: avg.loss = 1.6335e-02\n", "Iter 800: avg.loss = 1.3647e-02\n", "Iter 900: avg.loss = 1.1578e-02\n", "Iter 1000: avg.loss = 1.0762e-02\n" ] }, { "data": { "text/plain": [ "3-element Array{Float64,1}:\n", " 3.00284\n", " -3.99607\n", " 5.0021 " ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "w_e = sgd(rmodel,\n", " zeros(3), # the initial guess\n", " minibatch_seq(X, y, 10), # supply the data in mini-batches, each with 10 samples\n", " reg = SqrL2Reg(1.0e-4), # add a squared L2 regression with coefficient 1.0e-4\n", " lrate = t->1.0/(100.0 + t), # set the rule of learning rate \n", " cbinterval = 100, # invoke the callback every 100 iterations\n", " callback = simple_trace) # print the optimization trace in callback" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that 10000 samples can be partitioned into 1000 minibatches of size 10. So there were 1000 iterations, each using a single minibatch.\n", "\n", "Now let's compare the estimated solution with the ground-truth:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "5.582303599714307e-7" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sumabs2(w_e - w) / sumabs2(w)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The result looks quite accurate. We are done!" ] } ], "metadata": { "kernelspec": { "display_name": "Julia 0.4.0-dev", "language": "julia", "name": "julia 0.4" }, "language_info": { "name": "julia", "version": "0.4.0" } }, "nbformat": 4, "nbformat_minor": 0 }