{"worksheets": [{"cells": [{"source": ["Linear Regression and Kernel Methods", "====================================", "", "*Important:* Please read the [installation page](http://gpeyre.github.io/numerical-tours/installation_matlab/) for details about how to install the toolboxes.", "$\\newcommand{\\dotp}[2]{\\langle #1, #2 \\rangle}$", "$\\newcommand{\\enscond}[2]{\\lbrace #1, #2 \\rbrace}$", "$\\newcommand{\\pd}[2]{ \\frac{ \\partial #1}{\\partial #2} }$", "$\\newcommand{\\umin}[1]{\\underset{#1}{\\min}\\;}$", "$\\newcommand{\\umax}[1]{\\underset{#1}{\\max}\\;}$", "$\\newcommand{\\uargmin}[1]{\\underset{#1}{argmin}\\;}$", "$\\newcommand{\\norm}[1]{\\|#1\\|}$", "$\\newcommand{\\abs}[1]{\\left|#1\\right|}$", "$\\newcommand{\\choice}[1]{ \\left\\{ \\begin{array}{l} #1 \\end{array} \\right. }$", "$\\newcommand{\\pa}[1]{\\left(#1\\right)}$", "$\\newcommand{\\diag}[1]{{diag}\\left( #1 \\right)}$", "$\\newcommand{\\qandq}{\\quad\\text{and}\\quad}$", "$\\newcommand{\\qwhereq}{\\quad\\text{where}\\quad}$", "$\\newcommand{\\qifq}{ \\quad \\text{if} \\quad }$", "$\\newcommand{\\qarrq}{ \\quad \\Longrightarrow \\quad }$", "$\\newcommand{\\ZZ}{\\mathbb{Z}}$", "$\\newcommand{\\CC}{\\mathbb{C}}$", "$\\newcommand{\\RR}{\\mathbb{R}}$", "$\\newcommand{\\EE}{\\mathbb{E}}$", "$\\newcommand{\\Zz}{\\mathcal{Z}}$", "$\\newcommand{\\Ww}{\\mathcal{W}}$", "$\\newcommand{\\Vv}{\\mathcal{V}}$", "$\\newcommand{\\Nn}{\\mathcal{N}}$", "$\\newcommand{\\NN}{\\mathcal{N}}$", "$\\newcommand{\\Hh}{\\mathcal{H}}$", "$\\newcommand{\\Bb}{\\mathcal{B}}$", "$\\newcommand{\\Ee}{\\mathcal{E}}$", "$\\newcommand{\\Cc}{\\mathcal{C}}$", "$\\newcommand{\\Gg}{\\mathcal{G}}$", "$\\newcommand{\\Ss}{\\mathcal{S}}$", "$\\newcommand{\\Pp}{\\mathcal{P}}$", "$\\newcommand{\\Ff}{\\mathcal{F}}$", "$\\newcommand{\\Xx}{\\mathcal{X}}$", "$\\newcommand{\\Mm}{\\mathcal{M}}$", "$\\newcommand{\\Ii}{\\mathcal{I}}$", "$\\newcommand{\\Dd}{\\mathcal{D}}$", 
"$\\newcommand{\\Ll}{\\mathcal{L}}$", "$\\newcommand{\\Tt}{\\mathcal{T}}$", "$\\newcommand{\\si}{\\sigma}$", "$\\newcommand{\\al}{\\alpha}$", "$\\newcommand{\\la}{\\lambda}$", "$\\newcommand{\\ga}{\\gamma}$", "$\\newcommand{\\Ga}{\\Gamma}$", "$\\newcommand{\\La}{\\Lambda}$", "$\\newcommand{\\Si}{\\Sigma}$", "$\\newcommand{\\be}{\\beta}$", "$\\newcommand{\\de}{\\delta}$", "$\\newcommand{\\De}{\\Delta}$", "$\\newcommand{\\phi}{\\varphi}$", "$\\newcommand{\\th}{\\theta}$", "$\\newcommand{\\om}{\\omega}$", "$\\newcommand{\\Om}{\\Omega}$", "$\\newcommand{\\eqdef}{\\equiv}$"], "cell_type": "markdown", "metadata": {}}, {"source": ["This tour studies the linear regression method in conjunction with", "regularization.", "It contrasts ridge regression and the Lasso.", "It also presents its non-linear variant using kernelization."], "cell_type": "markdown", "metadata": {}}, {"source": ["We recommend that after completing this Numerical Tour, you apply it to your", "own data, for instance using a dataset from .", "", "_Disclaimer:_ these machine learning tours are intended to be", "overly-simplistic implementations and applications of baseline machine learning methods.", "For more advanced uses and implementations, we recommend", "using a state-of-the-art library, the most well-known being", ""], "cell_type": "markdown", "metadata": {}}, {"collapsed": false, "cell_type": "code", "input": ["addpath('toolbox_general')", "addpath('solutions/ml_2_regression')"], "outputs": [], "language": "python", "metadata": {}}, {"source": ["Dataset Loading", "---------------", "We test the method on the prostate dataset, consisting of $n=97$ samples with", "features $x_i \\in \\RR^p$ in dimension $p=8$. 
The goal is to predict the price value", "$y_i \\in \\RR$.", "", "", "Helpers."], "cell_type": "markdown", "metadata": {}}, {"collapsed": false, "cell_type": "code", "input": ["SetAR = @(ar)set(gca, 'PlotBoxAspectRatio', [1 ar 1], 'FontSize', 20);", "Xm = @(X)X-repmat(mean(X,1), [size(X,1) 1]);", "Cov = @(X)Xm(X)'*Xm(X);"], "outputs": [], "language": "python", "metadata": {}}, {"source": ["Load the dataset."], "cell_type": "markdown", "metadata": {}}, {"collapsed": false, "cell_type": "code", "input": ["name = 'prostate';", "load(['ml-' name]);"], "outputs": [], "language": "python", "metadata": {}}, {"source": ["Randomly permute it."], "cell_type": "markdown", "metadata": {}}, {"collapsed": false, "cell_type": "code", "input": ["A = A(randperm(size(A,1)),:);"], "outputs": [], "language": "python", "metadata": {}}, {"source": ["Separate the features $X$ from the values $y$ to predict, and the train/test indicator $c$."], "cell_type": "markdown", "metadata": {}}, {"collapsed": false, "cell_type": "code", "input": ["X = A(:,1:end-2);", "y = A(:,end-1);", "c = A(:,end);"], "outputs": [], "language": "python", "metadata": {}}, {"source": ["$n$ is the number of samples and $p$ is the dimensionality of the features."], "cell_type": "markdown", "metadata": {}}, {"collapsed": false, "cell_type": "code", "input": ["[n,p] = size(X);", "fprintf('n=%d, p=%d\\n', n,p);"], "outputs": [], "language": "python", "metadata": {}}, {"source": ["Split into training and testing."], "cell_type": "markdown", "metadata": {}}, {"collapsed": false, "cell_type": "code", "input": ["I0 = find(c==1); % train", "I1 = find(c==0); % test", "n0 = length(I0); n1 = n-n0;", "X0 = X(I0,:); y0 = y(I0);", "X1 = X(I1,:); y1 = y(I1);"], "outputs": [], "language": "python", "metadata": {}}, {"source": ["Normalize the features by the mean and std of the *training* set.", "This is optional."], "cell_type": "markdown", "metadata": {}}, {"collapsed": false, "cell_type": "code", "input": ["RepSub = @(X,u)X-repmat(u, [size(X,1) 1]);", 
"RepDiv = @(X,u)X./repmat(u, [size(X,1) 1]);", "mX0 = mean(X0); sX0 = std(X0);", "X0 = RepSub(X0,mX0); X1 = RepSub(X1,mX0);", "X0 = RepDiv(X0,sX0); X1 = RepDiv(X1,sX0);"], "outputs": [], "language": "python", "metadata": {}}, {"source": ["Remove the mean (computed from the *training* set) to avoid introducing a bias term and a constant regressor.", "This is optional."], "cell_type": "markdown", "metadata": {}}, {"collapsed": false, "cell_type": "code", "input": ["m0 = mean(y0);", "y0 = y0-m0;", "y1 = y1-m0;"], "outputs": [], "language": "python", "metadata": {}}, {"source": ["Dimensionality Reduction and PCA", "--------------------------------", "In order to display the data in 2-D or 3-D, dimensionality reduction is needed.", "The simplest method is principal component analysis (PCA), which performs an", "orthogonal linear projection on the principal axes (eigenvectors) of the", "covariance matrix.", "", "", "Display the covariance matrix of the training set."], "cell_type": "markdown", "metadata": {}}, {"collapsed": false, "cell_type": "code", "input": ["clf;", "imagesc(Cov(X0));"], "outputs": [], "language": "python", "metadata": {}}, {"source": ["Display the covariance between the data and the regressors."], "cell_type": "markdown", "metadata": {}}, {"collapsed": false, "cell_type": "code", "input": ["clf;", "bar(X0'*y0);", "axis tight;", "SetAR(1/2);"], "outputs": [], "language": "python", "metadata": {}}, {"source": ["Compute the PCA ortho-basis and", "the features in the PCA basis."], "cell_type": "markdown", "metadata": {}}, {"collapsed": false, "cell_type": "code", "input": ["[U,D,V] = svd(Xm(X0),'econ');", "Z = Xm(X0) * V;"], "outputs": [], "language": "python", "metadata": {}}, {"source": ["Plot the square roots of the eigenvalues (i.e. the singular values)."], "cell_type": "markdown", "metadata": {}}, {"collapsed": false, "cell_type": "code", "input": ["clf; ", "plot(diag(D), '.-', 'LineWidth', 2, 'MarkerSize', 30);", "axis tight;", "SetAR(1/2);"], "outputs": [], "language": "python", "metadata": {}}, {"source": 
["Display the features."], "cell_type": "markdown", "metadata": {}}, {"collapsed": false, "cell_type": "code", "input": ["pmax = min(p,8);", "k = 0;", "clf;", "for i=1:pmax", " for j=1:pmax", " k = k+1;", " subplot(pmax,pmax,k);", " if i==j", " hist(X0(:,i),6);", " axis tight;", " else", " plot(X0(:,j),X0(:,i), '.');", " axis tight;", " end", " set(gca, 'XTick', [], 'YTick', [] );", " axis tight;", " if i==1", " title(class_names{j});", " end", " end", "end"], "outputs": [], "language": "python", "metadata": {}}, {"source": ["Display the point cloud of feature vectors in 3-D PCA space."], "cell_type": "markdown", "metadata": {}}, {"collapsed": false, "cell_type": "code", "input": ["options.disp_dim = 3;", "clf; plot_multiclasses(X,ones(n,1),options);", "SetAR(1);"], "outputs": [], "language": "python", "metadata": {}}, {"source": ["1D plot of the function to regress along the main eigenvector axes."], "cell_type": "markdown", "metadata": {}}, {"collapsed": false, "cell_type": "code", "input": ["col = {'b' 'g' 'r' 'c' 'm' 'y' 'k'};", "clf;", "for i=1:min(p,3)", " subplot(3,1,i);", " plot(Z(:,i), y0, '.', 'Color', col{i}, 'MarkerSize', 20);", " axis tight;", "end"], "outputs": [], "language": "python", "metadata": {}}, {"source": ["Linear Regression", "-----------------", "We look for a linear relationship", " $y_i = \\dotp{w}{x_i}$", "written in matrix format", " $y = X w$", "where the rows of $X \\in \\RR^{n \\times p}$ store the features $x_i \\in \\RR^p$.", "", "", "Since here $n > p$, this is an over-determined system, which can be", "solved in the least squares sense", " $$\\umin{ w } \\norm{Xw-y}^2$$", "whose solution is given using the Moore-Penrose pseudo-inverse", " $$w = (X^\\top X)^{-1} X^\\top y$$", "", "", "Least squares solution."], "cell_type": "markdown", "metadata": {}}, {"collapsed": false, "cell_type": "code", "input": ["w = (X0'*X0) \\ (X0'*y0);"], "outputs": [], "language": "python", "metadata": {}}, {"source": ["Prediction (along 1st 
eigenvector)."], "cell_type": "markdown", "metadata": {}}, {"collapsed": false, "cell_type": "code", "input": ["clf;", "plot( [y1 X1*w], '.-', 'MarkerSize', 20);", "axis tight;", "legend('y', 'X_1 w');"], "outputs": [], "language": "python", "metadata": {}}, {"source": ["Relative prediction error on the test set."], "cell_type": "markdown", "metadata": {}}, {"collapsed": false, "cell_type": "code", "input": ["E = norm(X1*w-y1) / norm(y1);", "fprintf('Relative prediction error: %.3f\\n', E);"], "outputs": [], "language": "python", "metadata": {}}, {"source": ["Regularization is obtained by introducing a penalty. It is often called", "ridge regression, and is defined as", " $$\\umin{ w } \\norm{Xw-y}^2 + \\lambda \\norm{w}^2$$", "where $\\lambda>0$ is the regularization parameter.", "", "", "The solution is given by either of the following equivalent formulas", " $$w = (X^\\top X + \\lambda \\text{Id}_p )^{-1} X^\\top y,$$", " $$w = X^\\top ( XX^\\top + \\lambda \\text{Id}_n)^{-1} y.$$", "When $p < n$, the first formula should be preferred.", "", "", "For the kernel method below we use a Gaussian kernel; its bandwidth $\\si > 0$ is crucial and controls the locality of", "the model. 
It is typically tuned through cross validation."], "cell_type": "markdown", "metadata": {}}, {"collapsed": false, "cell_type": "code", "input": ["sigma = .3;", "kappa = @(X,Z)exp( -distmat(X,Z)/(2*sigma^2) );"], "outputs": [], "language": "python", "metadata": {}}, {"source": ["Once evaluated on the sample points, the kernel defines a matrix", "$$K = (\\kappa(x_i,x_j))_{i,j=1}^n \\in \\RR^{n \\times n}.$$"], "cell_type": "markdown", "metadata": {}}, {"collapsed": false, "cell_type": "code", "input": ["K = kappa(X,X);"], "outputs": [], "language": "python", "metadata": {}}, {"source": ["The weights $h \\in \\RR^n$ are solutions of", " $$\\umin{h} \\norm{Kh-y}^2 + \\la \\dotp{Kh}{h}$$", "and hence can be computed by solving a linear system", " $$h = (K+\\la \\text{Id}_n)^{-1} y$$"], "cell_type": "markdown", "metadata": {}}, {"collapsed": false, "cell_type": "code", "input": ["lambda = 0.01;", "h = (K+lambda*eye(n))\\y;"], "outputs": [], "language": "python", "metadata": {}}, {"source": ["Regressor."], "cell_type": "markdown", "metadata": {}}, {"collapsed": false, "cell_type": "code", "input": ["Y = @(x)kappa(x,X)*h;"], "outputs": [], "language": "python", "metadata": {}}, {"source": ["Evaluation on a 2D grid."], "cell_type": "markdown", "metadata": {}}, {"collapsed": false, "cell_type": "code", "input": ["q = 101;", "t = linspace(-B,B,q);", "[v,u] = meshgrid(t,t);", "Xn = [u(:), v(:)];"], "outputs": [], "language": "python", "metadata": {}}, {"source": ["Display as an image."], "cell_type": "markdown", "metadata": {}}, {"collapsed": false, "cell_type": "code", "input": ["yn = reshape(Y(Xn),[q,q]);", "clf;", "imagesc(t,t,yn); axis image; axis off; ", "colormap jet(256);"], "outputs": [], "language": "python", "metadata": {}}, {"source": ["__Exercise 7__", "", "Display the evolution of the regression as a function of $\\sigma$."], "cell_type": "markdown", "metadata": {}}, {"collapsed": false, "cell_type": "code", "input": ["exo7()"], "outputs": [], "language": "python", 
"metadata": {}}, {"collapsed": false, "cell_type": "code", "input": ["%% Insert your code here."], "outputs": [], "language": "python", "metadata": {}}, {"source": ["__Exercise 8__", "", "Apply the kernelized regression to a real-life dataset. Study the influence of $\\la$ and $\\si$."], "cell_type": "markdown", "metadata": {}}, {"collapsed": false, "cell_type": "code", "input": ["exo8()"], "outputs": [], "language": "python", "metadata": {}}, {"collapsed": false, "cell_type": "code", "input": ["%% Insert your code here."], "outputs": [], "language": "python", "metadata": {}}]}], "nbformat_minor": 0, "metadata": {"language_info": {"file_extension": ".m", "mimetype": "text/x-matlab", "help_links": [{"url": "https://github.com/calysto/metakernel/blob/master/metakernel/magics/README.md", "text": "MetaKernel Magics"}], "name": "matlab"}, "kernelspec": {"language": "matlab", "name": "matlab_kernel", "display_name": "Matlab"}}, "nbformat": 3}