{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Basis Functions\n", "### [Neil D. Lawrence](http://inverseprobability.com), University of Sheffield\n", "### 2015-10-20\n", "\n", "**Abstract**: In the last session we explored least squares for univariate and\n", "multivariate *regression*. We introduced *matrices*, *linear algebra*\n", "and *derivatives*.\n", "\n", "In this session we will introduce *basis functions* which allow us to\n", "implement *non-linear regression models*.\n", "\n", "$$\n", "\\newcommand{\\tk}[1]{}\n", "%\\newcommand{\\tk}[1]{\\textbf{TK}: #1}\n", "\\newcommand{\\Amatrix}{\\mathbf{A}}\n", "\\newcommand{\\KL}[2]{\\text{KL}\\left( #1\\,\\|\\,#2 \\right)}\n", "\\newcommand{\\Kaast}{\\kernelMatrix_{\\mathbf{ \\ast}\\mathbf{ \\ast}}}\n", "\\newcommand{\\Kastu}{\\kernelMatrix_{\\mathbf{ \\ast} \\inducingVector}}\n", "\\newcommand{\\Kff}{\\kernelMatrix_{\\mappingFunctionVector \\mappingFunctionVector}}\n", "\\newcommand{\\Kfu}{\\kernelMatrix_{\\mappingFunctionVector \\inducingVector}}\n", "\\newcommand{\\Kuast}{\\kernelMatrix_{\\inducingVector \\bf\\ast}}\n", "\\newcommand{\\Kuf}{\\kernelMatrix_{\\inducingVector \\mappingFunctionVector}}\n", "\\newcommand{\\Kuu}{\\kernelMatrix_{\\inducingVector \\inducingVector}}\n", "\\newcommand{\\Kuui}{\\Kuu^{-1}}\n", "\\newcommand{\\Qaast}{\\mathbf{Q}_{\\bf \\ast \\ast}}\n", "\\newcommand{\\Qastf}{\\mathbf{Q}_{\\ast \\mappingFunction}}\n", "\\newcommand{\\Qfast}{\\mathbf{Q}_{\\mappingFunctionVector \\bf \\ast}}\n", "\\newcommand{\\Qff}{\\mathbf{Q}_{\\mappingFunctionVector \\mappingFunctionVector}}\n", "\\newcommand{\\aMatrix}{\\mathbf{A}}\n", "\\newcommand{\\aScalar}{a}\n", "\\newcommand{\\aVector}{\\mathbf{a}}\n", "\\newcommand{\\acceleration}{a}\n", "\\newcommand{\\bMatrix}{\\mathbf{B}}\n", "\\newcommand{\\bScalar}{b}\n", "\\newcommand{\\bVector}{\\mathbf{b}}\n", "\\newcommand{\\basisFunc}{\\phi}\n", "\\newcommand{\\basisFuncVector}{\\boldsymbol{ \\basisFunc}}\n", 
"\\newcommand{\\basisFunction}{\\phi}\n", "\\newcommand{\\basisLocation}{\\mu}\n", "\\newcommand{\\basisMatrix}{\\boldsymbol{ \\Phi}}\n", "\\newcommand{\\basisScalar}{\\basisFunction}\n", "\\newcommand{\\basisVector}{\\boldsymbol{ \\basisFunction}}\n", "\\newcommand{\\activationFunction}{\\phi}\n", "\\newcommand{\\activationMatrix}{\\boldsymbol{ \\Phi}}\n", "\\newcommand{\\activationScalar}{\\basisFunction}\n", "\\newcommand{\\activationVector}{\\boldsymbol{ \\basisFunction}}\n", "\\newcommand{\\bigO}{\\mathcal{O}}\n", "\\newcommand{\\binomProb}{\\pi}\n", "\\newcommand{\\cMatrix}{\\mathbf{C}}\n", "\\newcommand{\\cbasisMatrix}{\\hat{\\boldsymbol{ \\Phi}}}\n", "\\newcommand{\\cdataMatrix}{\\hat{\\dataMatrix}}\n", "\\newcommand{\\cdataScalar}{\\hat{\\dataScalar}}\n", "\\newcommand{\\cdataVector}{\\hat{\\dataVector}}\n", "\\newcommand{\\centeredKernelMatrix}{\\mathbf{ \\MakeUppercase{\\centeredKernelScalar}}}\n", "\\newcommand{\\centeredKernelScalar}{b}\n", "\\newcommand{\\centeredKernelVector}{\\centeredKernelScalar}\n", "\\newcommand{\\centeringMatrix}{\\mathbf{H}}\n", "\\newcommand{\\chiSquaredDist}[2]{\\chi_{#1}^{2}\\left(#2\\right)}\n", "\\newcommand{\\chiSquaredSamp}[1]{\\chi_{#1}^{2}}\n", "\\newcommand{\\conditionalCovariance}{\\boldsymbol{ \\Sigma}}\n", "\\newcommand{\\coregionalizationMatrix}{\\mathbf{B}}\n", "\\newcommand{\\coregionalizationScalar}{b}\n", "\\newcommand{\\coregionalizationVector}{\\mathbf{ \\coregionalizationScalar}}\n", "\\newcommand{\\covDist}[2]{\\text{cov}_{#2}\\left(#1\\right)}\n", "\\newcommand{\\covSamp}[1]{\\text{cov}\\left(#1\\right)}\n", "\\newcommand{\\covarianceScalar}{c}\n", "\\newcommand{\\covarianceVector}{\\mathbf{ \\covarianceScalar}}\n", "\\newcommand{\\covarianceMatrix}{\\mathbf{C}}\n", "\\newcommand{\\covarianceMatrixTwo}{\\boldsymbol{ \\Sigma}}\n", "\\newcommand{\\croupierScalar}{s}\n", "\\newcommand{\\croupierVector}{\\mathbf{ \\croupierScalar}}\n", "\\newcommand{\\croupierMatrix}{\\mathbf{ 
\\MakeUppercase{\\croupierScalar}}}\n", "\\newcommand{\\dataDim}{p}\n", "\\newcommand{\\dataIndex}{i}\n", "\\newcommand{\\dataIndexTwo}{j}\n", "\\newcommand{\\dataMatrix}{\\mathbf{Y}}\n", "\\newcommand{\\dataScalar}{y}\n", "\\newcommand{\\dataSet}{\\mathcal{D}}\n", "\\newcommand{\\dataStd}{\\sigma}\n", "\\newcommand{\\dataVector}{\\mathbf{ \\dataScalar}}\n", "\\newcommand{\\decayRate}{d}\n", "\\newcommand{\\degreeMatrix}{\\mathbf{ \\MakeUppercase{\\degreeScalar}}}\n", "\\newcommand{\\degreeScalar}{d}\n", "\\newcommand{\\degreeVector}{\\mathbf{ \\degreeScalar}}\n", "% Already defined by latex\n", "%\\newcommand{\\det}[1]{\\left|#1\\right|}\n", "\\newcommand{\\diag}[1]{\\text{diag}\\left(#1\\right)}\n", "\\newcommand{\\diagonalMatrix}{\\mathbf{D}}\n", "\\newcommand{\\diff}[2]{\\frac{\\text{d}#1}{\\text{d}#2}}\n", "\\newcommand{\\diffTwo}[2]{\\frac{\\text{d}^2#1}{\\text{d}#2^2}}\n", "\\newcommand{\\displacement}{x}\n", "\\newcommand{\\displacementVector}{\\textbf{\\displacement}}\n", "\\newcommand{\\distanceMatrix}{\\mathbf{ \\MakeUppercase{\\distanceScalar}}}\n", "\\newcommand{\\distanceScalar}{d}\n", "\\newcommand{\\distanceVector}{\\mathbf{ \\distanceScalar}}\n", "\\newcommand{\\eigenvaltwo}{\\ell}\n", "\\newcommand{\\eigenvaltwoMatrix}{\\mathbf{L}}\n", "\\newcommand{\\eigenvaltwoVector}{\\mathbf{l}}\n", "\\newcommand{\\eigenvalue}{\\lambda}\n", "\\newcommand{\\eigenvalueMatrix}{\\boldsymbol{ \\Lambda}}\n", "\\newcommand{\\eigenvalueVector}{\\boldsymbol{ \\lambda}}\n", "\\newcommand{\\eigenvector}{\\mathbf{ \\eigenvectorScalar}}\n", "\\newcommand{\\eigenvectorMatrix}{\\mathbf{U}}\n", "\\newcommand{\\eigenvectorScalar}{u}\n", "\\newcommand{\\eigenvectwo}{\\mathbf{v}}\n", "\\newcommand{\\eigenvectwoMatrix}{\\mathbf{V}}\n", "\\newcommand{\\eigenvectwoScalar}{v}\n", "\\newcommand{\\entropy}[1]{\\mathcal{H}\\left(#1\\right)}\n", "\\newcommand{\\errorFunction}{E}\n", "\\newcommand{\\expDist}[2]{\\left<#1\\right>_{#2}}\n", 
"\\newcommand{\\expSamp}[1]{\\left<#1\\right>}\n", "\\newcommand{\\expectation}[1]{\\left\\langle #1 \\right\\rangle }\n", "\\newcommand{\\expectationDist}[2]{\\left\\langle #1 \\right\\rangle _{#2}}\n", "\\newcommand{\\expectedDistanceMatrix}{\\mathcal{D}}\n", "\\newcommand{\\eye}{\\mathbf{I}}\n", "\\newcommand{\\fantasyDim}{r}\n", "\\newcommand{\\fantasyMatrix}{\\mathbf{ \\MakeUppercase{\\fantasyScalar}}}\n", "\\newcommand{\\fantasyScalar}{z}\n", "\\newcommand{\\fantasyVector}{\\mathbf{ \\fantasyScalar}}\n", "\\newcommand{\\featureStd}{\\varsigma}\n", "\\newcommand{\\gammaCdf}[3]{\\mathcal{GAMMA CDF}\\left(#1|#2,#3\\right)}\n", "\\newcommand{\\gammaDist}[3]{\\mathcal{G}\\left(#1|#2,#3\\right)}\n", "\\newcommand{\\gammaSamp}[2]{\\mathcal{G}\\left(#1,#2\\right)}\n", "\\newcommand{\\gaussianDist}[3]{\\mathcal{N}\\left(#1|#2,#3\\right)}\n", "\\newcommand{\\gaussianSamp}[2]{\\mathcal{N}\\left(#1,#2\\right)}\n", "\\newcommand{\\given}{|}\n", "\\newcommand{\\half}{\\frac{1}{2}}\n", "\\newcommand{\\heaviside}{H}\n", "\\newcommand{\\hiddenMatrix}{\\mathbf{ \\MakeUppercase{\\hiddenScalar}}}\n", "\\newcommand{\\hiddenScalar}{h}\n", "\\newcommand{\\hiddenVector}{\\mathbf{ \\hiddenScalar}}\n", "\\newcommand{\\identityMatrix}{\\eye}\n", "\\newcommand{\\inducingInputScalar}{z}\n", "\\newcommand{\\inducingInputVector}{\\mathbf{ \\inducingInputScalar}}\n", "\\newcommand{\\inducingInputMatrix}{\\mathbf{Z}}\n", "\\newcommand{\\inducingScalar}{u}\n", "\\newcommand{\\inducingVector}{\\mathbf{ \\inducingScalar}}\n", "\\newcommand{\\inducingMatrix}{\\mathbf{U}}\n", "\\newcommand{\\inlineDiff}[2]{\\text{d}#1/\\text{d}#2}\n", "\\newcommand{\\inputDim}{q}\n", "\\newcommand{\\inputMatrix}{\\mathbf{X}}\n", "\\newcommand{\\inputScalar}{x}\n", "\\newcommand{\\inputSpace}{\\mathcal{X}}\n", "\\newcommand{\\inputVals}{\\inputVector}\n", "\\newcommand{\\inputVector}{\\mathbf{ \\inputScalar}}\n", "\\newcommand{\\iterNum}{k}\n", "\\newcommand{\\kernel}{\\kernelScalar}\n", 
"\\newcommand{\\kernelMatrix}{\\mathbf{K}}\n", "\\newcommand{\\kernelScalar}{k}\n", "\\newcommand{\\kernelVector}{\\mathbf{ \\kernelScalar}}\n", "\\newcommand{\\kff}{\\kernelScalar_{\\mappingFunction \\mappingFunction}}\n", "\\newcommand{\\kfu}{\\kernelVector_{\\mappingFunction \\inducingScalar}}\n", "\\newcommand{\\kuf}{\\kernelVector_{\\inducingScalar \\mappingFunction}}\n", "\\newcommand{\\kuu}{\\kernelVector_{\\inducingScalar \\inducingScalar}}\n", "\\newcommand{\\lagrangeMultiplier}{\\lambda}\n", "\\newcommand{\\lagrangeMultiplierMatrix}{\\boldsymbol{ \\Lambda}}\n", "\\newcommand{\\lagrangian}{L}\n", "\\newcommand{\\laplacianFactor}{\\mathbf{ \\MakeUppercase{\\laplacianFactorScalar}}}\n", "\\newcommand{\\laplacianFactorScalar}{m}\n", "\\newcommand{\\laplacianFactorVector}{\\mathbf{ \\laplacianFactorScalar}}\n", "\\newcommand{\\laplacianMatrix}{\\mathbf{L}}\n", "\\newcommand{\\laplacianScalar}{\\ell}\n", "\\newcommand{\\laplacianVector}{\\mathbf{ \\ell}}\n", "\\newcommand{\\latentDim}{q}\n", "\\newcommand{\\latentDistanceMatrix}{\\boldsymbol{ \\Delta}}\n", "\\newcommand{\\latentDistanceScalar}{\\delta}\n", "\\newcommand{\\latentDistanceVector}{\\boldsymbol{ \\delta}}\n", "\\newcommand{\\latentForce}{f}\n", "\\newcommand{\\latentFunction}{u}\n", "\\newcommand{\\latentFunctionVector}{\\mathbf{ \\latentFunction}}\n", "\\newcommand{\\latentFunctionMatrix}{\\mathbf{ \\MakeUppercase{\\latentFunction}}}\n", "\\newcommand{\\latentIndex}{j}\n", "\\newcommand{\\latentScalar}{z}\n", "\\newcommand{\\latentVector}{\\mathbf{ \\latentScalar}}\n", "\\newcommand{\\latentMatrix}{\\mathbf{Z}}\n", "\\newcommand{\\learnRate}{\\eta}\n", "\\newcommand{\\lengthScale}{\\ell}\n", "\\newcommand{\\rbfWidth}{\\ell}\n", "\\newcommand{\\likelihoodBound}{\\mathcal{L}}\n", "\\newcommand{\\likelihoodFunction}{L}\n", "\\newcommand{\\locationScalar}{\\mu}\n", "\\newcommand{\\locationVector}{\\boldsymbol{ \\locationScalar}}\n", "\\newcommand{\\locationMatrix}{\\mathbf{M}}\n", 
"\\newcommand{\\variance}[1]{\\text{var}\\left( #1 \\right)}\n", "\\newcommand{\\mappingFunction}{f}\n", "\\newcommand{\\mappingFunctionMatrix}{\\mathbf{F}}\n", "\\newcommand{\\mappingFunctionTwo}{g}\n", "\\newcommand{\\mappingFunctionTwoMatrix}{\\mathbf{G}}\n", "\\newcommand{\\mappingFunctionTwoVector}{\\mathbf{ \\mappingFunctionTwo}}\n", "\\newcommand{\\mappingFunctionVector}{\\mathbf{ \\mappingFunction}}\n", "\\newcommand{\\scaleScalar}{s}\n", "\\newcommand{\\mappingScalar}{w}\n", "\\newcommand{\\mappingVector}{\\mathbf{ \\mappingScalar}}\n", "\\newcommand{\\mappingMatrix}{\\mathbf{W}}\n", "\\newcommand{\\mappingScalarTwo}{v}\n", "\\newcommand{\\mappingVectorTwo}{\\mathbf{ \\mappingScalarTwo}}\n", "\\newcommand{\\mappingMatrixTwo}{\\mathbf{V}}\n", "\\newcommand{\\maxIters}{K}\n", "\\newcommand{\\meanMatrix}{\\mathbf{M}}\n", "\\newcommand{\\meanScalar}{\\mu}\n", "\\newcommand{\\meanTwoMatrix}{\\mathbf{M}}\n", "\\newcommand{\\meanTwoScalar}{m}\n", "\\newcommand{\\meanTwoVector}{\\mathbf{ \\meanTwoScalar}}\n", "\\newcommand{\\meanVector}{\\boldsymbol{ \\meanScalar}}\n", "\\newcommand{\\mrnaConcentration}{m}\n", "\\newcommand{\\naturalFrequency}{\\omega}\n", "\\newcommand{\\neighborhood}[1]{\\mathcal{N}\\left( #1 \\right)}\n", "\\newcommand{\\neilurl}{http://inverseprobability.com/}\n", "\\newcommand{\\noiseMatrix}{\\boldsymbol{ E}}\n", "\\newcommand{\\noiseScalar}{\\epsilon}\n", "\\newcommand{\\noiseVector}{\\boldsymbol{ \\epsilon}}\n", "\\newcommand{\\norm}[1]{\\left\\Vert #1 \\right\\Vert}\n", "\\newcommand{\\normalizedLaplacianMatrix}{\\hat{\\mathbf{L}}}\n", "\\newcommand{\\normalizedLaplacianScalar}{\\hat{\\ell}}\n", "\\newcommand{\\normalizedLaplacianVector}{\\hat{\\mathbf{ \\ell}}}\n", "\\newcommand{\\numActive}{m}\n", "\\newcommand{\\numBasisFunc}{m}\n", "\\newcommand{\\numComponents}{m}\n", "\\newcommand{\\numComps}{K}\n", "\\newcommand{\\numData}{n}\n", "\\newcommand{\\numFeatures}{K}\n", "\\newcommand{\\numHidden}{h}\n", 
"\\newcommand{\\numInducing}{m}\n", "\\newcommand{\\numLayers}{\\ell}\n", "\\newcommand{\\numNeighbors}{K}\n", "\\newcommand{\\numSequences}{s}\n", "\\newcommand{\\numSuccess}{s}\n", "\\newcommand{\\numTasks}{m}\n", "\\newcommand{\\numTime}{T}\n", "\\newcommand{\\numTrials}{S}\n", "\\newcommand{\\outputIndex}{j}\n", "\\newcommand{\\paramVector}{\\boldsymbol{ \\theta}}\n", "\\newcommand{\\parameterMatrix}{\\boldsymbol{ \\Theta}}\n", "\\newcommand{\\parameterScalar}{\\theta}\n", "\\newcommand{\\parameterVector}{\\boldsymbol{ \\parameterScalar}}\n", "\\newcommand{\\partDiff}[2]{\\frac{\\partial#1}{\\partial#2}}\n", "\\newcommand{\\precisionScalar}{j}\n", "\\newcommand{\\precisionVector}{\\mathbf{ \\precisionScalar}}\n", "\\newcommand{\\precisionMatrix}{\\mathbf{J}}\n", "\\newcommand{\\pseudotargetScalar}{\\widetilde{y}}\n", "\\newcommand{\\pseudotargetVector}{\\mathbf{ \\pseudotargetScalar}}\n", "\\newcommand{\\pseudotargetMatrix}{\\mathbf{ \\widetilde{Y}}}\n", "\\newcommand{\\rank}[1]{\\text{rank}\\left(#1\\right)}\n", "\\newcommand{\\rayleighDist}[2]{\\mathcal{R}\\left(#1|#2\\right)}\n", "\\newcommand{\\rayleighSamp}[1]{\\mathcal{R}\\left(#1\\right)}\n", "\\newcommand{\\responsibility}{r}\n", "\\newcommand{\\rotationScalar}{r}\n", "\\newcommand{\\rotationVector}{\\mathbf{ \\rotationScalar}}\n", "\\newcommand{\\rotationMatrix}{\\mathbf{R}}\n", "\\newcommand{\\sampleCovScalar}{s}\n", "\\newcommand{\\sampleCovVector}{\\mathbf{ \\sampleCovScalar}}\n", "\\newcommand{\\sampleCovMatrix}{\\mathbf{s}}\n", "\\newcommand{\\scalarProduct}[2]{\\left\\langle{#1},{#2}\\right\\rangle}\n", "\\newcommand{\\sign}[1]{\\text{sign}\\left(#1\\right)}\n", "\\newcommand{\\sigmoid}[1]{\\sigma\\left(#1\\right)}\n", "\\newcommand{\\singularvalue}{\\ell}\n", "\\newcommand{\\singularvalueMatrix}{\\mathbf{L}}\n", "\\newcommand{\\singularvalueVector}{\\mathbf{l}}\n", "\\newcommand{\\sorth}{\\mathbf{u}}\n", "\\newcommand{\\spar}{\\lambda}\n", 
"\\newcommand{\\trace}[1]{\\text{tr}\\left(#1\\right)}\n", "\\newcommand{\\BasalRate}{B}\n", "\\newcommand{\\DampingCoefficient}{C}\n", "\\newcommand{\\DecayRate}{D}\n", "\\newcommand{\\Displacement}{X}\n", "\\newcommand{\\LatentForce}{F}\n", "\\newcommand{\\Mass}{M}\n", "\\newcommand{\\Sensitivity}{S}\n", "\\newcommand{\\basalRate}{b}\n", "\\newcommand{\\dampingCoefficient}{c}\n", "\\newcommand{\\mass}{m}\n", "\\newcommand{\\sensitivity}{s}\n", "\\newcommand{\\springScalar}{\\kappa}\n", "\\newcommand{\\springVector}{\\boldsymbol{ \\kappa}}\n", "\\newcommand{\\springMatrix}{\\boldsymbol{ \\mathcal{K}}}\n", "\\newcommand{\\tfConcentration}{p}\n", "\\newcommand{\\tfDecayRate}{\\delta}\n", "\\newcommand{\\tfMrnaConcentration}{f}\n", "\\newcommand{\\tfVector}{\\mathbf{ \\tfConcentration}}\n", "\\newcommand{\\velocity}{v}\n", "\\newcommand{\\sufficientStatsScalar}{g}\n", "\\newcommand{\\sufficientStatsVector}{\\mathbf{ \\sufficientStatsScalar}}\n", "\\newcommand{\\sufficientStatsMatrix}{\\mathbf{G}}\n", "\\newcommand{\\switchScalar}{s}\n", "\\newcommand{\\switchVector}{\\mathbf{ \\switchScalar}}\n", "\\newcommand{\\switchMatrix}{\\mathbf{S}}\n", "\\newcommand{\\tr}[1]{\\text{tr}\\left(#1\\right)}\n", "\\newcommand{\\loneNorm}[1]{\\left\\Vert #1 \\right\\Vert_1}\n", "\\newcommand{\\ltwoNorm}[1]{\\left\\Vert #1 \\right\\Vert_2}\n", "\\newcommand{\\onenorm}[1]{\\left\\vert#1\\right\\vert_1}\n", "\\newcommand{\\twonorm}[1]{\\left\\Vert #1 \\right\\Vert}\n", "\\newcommand{\\vScalar}{v}\n", "\\newcommand{\\vVector}{\\mathbf{v}}\n", "\\newcommand{\\vMatrix}{\\mathbf{V}}\n", "\\newcommand{\\varianceDist}[2]{\\text{var}_{#2}\\left( #1 \\right)}\n", "% Already defined by latex\n", "%\\newcommand{\\vec}{#1:}\n", "\\newcommand{\\vecb}[1]{\\left(#1\\right):}\n", "\\newcommand{\\weightScalar}{w}\n", "\\newcommand{\\weightVector}{\\mathbf{ \\weightScalar}}\n", "\\newcommand{\\weightMatrix}{\\mathbf{W}}\n", "\\newcommand{\\weightedAdjacencyMatrix}{\\mathbf{A}}\n", 
"\\newcommand{\\weightedAdjacencyScalar}{a}\n", "\\newcommand{\\weightedAdjacencyVector}{\\mathbf{ \\weightedAdjacencyScalar}}\n", "\\newcommand{\\onesVector}{\\mathbf{1}}\n", "\\newcommand{\\zerosVector}{\\mathbf{0}}\n", "$$\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Nonlinear Regression\n", "\n", "We've now seen how we may perform linear regression. Now, we are going\n", "to consider how we can perform *non-linear* regression. However, before\n", "we get into the details of how to do that we first need to consider in\n", "what ways the regression can be non-linear. Multivariate linear\n", "regression allows us to build models that take many features into\n", "account when making our prediction. In this session we are going to\n", "introduce *basis functions*. The term seems complicted, but they are\n", "actually based on rather a simple idea. If we are doing a multivariate\n", "linear regression, we get extra features that *might* help us predict\n", "our required response varible (or target value), $y$. But what if we\n", "only have one input value? We can actually artificially generate more\n", "input values with basis functions.\n", "\n", "## Non-linear in the Inputs\n", "\n", "When we refer to non-linear regression, we are normally referring to\n", "whether the regression is non-linear in the input space, or non-linear\n", "in the *covariates*. The covariates are the observations that move with\n", "the target (or *response*) variable. In our notation we have been using\n", "$\\inputVector_i$ to represent a vector of the covariates associated with\n", "the $i$th observation. The coresponding response variable is\n", "$\\dataScalar_i$. 
If a model is non-linear in the inputs, it means that\n", "there is a non-linear function between the inputs and the response\n", "variable. Linear functions are functions that only involve\n", "multiplication and addition; in other words, they can be represented\n", "through *linear algebra*. Linear regression involves assuming that a\n", "function takes the form $$\n", "\\mappingFunction(\\inputVector) = \\mappingVector^\\top \\inputVector\n", "$$ where $\\mappingVector$ are our regression weights. A very easy way to\n", "make the linear regression non-linear is to introduce non-linear\n", "functions of the inputs. When we are introducing non-linear regression these functions\n", "are known as *basis functions*.\n", "\n", "## Basis Functions\n", "\n", "Here's the idea: instead of working directly on the original input\n", "space, $\\inputVector$, we build models in a new space,\n", "$\\basisVector(\\inputVector)$ where $\\basisVector(\\cdot)$ is a\n", "*vector-valued* function that is defined on the space $\\inputVector$.\n", "\n", "## Quadratic Basis\n", "\n", "Remember that a *vector-valued function* is just a vector that contains\n", "functions instead of values. Here's an example for a one-dimensional\n", "input space, $x$, being projected to a *quadratic* basis. 
First we\n", "consider each basis function in turn, we can think of the elements of\n", "our vector as being indexed so that we have $$\n", "\\begin{align*}\n", "\\basisFunc_1(\\inputScalar) = 1, \\\\\n", "\\basisFunc_2(\\inputScalar) = x, \\\\\n", "\\basisFunc_3(\\inputScalar) = \\inputScalar^2.\n", "\\end{align*}\n", "$$ Now we can consider them together by placing them in a vector, $$\n", "\\basisVector(\\inputScalar) = \\begin{bmatrix} 1\\\\ x \\\\ \\inputScalar^2\\end{bmatrix}.\n", "$$ For the vector-valued function, we have simply collected the\n", "different functions together in the same vector making them notationally\n", "easier to deal with in our mathematics.\n", "\n", "When we consider the vector-valued function for each data point, then we\n", "place all the data into a matrix. The result is a matrix valued\n", "function, $$\n", "\\basisMatrix(\\inputVector) = \n", "\\begin{bmatrix} 1 & \\inputScalar_1 &\n", "\\inputScalar_1^2 \\\\\n", "1 & \\inputScalar_2 & \\inputScalar_2^2\\\\\n", "\\vdots & \\vdots & \\vdots \\\\\n", "1 & \\inputScalar_n & \\inputScalar_n^2\n", "\\end{bmatrix}\n", "$$ where we are still in the one dimensional input setting so\n", "$\\inputVector$ here represents a vector of our inputs with $\\numData$\n", "elements.\n", "\n", "Let's try constructing such a matrix for a set of inputs. 
First of all,\n", "we create a function that returns the matrix-valued function" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def quadratic(x, **kwargs):\n", "    \"\"\"Take in a vector of input values and return the design matrix associated\n", "    with the basis functions.\"\"\"\n", "    return np.hstack([np.ones((x.shape[0], 1)), x, x**2])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Functions Derived from Quadratic Basis\n", "\n", "$$\n", "\\mappingFunction(\\inputScalar) = {\\color{cyan}\\mappingScalar_0} + {\\color{green}\\mappingScalar_1 \\inputScalar} + {\\color{yellow}\\mappingScalar_2 \\inputScalar^2}\n", "$$\n", "\n", "Figure: The components of a quadratic basis." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pods\n", "from ipywidgets import IntSlider" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pods.notebook.display_plots('quadratic_basis{num_basis:0>3}.svg', \n", "                            directory='../slides/diagrams/ml', \n", "                            num_basis=IntSlider(0,0,2,1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This function takes in an $\\numData \\times 1$ dimensional vector and\n", "returns an $\\numData \\times 3$ dimensional *design matrix* containing\n", "the basis functions. We can plot those basis functions against their\n", "input as follows.\n", "\n", "The actual function we observe is then made up of a sum of these\n", "functions. This is the reason for the name basis. The term *basis* means\n", "'the underlying support or foundation for an idea, argument, or\n", "process', and in this context the basis functions form the underlying support for our\n", "prediction function. 
Our prediction function can only be composed of a\n", "weighted linear sum of our basis functions.\n", "\n", "## Quadratic Functions\n", "\n", "\n", "\n", "Figure: Functions constructed by weighted sum of the components of a\n", "quadratic basis." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pods\n", "from ipywidgets import IntSlider" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pods.notebook.display_plots('quadratic_function{num_function:0>3}.svg', \n", "                            directory='../slides/diagrams/ml', \n", "                            num_function=IntSlider(0,0,2,1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Different Bases\n", "\n", "Our choice of basis can be made based on our beliefs about what is\n", "appropriate for the data. For example, the polynomial basis extends the\n", "quadratic basis to arbitrary degree, so we might define the $j$th basis\n", "function associated with the model as $$\n", "\\basisFunc_j(\\inputScalar_i) = \\inputScalar_i^j\n", "$$ which is known as the *polynomial basis*.\n", "\n", "## Polynomial Basis" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import mlai\n", "import teaching_plots as plot" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%load -s polynomial mlai.py" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Functions Derived from Polynomial Basis\n", "\n", "$$\n", "\\mappingFunction(\\inputScalar) = {\\color{cyan}\\mappingScalar_0} + {\\color{green}\\mappingScalar_1 \\inputScalar} + {\\color{yellow}\\mappingScalar_2 \\inputScalar^2}\n", "$$\n", "\n", "\n", "\n", "Figure: A polynomial basis is made up of different degrees of\n", "polynomial." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pods\n", "from ipywidgets import IntSlider" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pods.notebook.display_plots('polynomial_basis{num_basis:0>3}.svg', \n", " directory='../slides/diagrams/ml', \n", " num_basis=IntSlider(1,1,4,1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To aid in understanding how a basis works, we've provided you with a\n", "small interactive tool for exploring this polynomial basis. The tool can\n", "be summoned with the following command." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pods" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pods.notebook.display_prediction(basis=mlai.polynomial, num_basis=4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Try moving the sliders around to change the weight of each basis\n", "function. Click the control box `display_basis` to show the underlying\n", "basis functions (in red). The prediction function is shown in a thick\n", "blue line. *Warning* the sliders aren't presented quite in the correct\n", "order. `w_0` is associated with the bias, `w_1` is the linear term,\n", "`w_2` the quadratic and here (because we have four basis functions) we\n", "have `w_3` for the *cubic* term. 
So the subscript of the weight\n", "parameter is always associated with the corresponding polynomial's\n", "degree.\n", "\n", "## Different Basis\n", "\n", "The polynomial basis is widely used in engineering and graphics, but it\n", "has some drawbacks in machine learning: outside the input region between\n", "-1 and 1, the values of the polynomial basis rise very quickly.\n", "\n", "Now we look at basis functions that have been used as the *activation*\n", "functions in neural network models.\n", "\n", "## Radial Basis Functions\n", "\n", "Another type of basis is sometimes known as a 'radial basis' because the\n", "basis functions are constructed on 'centres' and the effect of\n", "each basis function decreases as the radial distance from its centre\n", "increases.\n", "\n", "$$\\basisFunc_j(\\inputScalar) = \\exp\\left(-\\frac{(\\inputScalar-\\mu_j)^2}{\\lengthScale^2}\\right)$$" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import mlai\n", "import teaching_plots as plot" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%load -s radial mlai.py" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pods.notebook.display_prediction(basis=mlai.radial, num_basis=4)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from ipywidgets import IntSlider\n", "import pods" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pods.notebook.display_plots('radial_basis{num_basis:0>3}.svg', \n", "                            directory='../slides/diagrams/ml', \n", "                            num_basis=IntSlider(0,0,2,1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Functions Derived from Radial Basis\n", "\n", "$$\n", "\\mappingFunction(\\inputScalar) = {\\color{cyan}\\mappingScalar_1 e^{-2(\\inputScalar+1)^2}} + 
{\\color{green}\\mappingScalar_2e^{-2\\inputScalar^2}} + {\\color{yellow}\\mappingScalar_3 e^{-2(\\inputScalar-1)^2}}\n", "$$\n", "\n", "\n", "\n", "Figure: A radial basis is made up of different locally effective\n", "functions centered at different points." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from ipywidgets import IntSlider\n", "import pods" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pods.notebook.display_plots('radial_function{func_num:0>3}.svg', directory='../slides/diagrams/ml', func_num=IntSlider(0,0,2,1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Rectified Linear Units" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%load -s relu mlai.py" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pods" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pods.notebook.display_prediction(basis=mlai.relu, num_basis=4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Rectified linear units are popular in the current generation of\n", "multilayer perceptron models, or deep networks. These basis functions\n", "start flat, and then become linear functions at a certain threshold." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import teaching_plots as plot\n", "import mlai" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Functions Derived from ReLU Basis\n", "\n", "$$\n", "\\mappingFunction(\\inputScalar) = {\\color{cyan}\\mappingScalar_0} + {\\color{green}\\mappingScalar_1 xH(x+1.0) } + {\\color{yellow}\\mappingScalar_2 xH(x+0.33) } + {\\color{magenta}\\mappingScalar_3 xH(x-0.33)} + {\\color{red}\\mappingScalar_4 xH(x-1.0)}\n", "$$\n", "\n", "\n", "\n", "Figure: A rectified linear unit basis is made up of different\n", "rectified linear unit functions centered at different points." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pods\n", "from ipywidgets import IntSlider" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pods.notebook.display_plots('relu_basis{num_basis:0>3}.svg', \n", "                            directory='../slides/diagrams/ml', \n", "                            num_basis=IntSlider(0,0,4,1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Hyperbolic Tangent Basis" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%load -s tanh mlai.py" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pods" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pods.notebook.display_prediction(basis=mlai.tanh, num_basis=4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The sigmoid or hyperbolic tangent basis was popular in the original\n", "generation of multilayer perceptron models, or deep networks. 
These\n", "basis functions start flat, rise and then saturate.\n", "\n", "## Functions Derived from Tanh Basis\n", "\n", "$$\n", "\\mappingFunction(\\inputScalar) = {\\color{cyan}\\mappingScalar_0} + {\\color{green}\\mappingScalar_1 } + {\\color{yellow}\\mappingScalar_3 }\n", "$$\n", "\n", "\n", "\n", "Figure: A hyperbolic tangent basis is made up of s-shaped basis\n", "functions centered at different points." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pods\n", "from ipywidgets import IntSlider" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pods.notebook.display_plots('tanh_basis{num_basis:0>3}.svg', \n", "                            directory='../slides/diagrams/ml', \n", "                            num_basis=IntSlider(0,0,4,1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Fourier Basis\n", "\n", "[Joseph Fourier](https://en.wikipedia.org/wiki/Joseph_Fourier) suggested\n", "that functions could be converted to a sum of sines and cosines. 
A\n", "function built on a Fourier basis is a linear weighted sum of these functions.\n", "$$\\mappingFunction(\\inputScalar) = \\mappingScalar_0 + \\mappingScalar_1 \\sin(\\inputScalar) + \\mappingScalar_2 \\cos(\\inputScalar) + \\mappingScalar_3 \\sin(2\\inputScalar) + \\mappingScalar_4 \\cos(2\\inputScalar)$$" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%load -s fourier mlai.py" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import mlai\n", "import teaching_plots as plot" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pods\n", "from ipywidgets import IntSlider" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pods.notebook.display_plots('fourier_basis{num_basis:0>3}.svg', \n", "                            directory='../slides/diagrams/ml', \n", "                            num_basis=IntSlider(0,0,4,1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this code, basis functions with an *odd* index are sine and basis\n", "functions with an *even* index are cosine. The first basis function\n", "(index 0, so cosine) has a frequency of 0 and then frequencies increase\n", "every time a sine and cosine are included." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pods.notebook.display_prediction(basis=mlai.fourier, num_basis=5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Functions Derived from Fourier Basis\n", "\n", "$$\n", "\\mappingFunction(\\inputScalar) = {\\color{cyan}\\mappingScalar_0} + {\\color{green}\\mappingScalar_1 \\sin(\\inputScalar)} + {\\color{yellow}\\mappingScalar_2 \\cos(\\inputScalar)} + {\\color{magenta}\\mappingScalar_3 \\sin(2\\inputScalar)} + {\\color{red}\\mappingScalar_4 \\cos(2\\inputScalar)}\n", "$$\n", "\n", "Figure: A Fourier basis is made up of sine and cosine functions with\n", "different frequencies." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pods\n", "from ipywidgets import IntSlider" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pods.notebook.display_plots('fourier_function{func_num:0>3}.svg', directory='../slides/diagrams/ml', func_num=IntSlider(0,0,2,1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise\n", "\n", "Try increasing the number of basis functions (thereby increasing the\n", "highest *frequency* in the resulting function). Describe what you see as you\n", "increase the number of basis functions up to 10. Is it easy to change the function in\n", "intuitive ways?\n", "\n", "### Write your answer to Exercise here" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Use this box for any code you need for the exercise\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Fitting to Data\n", "\n", "Now we are going to consider how these basis functions can be adjusted\n", "to fit a particular data set. We will return to the Olympic marathon\n", "data from last time. First we will scale the output of the data to be\n", "zero mean and variance 1." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pods\n", "import numpy as np\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data = pods.datasets.olympic_marathon_men()\n", "y = data['Y']\n", "x = data['X']\n", "y -= y.mean()\n", "y /= y.std()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Question 2\n", "\n", "Now we are going to redefine our polynomial basis. Have a careful look\n", "at the operations we perform on `x` to create `z`. We use `z` in the\n", "polynomial computation. What are we doing to the inputs? Why do you\n", "think we are changing `x` in this manner?\n", "\n", "*10 marks*\n", "\n", "### Write your answer to Question 2 here" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%load -s polynomial mlai.py" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Create the figure and axes that the interactive tool below draws on.\n", "fig, ax = plt.subplots()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pods.notebook.display_prediction(basis=dict(radial=mlai.radial, \n", " polynomial=mlai.polynomial, \n", " fourier=mlai.fourier, \n", " relu=mlai.relu), \n", " data_limits=(1888, 2020),\n", " fig=fig, ax=ax,\n", " offset=0.,\n", " wlim = (-4., 4.),\n", " num_basis=4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Question 3\n", "\n", "Use the tool provided above to try and find the best fit you can to the\n", "data. Explore the parameter space and give the weight values you used\n", "for the\n", "\n", "(a) polynomial basis\n", "\n", "(b) radial basis\n", "\n", "(c) Fourier basis\n", "\n", "Write your answers in the code box below creating a new vector of\n", "parameters (in the correct order!) 
for each basis.\n", "\n", "*15 marks*" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Write code for your answer to Question 3 in this box\n", "# provide the answers so that the code runs correctly, otherwise you will lose marks!\n", "\n", "\n", "# (a) polynomial\n", "###### Edit these lines #####\n", "# w_0 =\n", "# w_1 = \n", "# w_2 = \n", "# w_3 =\n", "##############################\n", "# w_polynomial = np.asarray([[w_0], [w_1], [w_2], [w_3]]) \n", "\n", "# (b) radial\n", "###### Edit these lines #####\n", "# w_0 =\n", "# w_1 = \n", "# w_2 = \n", "# w_3 =\n", "##############################\n", "# w_rbf = np.asarray([[w_0], [w_1], [w_2], [w_3]]) \n", "\n", "# (c) fourier\n", "###### Edit these lines #####\n", "# w_0 =\n", "# w_1 = \n", "# w_2 = \n", "# w_3 =\n", "##############################\n", "# w_fourier = np.asarray([[w_0], [w_1], [w_2], [w_3]])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.asarray([[1, 2, 3, 4]]).shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Basis Function Models\n", "\n", "$$\n", " \\mappingFunction(\\inputVector_i) = \\sum_{j=1}^\\numBasisFunc \\mappingScalar_j \\basisFunc_{i, j}\n", " $$\n", "\n", "$$\n", " \\mappingFunction(\\inputVector_i) = \\mappingVector^\\top \\basisVector_i\n", " $$\n", "\n", "## Log Likelihood for Basis Function Model\n", "\n", "$$\n", " p\\left(\\dataScalar_i|\\inputScalar_i\\right)=\\frac{1}{\\sqrt{2\\pi\\dataStd^2}}\\exp\\left(-\\frac{\\left(\\dataScalar_i-\\mappingVector^{\\top}\\basisVector_i\\right)^{2}}{2\\dataStd^2}\\right).\n", " $$\n", "\n", "$$\n", " L(\\mappingVector,\\dataStd^2)= -\\frac{\\numData}{2}\\log \\dataStd^2-\\frac{\\numData}{2}\\log 2\\pi -\\frac{\\sum_{i=1}^{\\numData}\\left(\\dataScalar_i-\\mappingVector^{\\top}\\basisVector_i\\right)^{2}}{2\\dataStd^2}.\n", " $$\n", "\n", "## Objective Function\n", "\n", "$$\n", " 
\\errorFunction(\\mappingVector,\\dataStd^2)= \\frac{\\numData}{2}\\log\\dataStd^2 + \\frac{\\sum_{i=1}^{\\numData}\\left(\\dataScalar_i-\\mappingVector^{\\top}\\basisVector_i\\right)^{2}}{2\\dataStd^2}.\n", " $$\n", "\n", "## Expand the Brackets\n", "\n", "$$\n", "\\begin{align}\n", " \\errorFunction(\\mappingVector,\\dataStd^2) = &\\frac{\\numData}{2}\\log \\dataStd^2 + \\frac{1}{2\\dataStd^2}\\sum_{i=1}^{\\numData}\\dataScalar_i^{2}-\\frac{1}{\\dataStd^2}\\sum_{i=1}^{\\numData}\\dataScalar_i\\mappingVector^{\\top}\\basisVector_i\\\\ &+\\frac{1}{2\\dataStd^2}\\sum_{i=1}^{\\numData}\\mappingVector^{\\top}\\basisVector_i\\basisVector_i^{\\top}\\mappingVector+\\text{const}.\n", "\\end{align}\n", "$$\n", "\n", "## Expand the Brackets\n", "\n", "$$\\begin{align} \\errorFunction(\\mappingVector, \\dataStd^2) = & \\frac{\\numData}{2}\\log \\dataStd^2 + \\frac{1}{2\\dataStd^2}\\sum_{i=1}^{\\numData}\\dataScalar_i^{2}-\\frac{1}{\\dataStd^2} \\mappingVector^\\top\\sum_{i=1}^{\\numData}\\basisVector_i \\dataScalar_i\\\\ & +\\frac{1}{2\\dataStd^2}\\mappingVector^{\\top}\\left[\\sum_{i=1}^{\\numData}\\basisVector_i\\basisVector_i^{\\top}\\right]\\mappingVector+\\text{const}.\\end{align}$$\n", "\n", "## Design Matrices\n", "\n", "We like to make use of *design* matrices for our data. Design matrices,\n", "as you will recall, involve placing the data points into rows of the\n", "matrix and data features into the columns of the matrix. By convention,\n", "we denote a vector with a bold lower case letter, and a matrix\n", "with a bold upper case letter. 
For a quadratic basis, the design matrix is therefore given by\n", "$$\n", " \\basisMatrix = \\begin{bmatrix} \\mathbf{1} & \\inputVector & \\inputVector^2\\end{bmatrix}\n", " $$ so that $$\n", " \\basisMatrix \\in \\Re^{\\numData \\times \\numBasisFunc}.\n", " $$\n", "\n", "## Multivariate Derivatives Reminder\n", "\n", "$$\\frac{\\text{d}\\mathbf{a}^{\\top}\\mappingVector}{\\text{d}\\mappingVector}=\\mathbf{a}$$\n", "and\n", "$$\\frac{\\text{d}\\mappingVector^{\\top}\\mathbf{A}\\mappingVector}{\\text{d}\\mappingVector}=\\left(\\mathbf{A}+\\mathbf{A}^{\\top}\\right)\\mappingVector$$\n", "or if $\\mathbf{A}$ is symmetric (*i.e.* $\\mathbf{A}=\\mathbf{A}^{\\top}$)\n", "$$\\frac{\\text{d}\\mappingVector^{\\top}\\mathbf{A}\\mappingVector}{\\text{d}\\mappingVector}=2\\mathbf{A}\\mappingVector.$$\n", "\n", "## Differentiate\n", "\n", "Differentiating with respect to the vector $\\mappingVector$ we obtain\n", "$$\\frac{\\text{d} \\errorFunction\\left(\\mappingVector,\\dataStd^2 \\right)}{\\text{d}\\mappingVector}=-\\frac{1}{\\dataStd^2} \\sum_{i=1}^{\\numData}\\basisVector_i\\dataScalar_i+\\frac{1}{\\dataStd^2} \\left[\\sum_{i=1}^{\\numData}\\basisVector_i\\basisVector_i^{\\top}\\right]\\mappingVector.$$\n", "Setting this gradient to zero leads to\n", "$$\\mappingVector^{*}=\\left[\\sum_{i=1}^{\\numData}\\basisVector_i\\basisVector_i^{\\top}\\right]^{-1}\\sum_{i=1}^{\\numData}\\basisVector_i\\dataScalar_i.$$\n", "\n", "## Matrix Notation\n", "\n", "Rewrite in matrix notation: $$\n", "\\sum_{i=1}^{\\numData}\\basisVector_i\\basisVector_i^\\top = \\basisMatrix^\\top \\basisMatrix$$\n", "$$\\sum _{i=1}^{\\numData}\\basisVector_i\\dataScalar_i = \\basisMatrix^\\top \\dataVector\n", "$$\n", "\n", "## Update Equations\n", "\n", "- Update for $\\mappingVector^{*}$ $$\n", " \\mappingVector^{*} = \\left(\\basisMatrix^\\top \\basisMatrix\\right)^{-1} \\basisMatrix^\\top \\dataVector\n", " $$\n", "- The equation for $\\left.\\dataStd^2\\right.^{*}$ may also be found $$\n", " 
\\left.\\dataStd^2\\right.^{{*}}=\\frac{\\sum_{i=1}^{\\numData}\\left(\\dataScalar_i-\\left.\\mappingVector^{*}\\right.^{\\top}\\basisVector_i\\right)^{2}}{\\numData}.\n", " $$\n", "\n", "## Avoid Direct Inverse\n", "\n", "- E.g. Solve for $\\mappingVector$ $$\n", " \\left(\\basisMatrix^\\top \\basisMatrix\\right)\\mappingVector = \\basisMatrix^\\top \\dataVector\n", " $$\n", "- See `np.linalg.solve`\n", "- In practice use $\\mathbf{Q}\\mathbf{R}$ decomposition (see lab class\n", " notes).\n", "\n", "## Polynomial Fits to Olympic Data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from matplotlib import pyplot as plt\n", "import teaching_plots as plot\n", "import mlai\n", "import pods" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "basis = mlai.polynomial\n", "\n", "data = pods.datasets.olympic_marathon_men()\n", "\n", "x = data['X']\n", "y = data['Y']\n", "\n", "xlim = [1892, 2020]\n", "max_basis = 27\n", "\n", "ll = np.array([np.nan]*(max_basis))\n", "sum_squares = np.array([np.nan]*(max_basis))\n", "basis=mlai.Basis(mlai.polynomial, number=1, data_limits=xlim)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from ipywidgets import IntSlider" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pods.notebook.display_plots('olympic_LM_polynomial_number{num_basis:0>3}.svg',\n", " directory='../slides/diagrams/ml', \n", " num_basis=IntSlider(1,1,28,1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Non-linear but Linear in the Parameters\n", "\n", "One rather nice aspect of our model is that whilst it is non-linear in\n", "the inputs, it is still linear in the parameters $\\mappingVector$. This\n", "means that our derivations from before continue to operate to allow us\n", "to work with this model. 
In fact, although this is a non-linear\n", "regression it is still known as a *linear model* because it is linear in\n", "the parameters,\n", "\n", "$$\n", "\\mappingFunction(\\inputVector) = \\mappingVector^\\top \\basisVector(\\inputVector)\n", "$$ where the vector $\\inputVector$ appears inside the basis functions,\n", "making our result, $\\mappingFunction(\\inputVector)$, non-linear in the\n", "inputs, but $\\mappingVector$ appears outside our basis function, making\n", "our result *linear* in the parameters. In practice, our basis function\n", "itself may contain its own set of parameters, $$\n", "\\mappingFunction(\\inputVector) = \\mappingVector^\\top \\basisVector(\\inputVector;\n", "\\boldsymbol{\\theta}),\n", "$$ that we've denoted here as $\\boldsymbol{\\theta}$. If these parameters\n", "appear inside the basis function then our model is *non-linear* in these\n", "parameters.\n", "\n", "### Question 4\n", "\n", "For the following prediction functions state whether the model is linear\n", "in the inputs, the parameters or both.\n", "\n", "(a) $\\mappingFunction(\\inputScalar) = \\mappingScalar_1\\inputScalar_1 + \\mappingScalar_2$\n", "\n", "(b) $\\mappingFunction(\\inputScalar) = \\mappingScalar_1\\exp(\\inputScalar_1) + \\mappingScalar_2\\inputScalar_2 + \\mappingScalar_3$\n", "\n", "(c) $\\mappingFunction(\\inputScalar) = \\log(\\inputScalar_1^{\\mappingScalar_1}) + \\mappingScalar_2\\inputScalar_2^2 + \\mappingScalar_3$\n", "\n", "(d) $\\mappingFunction(\\inputScalar) = \\exp(-\\sum_i(\\inputScalar_i - \\mappingScalar_i)^2)$\n", "\n", "(e) $\\mappingFunction(\\inputScalar) = \\exp(-\\mappingVector^\\top \\inputVector)$\n", "\n", "*25 marks*\n", "\n", "### Write your answer to Question 4 here\n", "\n", "## Fitting the Model Yourself\n", "\n", "You now have everything you need to fit a non-linear (in the inputs)\n", "basis function model to the marathon data.\n", "\n", "### Question 5\n", "\n", "Choose one of the basis functions you have explored 
above. Compute the\n", "design matrix on the covariates (or input data), `x`. Use the design\n", "matrix and the response variable `y` to solve the following linear\n", "system for the model parameters `w`. $$\n", "\\basisMatrix^\\top\\basisMatrix\\mappingVector = \\basisMatrix^\\top \\dataVector\n", "$$ Compute the corresponding error on the training data. How does it\n", "compare to the error you were able to achieve fitting the basis by hand above?\n", "Plot the form of your prediction function from the least squares\n", "estimate alongside the form of the prediction function you fitted by\n", "hand.\n", "\n", "*35 marks*" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Write code for your answer to Question 5 in this box\n", "# provide the answers so that the code runs correctly, otherwise you will lose marks!\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Further Reading\n", "\n", "- Section 1.4 of @Rogers:book11\n", "- Chapter 1, pg 1-6 of @Bishop:book06\n", "- Chapter 3, Section 3.1 up to pg 143 of @Bishop:book06\n", "\n", "## Lecture on Basis Functions from GPRS Uganda" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from IPython.lib.display import YouTubeVideo\n", "YouTubeVideo('PoNbOnUnOao')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Use of QR Decomposition for Numerical Stability\n", "\n", "In the last session we showed how rather than computing\n", "$\\inputMatrix^\\top\\inputMatrix$ as an intermediate step to our solution,\n", "we could compute the solution to the regression directly through\n", "[QR-decomposition](http://en.wikipedia.org/wiki/QR_decomposition). Now\n", "we will consider an example with non-linear basis functions where such\n", "computation is critical for forming the right answer.\n", "\n", "*TODO* example with polynomials." 
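, "\n",
"One possible version of such a polynomial example is sketched in the next cell.\n",
"It is an illustration under stated assumptions, not the lecture's own code: the\n",
"helper `poly_design`, the evenly spaced `years` and the randomly drawn `w_true`\n",
"are all hypothetical stand-ins for the real basis and data. It compares solving\n",
"the normal equations with solving through `np.linalg.qr`, and reports the\n",
"condition numbers involved."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"import numpy as np\n",
"\n",
"def poly_design(x, num_basis):\n",
"    # Hypothetical helper: columns are 1, x, x^2, ..., x^(num_basis - 1).\n",
"    return np.hstack([x**j for j in range(num_basis)])\n",
"\n",
"rng = np.random.default_rng(0)\n",
"years = np.linspace(1896, 2012, 27).reshape(-1, 1)  # illustrative inputs, not the real data\n",
"z = (years - years.mean()) / (0.5 * (years.max() - years.min()))  # rescaled to [-1, 1]\n",
"\n",
"Phi_qr = poly_design(z, 6)\n",
"w_true = rng.normal(size=(6, 1))  # arbitrary weights for the demonstration\n",
"y_qr = Phi_qr @ w_true  # noise-free targets, so w_true is recoverable\n",
"\n",
"# Normal equations: this route works with Phi^T Phi.\n",
"w_normal = np.linalg.solve(Phi_qr.T @ Phi_qr, Phi_qr.T @ y_qr)\n",
"\n",
"# QR route: Phi = Q R, so R w = Q^T y, without forming Phi^T Phi.\n",
"Q, R = np.linalg.qr(Phi_qr)\n",
"w_qr = np.linalg.solve(R, Q.T @ y_qr)\n",
"\n",
"cond_Phi = np.linalg.cond(Phi_qr)\n",
"cond_gram = np.linalg.cond(Phi_qr.T @ Phi_qr)  # roughly cond_Phi squared\n",
"cond_raw = np.linalg.cond(poly_design(years, 6))  # same basis on unscaled years\n",
"print(cond_Phi, cond_gram, cond_raw)"
] }, { "cell_type": "markdown", "metadata": {}, "source": [
"On the rescaled inputs both routes should recover similar weights. The gap\n",
"between `cond_Phi` and `cond_gram` suggests why forming\n",
"$\\basisMatrix^\\top\\basisMatrix$ costs precision, and `cond_raw` indicates how\n",
"much worse the problem becomes when the years are left unscaled."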
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import mlai" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x = np.random.normal(size=(10, 1))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Phi = mlai.fourier(x, 5)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "(np.dot(Phi.T,Phi))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Phi*Phi" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# References" ] } ], "metadata": {}, "nbformat": 4, "nbformat_minor": 2 }