{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Gaussian Processes\n",
"### [Neil D. Lawrence](http://inverseprobability.com), Amazon Cambridge and University of Sheffield\n",
"### 2019-01-09\n",
"\n",
"**Abstract**: Classical machine learning and statistical approaches to learning, such\n",
"as neural networks and linear regression, assume a parametric form for\n",
"functions. Gaussian process models are an alternative approach that\n",
"assumes a probabilistic prior over functions. This brings benefits, in\n",
"that uncertainty of function estimation is sustained throughout\n",
"inference, and some challenges: algorithms for fitting Gaussian\n",
"processes tend to be more complex than parametric models. In this\n",
"sessions I will introduce Gaussian processes and explain why sustaining\n",
"uncertainty is important.\n",
"\n",
"$$\n",
"\\newcommand{\\Amatrix}{\\mathbf{A}}\n",
"\\newcommand{\\KL}[2]{\\text{KL}\\left( #1\\,\\|\\,#2 \\right)}\n",
"\\newcommand{\\Kaast}{\\kernelMatrix_{\\mathbf{ \\ast}\\mathbf{ \\ast}}}\n",
"\\newcommand{\\Kastu}{\\kernelMatrix_{\\mathbf{ \\ast} \\inducingVector}}\n",
"\\newcommand{\\Kff}{\\kernelMatrix_{\\mappingFunctionVector \\mappingFunctionVector}}\n",
"\\newcommand{\\Kfu}{\\kernelMatrix_{\\mappingFunctionVector \\inducingVector}}\n",
"\\newcommand{\\Kuast}{\\kernelMatrix_{\\inducingVector \\bf\\ast}}\n",
"\\newcommand{\\Kuf}{\\kernelMatrix_{\\inducingVector \\mappingFunctionVector}}\n",
"\\newcommand{\\Kuu}{\\kernelMatrix_{\\inducingVector \\inducingVector}}\n",
"\\newcommand{\\Kuui}{\\Kuu^{-1}}\n",
"\\newcommand{\\Qaast}{\\mathbf{Q}_{\\bf \\ast \\ast}}\n",
"\\newcommand{\\Qastf}{\\mathbf{Q}_{\\ast \\mappingFunction}}\n",
"\\newcommand{\\Qfast}{\\mathbf{Q}_{\\mappingFunctionVector \\bf \\ast}}\n",
"\\newcommand{\\Qff}{\\mathbf{Q}_{\\mappingFunctionVector \\mappingFunctionVector}}\n",
"\\newcommand{\\aMatrix}{\\mathbf{A}}\n",
"\\newcommand{\\aScalar}{a}\n",
"\\newcommand{\\aVector}{\\mathbf{a}}\n",
"\\newcommand{\\acceleration}{a}\n",
"\\newcommand{\\bMatrix}{\\mathbf{B}}\n",
"\\newcommand{\\bScalar}{b}\n",
"\\newcommand{\\bVector}{\\mathbf{b}}\n",
"\\newcommand{\\basisFunc}{\\phi}\n",
"\\newcommand{\\basisFuncVector}{\\boldsymbol{ \\basisFunc}}\n",
"\\newcommand{\\basisFunction}{\\phi}\n",
"\\newcommand{\\basisLocation}{\\mu}\n",
"\\newcommand{\\basisMatrix}{\\boldsymbol{ \\Phi}}\n",
"\\newcommand{\\basisScalar}{\\basisFunction}\n",
"\\newcommand{\\basisVector}{\\boldsymbol{ \\basisFunction}}\n",
"\\newcommand{\\activationFunction}{\\phi}\n",
"\\newcommand{\\activationMatrix}{\\boldsymbol{ \\Phi}}\n",
"\\newcommand{\\activationScalar}{\\basisFunction}\n",
"\\newcommand{\\activationVector}{\\boldsymbol{ \\basisFunction}}\n",
"\\newcommand{\\bigO}{\\mathcal{O}}\n",
"\\newcommand{\\binomProb}{\\pi}\n",
"\\newcommand{\\cMatrix}{\\mathbf{C}}\n",
"\\newcommand{\\cbasisMatrix}{\\hat{\\boldsymbol{ \\Phi}}}\n",
"\\newcommand{\\cdataMatrix}{\\hat{\\dataMatrix}}\n",
"\\newcommand{\\cdataScalar}{\\hat{\\dataScalar}}\n",
"\\newcommand{\\cdataVector}{\\hat{\\dataVector}}\n",
"\\newcommand{\\centeredKernelMatrix}{\\mathbf{ \\MakeUppercase{\\centeredKernelScalar}}}\n",
"\\newcommand{\\centeredKernelScalar}{b}\n",
"\\newcommand{\\centeredKernelVector}{\\centeredKernelScalar}\n",
"\\newcommand{\\centeringMatrix}{\\mathbf{H}}\n",
"\\newcommand{\\chiSquaredDist}[2]{\\chi_{#1}^{2}\\left(#2\\right)}\n",
"\\newcommand{\\chiSquaredSamp}[1]{\\chi_{#1}^{2}}\n",
"\\newcommand{\\conditionalCovariance}{\\boldsymbol{ \\Sigma}}\n",
"\\newcommand{\\coregionalizationMatrix}{\\mathbf{B}}\n",
"\\newcommand{\\coregionalizationScalar}{b}\n",
"\\newcommand{\\coregionalizationVector}{\\mathbf{ \\coregionalizationScalar}}\n",
"\\newcommand{\\covDist}[2]{\\text{cov}_{#2}\\left(#1\\right)}\n",
"\\newcommand{\\covSamp}[1]{\\text{cov}\\left(#1\\right)}\n",
"\\newcommand{\\covarianceScalar}{c}\n",
"\\newcommand{\\covarianceVector}{\\mathbf{ \\covarianceScalar}}\n",
"\\newcommand{\\covarianceMatrix}{\\mathbf{C}}\n",
"\\newcommand{\\covarianceMatrixTwo}{\\boldsymbol{ \\Sigma}}\n",
"\\newcommand{\\croupierScalar}{s}\n",
"\\newcommand{\\croupierVector}{\\mathbf{ \\croupierScalar}}\n",
"\\newcommand{\\croupierMatrix}{\\mathbf{ \\MakeUppercase{\\croupierScalar}}}\n",
"\\newcommand{\\dataDim}{p}\n",
"\\newcommand{\\dataIndex}{i}\n",
"\\newcommand{\\dataIndexTwo}{j}\n",
"\\newcommand{\\dataMatrix}{\\mathbf{Y}}\n",
"\\newcommand{\\dataScalar}{y}\n",
"\\newcommand{\\dataSet}{\\mathcal{D}}\n",
"\\newcommand{\\dataStd}{\\sigma}\n",
"\\newcommand{\\dataVector}{\\mathbf{ \\dataScalar}}\n",
"\\newcommand{\\decayRate}{d}\n",
"\\newcommand{\\degreeMatrix}{\\mathbf{ \\MakeUppercase{\\degreeScalar}}}\n",
"\\newcommand{\\degreeScalar}{d}\n",
"\\newcommand{\\degreeVector}{\\mathbf{ \\degreeScalar}}\n",
"% Already defined by latex\n",
"%\\newcommand{\\det}[1]{\\left|#1\\right|}\n",
"\\newcommand{\\diag}[1]{\\text{diag}\\left(#1\\right)}\n",
"\\newcommand{\\diagonalMatrix}{\\mathbf{D}}\n",
"\\newcommand{\\diff}[2]{\\frac{\\text{d}#1}{\\text{d}#2}}\n",
"\\newcommand{\\diffTwo}[2]{\\frac{\\text{d}^2#1}{\\text{d}#2^2}}\n",
"\\newcommand{\\displacement}{x}\n",
"\\newcommand{\\displacementVector}{\\textbf{\\displacement}}\n",
"\\newcommand{\\distanceMatrix}{\\mathbf{ \\MakeUppercase{\\distanceScalar}}}\n",
"\\newcommand{\\distanceScalar}{d}\n",
"\\newcommand{\\distanceVector}{\\mathbf{ \\distanceScalar}}\n",
"\\newcommand{\\eigenvaltwo}{\\ell}\n",
"\\newcommand{\\eigenvaltwoMatrix}{\\mathbf{L}}\n",
"\\newcommand{\\eigenvaltwoVector}{\\mathbf{l}}\n",
"\\newcommand{\\eigenvalue}{\\lambda}\n",
"\\newcommand{\\eigenvalueMatrix}{\\boldsymbol{ \\Lambda}}\n",
"\\newcommand{\\eigenvalueVector}{\\boldsymbol{ \\lambda}}\n",
"\\newcommand{\\eigenvector}{\\mathbf{ \\eigenvectorScalar}}\n",
"\\newcommand{\\eigenvectorMatrix}{\\mathbf{U}}\n",
"\\newcommand{\\eigenvectorScalar}{u}\n",
"\\newcommand{\\eigenvectwo}{\\mathbf{v}}\n",
"\\newcommand{\\eigenvectwoMatrix}{\\mathbf{V}}\n",
"\\newcommand{\\eigenvectwoScalar}{v}\n",
"\\newcommand{\\entropy}[1]{\\mathcal{H}\\left(#1\\right)}\n",
"\\newcommand{\\errorFunction}{E}\n",
"\\newcommand{\\expDist}[2]{\\left<#1\\right>_{#2}}\n",
"\\newcommand{\\expSamp}[1]{\\left<#1\\right>}\n",
"\\newcommand{\\expectation}[1]{\\left\\langle #1 \\right\\rangle }\n",
"\\newcommand{\\expectationDist}[2]{\\left\\langle #1 \\right\\rangle _{#2}}\n",
"\\newcommand{\\expectedDistanceMatrix}{\\mathcal{D}}\n",
"\\newcommand{\\eye}{\\mathbf{I}}\n",
"\\newcommand{\\fantasyDim}{r}\n",
"\\newcommand{\\fantasyMatrix}{\\mathbf{ \\MakeUppercase{\\fantasyScalar}}}\n",
"\\newcommand{\\fantasyScalar}{z}\n",
"\\newcommand{\\fantasyVector}{\\mathbf{ \\fantasyScalar}}\n",
"\\newcommand{\\featureStd}{\\varsigma}\n",
"\\newcommand{\\gammaCdf}[3]{\\mathcal{GAMMA CDF}\\left(#1|#2,#3\\right)}\n",
"\\newcommand{\\gammaDist}[3]{\\mathcal{G}\\left(#1|#2,#3\\right)}\n",
"\\newcommand{\\gammaSamp}[2]{\\mathcal{G}\\left(#1,#2\\right)}\n",
"\\newcommand{\\gaussianDist}[3]{\\mathcal{N}\\left(#1|#2,#3\\right)}\n",
"\\newcommand{\\gaussianSamp}[2]{\\mathcal{N}\\left(#1,#2\\right)}\n",
"\\newcommand{\\given}{|}\n",
"\\newcommand{\\half}{\\frac{1}{2}}\n",
"\\newcommand{\\heaviside}{H}\n",
"\\newcommand{\\hiddenMatrix}{\\mathbf{ \\MakeUppercase{\\hiddenScalar}}}\n",
"\\newcommand{\\hiddenScalar}{h}\n",
"\\newcommand{\\hiddenVector}{\\mathbf{ \\hiddenScalar}}\n",
"\\newcommand{\\identityMatrix}{\\eye}\n",
"\\newcommand{\\inducingInputScalar}{z}\n",
"\\newcommand{\\inducingInputVector}{\\mathbf{ \\inducingInputScalar}}\n",
"\\newcommand{\\inducingInputMatrix}{\\mathbf{Z}}\n",
"\\newcommand{\\inducingScalar}{u}\n",
"\\newcommand{\\inducingVector}{\\mathbf{ \\inducingScalar}}\n",
"\\newcommand{\\inducingMatrix}{\\mathbf{U}}\n",
"\\newcommand{\\inlineDiff}[2]{\\text{d}#1/\\text{d}#2}\n",
"\\newcommand{\\inputDim}{q}\n",
"\\newcommand{\\inputMatrix}{\\mathbf{X}}\n",
"\\newcommand{\\inputScalar}{x}\n",
"\\newcommand{\\inputSpace}{\\mathcal{X}}\n",
"\\newcommand{\\inputVals}{\\inputVector}\n",
"\\newcommand{\\inputVector}{\\mathbf{ \\inputScalar}}\n",
"\\newcommand{\\iterNum}{k}\n",
"\\newcommand{\\kernel}{\\kernelScalar}\n",
"\\newcommand{\\kernelMatrix}{\\mathbf{K}}\n",
"\\newcommand{\\kernelScalar}{k}\n",
"\\newcommand{\\kernelVector}{\\mathbf{ \\kernelScalar}}\n",
"\\newcommand{\\kff}{\\kernelScalar_{\\mappingFunction \\mappingFunction}}\n",
"\\newcommand{\\kfu}{\\kernelVector_{\\mappingFunction \\inducingScalar}}\n",
"\\newcommand{\\kuf}{\\kernelVector_{\\inducingScalar \\mappingFunction}}\n",
"\\newcommand{\\kuu}{\\kernelVector_{\\inducingScalar \\inducingScalar}}\n",
"\\newcommand{\\lagrangeMultiplier}{\\lambda}\n",
"\\newcommand{\\lagrangeMultiplierMatrix}{\\boldsymbol{ \\Lambda}}\n",
"\\newcommand{\\lagrangian}{L}\n",
"\\newcommand{\\laplacianFactor}{\\mathbf{ \\MakeUppercase{\\laplacianFactorScalar}}}\n",
"\\newcommand{\\laplacianFactorScalar}{m}\n",
"\\newcommand{\\laplacianFactorVector}{\\mathbf{ \\laplacianFactorScalar}}\n",
"\\newcommand{\\laplacianMatrix}{\\mathbf{L}}\n",
"\\newcommand{\\laplacianScalar}{\\ell}\n",
"\\newcommand{\\laplacianVector}{\\mathbf{ \\ell}}\n",
"\\newcommand{\\latentDim}{q}\n",
"\\newcommand{\\latentDistanceMatrix}{\\boldsymbol{ \\Delta}}\n",
"\\newcommand{\\latentDistanceScalar}{\\delta}\n",
"\\newcommand{\\latentDistanceVector}{\\boldsymbol{ \\delta}}\n",
"\\newcommand{\\latentForce}{f}\n",
"\\newcommand{\\latentFunction}{u}\n",
"\\newcommand{\\latentFunctionVector}{\\mathbf{ \\latentFunction}}\n",
"\\newcommand{\\latentFunctionMatrix}{\\mathbf{ \\MakeUppercase{\\latentFunction}}}\n",
"\\newcommand{\\latentIndex}{j}\n",
"\\newcommand{\\latentScalar}{z}\n",
"\\newcommand{\\latentVector}{\\mathbf{ \\latentScalar}}\n",
"\\newcommand{\\latentMatrix}{\\mathbf{Z}}\n",
"\\newcommand{\\learnRate}{\\eta}\n",
"\\newcommand{\\lengthScale}{\\ell}\n",
"\\newcommand{\\rbfWidth}{\\ell}\n",
"\\newcommand{\\likelihoodBound}{\\mathcal{L}}\n",
"\\newcommand{\\likelihoodFunction}{L}\n",
"\\newcommand{\\locationScalar}{\\mu}\n",
"\\newcommand{\\locationVector}{\\boldsymbol{ \\locationScalar}}\n",
"\\newcommand{\\locationMatrix}{\\mathbf{M}}\n",
"\\newcommand{\\variance}[1]{\\text{var}\\left( #1 \\right)}\n",
"\\newcommand{\\mappingFunction}{f}\n",
"\\newcommand{\\mappingFunctionMatrix}{\\mathbf{F}}\n",
"\\newcommand{\\mappingFunctionTwo}{g}\n",
"\\newcommand{\\mappingFunctionTwoMatrix}{\\mathbf{G}}\n",
"\\newcommand{\\mappingFunctionTwoVector}{\\mathbf{ \\mappingFunctionTwo}}\n",
"\\newcommand{\\mappingFunctionVector}{\\mathbf{ \\mappingFunction}}\n",
"\\newcommand{\\scaleScalar}{s}\n",
"\\newcommand{\\mappingScalar}{w}\n",
"\\newcommand{\\mappingVector}{\\mathbf{ \\mappingScalar}}\n",
"\\newcommand{\\mappingMatrix}{\\mathbf{W}}\n",
"\\newcommand{\\mappingScalarTwo}{v}\n",
"\\newcommand{\\mappingVectorTwo}{\\mathbf{ \\mappingScalarTwo}}\n",
"\\newcommand{\\mappingMatrixTwo}{\\mathbf{V}}\n",
"\\newcommand{\\maxIters}{K}\n",
"\\newcommand{\\meanMatrix}{\\mathbf{M}}\n",
"\\newcommand{\\meanScalar}{\\mu}\n",
"\\newcommand{\\meanTwoMatrix}{\\mathbf{M}}\n",
"\\newcommand{\\meanTwoScalar}{m}\n",
"\\newcommand{\\meanTwoVector}{\\mathbf{ \\meanTwoScalar}}\n",
"\\newcommand{\\meanVector}{\\boldsymbol{ \\meanScalar}}\n",
"\\newcommand{\\mrnaConcentration}{m}\n",
"\\newcommand{\\naturalFrequency}{\\omega}\n",
"\\newcommand{\\neighborhood}[1]{\\mathcal{N}\\left( #1 \\right)}\n",
"\\newcommand{\\neilurl}{http://inverseprobability.com/}\n",
"\\newcommand{\\noiseMatrix}{\\boldsymbol{ E}}\n",
"\\newcommand{\\noiseScalar}{\\epsilon}\n",
"\\newcommand{\\noiseVector}{\\boldsymbol{ \\epsilon}}\n",
"\\newcommand{\\norm}[1]{\\left\\Vert #1 \\right\\Vert}\n",
"\\newcommand{\\normalizedLaplacianMatrix}{\\hat{\\mathbf{L}}}\n",
"\\newcommand{\\normalizedLaplacianScalar}{\\hat{\\ell}}\n",
"\\newcommand{\\normalizedLaplacianVector}{\\hat{\\mathbf{ \\ell}}}\n",
"\\newcommand{\\numActive}{m}\n",
"\\newcommand{\\numBasisFunc}{m}\n",
"\\newcommand{\\numComponents}{m}\n",
"\\newcommand{\\numComps}{K}\n",
"\\newcommand{\\numData}{n}\n",
"\\newcommand{\\numFeatures}{K}\n",
"\\newcommand{\\numHidden}{h}\n",
"\\newcommand{\\numInducing}{m}\n",
"\\newcommand{\\numLayers}{\\ell}\n",
"\\newcommand{\\numNeighbors}{K}\n",
"\\newcommand{\\numSequences}{s}\n",
"\\newcommand{\\numSuccess}{s}\n",
"\\newcommand{\\numTasks}{m}\n",
"\\newcommand{\\numTime}{T}\n",
"\\newcommand{\\numTrials}{S}\n",
"\\newcommand{\\outputIndex}{j}\n",
"\\newcommand{\\paramVector}{\\boldsymbol{ \\theta}}\n",
"\\newcommand{\\parameterMatrix}{\\boldsymbol{ \\Theta}}\n",
"\\newcommand{\\parameterScalar}{\\theta}\n",
"\\newcommand{\\parameterVector}{\\boldsymbol{ \\parameterScalar}}\n",
"\\newcommand{\\partDiff}[2]{\\frac{\\partial#1}{\\partial#2}}\n",
"\\newcommand{\\precisionScalar}{j}\n",
"\\newcommand{\\precisionVector}{\\mathbf{ \\precisionScalar}}\n",
"\\newcommand{\\precisionMatrix}{\\mathbf{J}}\n",
"\\newcommand{\\pseudotargetScalar}{\\widetilde{y}}\n",
"\\newcommand{\\pseudotargetVector}{\\mathbf{ \\pseudotargetScalar}}\n",
"\\newcommand{\\pseudotargetMatrix}{\\mathbf{ \\widetilde{Y}}}\n",
"\\newcommand{\\rank}[1]{\\text{rank}\\left(#1\\right)}\n",
"\\newcommand{\\rayleighDist}[2]{\\mathcal{R}\\left(#1|#2\\right)}\n",
"\\newcommand{\\rayleighSamp}[1]{\\mathcal{R}\\left(#1\\right)}\n",
"\\newcommand{\\responsibility}{r}\n",
"\\newcommand{\\rotationScalar}{r}\n",
"\\newcommand{\\rotationVector}{\\mathbf{ \\rotationScalar}}\n",
"\\newcommand{\\rotationMatrix}{\\mathbf{R}}\n",
"\\newcommand{\\sampleCovScalar}{s}\n",
"\\newcommand{\\sampleCovVector}{\\mathbf{ \\sampleCovScalar}}\n",
"\\newcommand{\\sampleCovMatrix}{\\mathbf{s}}\n",
"\\newcommand{\\scalarProduct}[2]{\\left\\langle{#1},{#2}\\right\\rangle}\n",
"\\newcommand{\\sign}[1]{\\text{sign}\\left(#1\\right)}\n",
"\\newcommand{\\sigmoid}[1]{\\sigma\\left(#1\\right)}\n",
"\\newcommand{\\singularvalue}{\\ell}\n",
"\\newcommand{\\singularvalueMatrix}{\\mathbf{L}}\n",
"\\newcommand{\\singularvalueVector}{\\mathbf{l}}\n",
"\\newcommand{\\sorth}{\\mathbf{u}}\n",
"\\newcommand{\\spar}{\\lambda}\n",
"\\newcommand{\\trace}[1]{\\text{tr}\\left(#1\\right)}\n",
"\\newcommand{\\BasalRate}{B}\n",
"\\newcommand{\\DampingCoefficient}{C}\n",
"\\newcommand{\\DecayRate}{D}\n",
"\\newcommand{\\Displacement}{X}\n",
"\\newcommand{\\LatentForce}{F}\n",
"\\newcommand{\\Mass}{M}\n",
"\\newcommand{\\Sensitivity}{S}\n",
"\\newcommand{\\basalRate}{b}\n",
"\\newcommand{\\dampingCoefficient}{c}\n",
"\\newcommand{\\mass}{m}\n",
"\\newcommand{\\sensitivity}{s}\n",
"\\newcommand{\\springScalar}{\\kappa}\n",
"\\newcommand{\\springVector}{\\boldsymbol{ \\kappa}}\n",
"\\newcommand{\\springMatrix}{\\boldsymbol{ \\mathcal{K}}}\n",
"\\newcommand{\\tfConcentration}{p}\n",
"\\newcommand{\\tfDecayRate}{\\delta}\n",
"\\newcommand{\\tfMrnaConcentration}{f}\n",
"\\newcommand{\\tfVector}{\\mathbf{ \\tfConcentration}}\n",
"\\newcommand{\\velocity}{v}\n",
"\\newcommand{\\sufficientStatsScalar}{g}\n",
"\\newcommand{\\sufficientStatsVector}{\\mathbf{ \\sufficientStatsScalar}}\n",
"\\newcommand{\\sufficientStatsMatrix}{\\mathbf{G}}\n",
"\\newcommand{\\switchScalar}{s}\n",
"\\newcommand{\\switchVector}{\\mathbf{ \\switchScalar}}\n",
"\\newcommand{\\switchMatrix}{\\mathbf{S}}\n",
"\\newcommand{\\tr}[1]{\\text{tr}\\left(#1\\right)}\n",
"\\newcommand{\\loneNorm}[1]{\\left\\Vert #1 \\right\\Vert_1}\n",
"\\newcommand{\\ltwoNorm}[1]{\\left\\Vert #1 \\right\\Vert_2}\n",
"\\newcommand{\\onenorm}[1]{\\left\\vert#1\\right\\vert_1}\n",
"\\newcommand{\\twonorm}[1]{\\left\\Vert #1 \\right\\Vert}\n",
"\\newcommand{\\vScalar}{v}\n",
"\\newcommand{\\vVector}{\\mathbf{v}}\n",
"\\newcommand{\\vMatrix}{\\mathbf{V}}\n",
"\\newcommand{\\varianceDist}[2]{\\text{var}_{#2}\\left( #1 \\right)}\n",
"% Already defined by latex\n",
"%\\newcommand{\\vec}{#1:}\n",
"\\newcommand{\\vecb}[1]{\\left(#1\\right):}\n",
"\\newcommand{\\weightScalar}{w}\n",
"\\newcommand{\\weightVector}{\\mathbf{ \\weightScalar}}\n",
"\\newcommand{\\weightMatrix}{\\mathbf{W}}\n",
"\\newcommand{\\weightedAdjacencyMatrix}{\\mathbf{A}}\n",
"\\newcommand{\\weightedAdjacencyScalar}{a}\n",
"\\newcommand{\\weightedAdjacencyVector}{\\mathbf{ \\weightedAdjacencyScalar}}\n",
"\\newcommand{\\onesVector}{\\mathbf{1}}\n",
"\\newcommand{\\zerosVector}{\\mathbf{0}}\n",
"$$\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"[\\small{[edit] }]{style=\"text-align:right\"}\n",
"\\#\\#\\#\n",
"\n",
"[@Rasmussen:book06]{style=\"text-align:right\"}\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"[\\small{[edit] }]{style=\"text-align:right\"}\n",
"\n",
"[\\small{[edit] }]{style=\"text-align:right\"}\n",
"\n",
"\n",
"[\\small{[edit] }]{style=\"text-align:right\"}\n",
"\n",
"### What is Machine Learning?\n",
"\n",
". . .\n",
"\n",
"$$ \\text{data} + \\text{model} \\xrightarrow{\\text{compute}} \\text{prediction}$$\n",
"\n",
". . .\n",
"\n",
"- **data** : observations, could be actively or passively acquired\n",
" (meta-data).\n",
"\n",
". . .\n",
"\n",
"- **model** : assumptions, based on previous experience (other data!\n",
" transfer learning etc), or beliefs about the regularities of the\n",
" universe. Inductive bias.\n",
"\n",
". . .\n",
"\n",
"- **prediction** : an action to be taken or a categorization or a\n",
" quality score.\n",
"\n",
". . .\n",
"\n",
"- Royal Society Report: [Machine Learning: Power and Promise of\n",
" Computers that Learn by\n",
" Example](https://royalsociety.org/~/media/policy/projects/machine-learning/publications/machine-learning-report.pdf)\n",
"\n",
"### What is Machine Learning?\n",
"\n",
"$$\\text{data} + \\text{model} \\xrightarrow{\\text{compute}} \\text{prediction}$$\n",
"\n",
"> - To combine data with a model need:\n",
"> - **a prediction function** $\\mappingFunction(\\cdot)$ includes our\n",
"> beliefs about the regularities of the universe\n",
"> - **an objective function** $\\errorFunction(\\cdot)$ defines the cost\n",
"> of misprediction.\n",
"\n",
"### Artificial Intelligence\n",
"\n",
"- Machine learning is a mainstay because of importance of prediction.\n",
"\n",
"### Uncertainty\n",
"\n",
"- Uncertainty in prediction arises from:\n",
"- scarcity of training data and\n",
"- mismatch between the set of prediction functions we choose and all\n",
" possible prediction functions.\n",
"- Also uncertainties in objective, leave those for another day.\n",
"\n",
"[\\small{[edit] }]{style=\"text-align:right\"}\n",
"\\#\\#\\# Neural Networks and Prediction Functions\n",
"\n",
"- adaptive non-linear function models inspired by simple neuron models\n",
" [@McCulloch:neuron43]\n",
"- have become popular because of their ability to model data.\n",
"- can be composed to form highly complex functions\n",
"- start by focussing on one hidden layer\n",
"\n",
"### Prediction Function of One Hidden Layer\n",
"\n",
"$$\n",
"\\mappingFunction(\\inputVector) = \\left.\\mappingVector^{(2)}\\right.^\\top \\activationVector(\\mappingMatrix_{1}, \\inputVector)\n",
"$$\n",
"\n",
"$\\mappingFunction(\\cdot)$ is a scalar function with vector inputs,\n",
"\n",
"$\\activationVector(\\cdot)$ is a vector function with vector inputs.\n",
"\n",
"- dimensionality of the vector function is known as the number of\n",
" hidden units, or the number of neurons.\n",
"\n",
"- elements of $\\activationVector(\\cdot)$ are the *activation* function\n",
" of the neural network\n",
"\n",
"- elements of $\\mappingMatrix_{1}$ are the parameters of the\n",
" activation functions.\n",
"\n",
"### Relations with Classical Statistics\n",
"\n",
"- In statistics activation functions are known as *basis functions*.\n",
"\n",
"- would think of this as a *linear model*: not linear predictions,\n",
" linear in the parameters\n",
"\n",
"- $\\mappingVector_{1}$ are *static* parameters.\n",
"\n",
"### Adaptive Basis Functions\n",
"\n",
"- In machine learning we optimize $\\mappingMatrix_{1}$ as well as\n",
" $\\mappingMatrix_{2}$ (which would normally be denoted in statistics\n",
" by $\\boldsymbol{\\beta}$).\n",
"\n",
"- Revisit that decision: follow the path of @Neal:bayesian94 and\n",
" @MacKay:bayesian92.\n",
"\n",
"- Consider the probabilistic approach.\n",
"\n",
"[\\small{[edit] }]{style=\"text-align:right\"}\n",
"\\#\\#\\# Probabilistic Modelling\n",
"\n",
"- Probabilistically we want, $$\n",
" p(\\dataScalar_*|\\dataVector, \\inputMatrix, \\inputVector_*),\n",
" $$ $\\dataScalar_*$ is a test output $\\inputVector_*$ is a test input\n",
" $\\inputMatrix$ is a training input matrix $\\dataVector$ is training\n",
" outputs\n",
"\n",
"### Joint Model of World\n",
"\n",
"$$\n",
"p(\\dataScalar_*|\\dataVector, \\inputMatrix, \\inputVector_*) = \\int p(\\dataScalar_*|\\inputVector_*, \\mappingMatrix) p(\\mappingMatrix | \\dataVector, \\inputMatrix) \\text{d} \\mappingMatrix\n",
"$$\n",
"\n",
". . .\n",
"\n",
"$\\mappingMatrix$ contains $\\mappingMatrix_1$ and $\\mappingMatrix_2$\n",
"\n",
"$p(\\mappingMatrix | \\dataVector, \\inputMatrix)$ is posterior density\n",
"\n",
"### Likelihood\n",
"\n",
"$p(\\dataScalar|\\inputVector, \\mappingMatrix)$ is the *likelihood* of\n",
"data point\n",
"\n",
". . .\n",
"\n",
"Normally assume independence: $$\n",
"p(\\dataVector|\\inputMatrix, \\mappingMatrix) = \\prod_{i=1}^\\numData p(\\dataScalar_i|\\inputVector_i, \\mappingMatrix),$$\n",
"\n",
"### Likelihood and Prediction Function\n",
"\n",
"$$\n",
"p(\\dataScalar_i | \\mappingFunction(\\inputVector_i)) = \\frac{1}{\\sqrt{2\\pi \\dataStd^2}} \\exp\\left(-\\frac{\\left(\\dataScalar_i - \\mappingFunction(\\inputVector_i)\\right)^2}{2\\dataStd^2}\\right)\n",
"$$\n",
"\n",
"### Unsupervised Learning\n",
"\n",
"- Can also consider priors over latents $$\n",
" p(\\dataVector_*|\\dataVector) = \\int p(\\dataVector_*|\\inputMatrix_*, \\mappingMatrix) p(\\mappingMatrix | \\dataVector, \\inputMatrix) p(\\inputMatrix) p(\\inputMatrix_*) \\text{d} \\mappingMatrix \\text{d} \\inputMatrix \\text{d}\\inputMatrix_*\n",
" $$\n",
"\n",
"- This gives *unsupervised learning*.\n",
"\n",
"### Probabilistic Inference\n",
"\n",
"- Data: $\\dataVector$\n",
"\n",
"- Model: $p(\\dataVector, \\dataVector^*)$\n",
"\n",
"- Prediction: $p(\\dataVector^*| \\dataVector)$\n",
"\n",
"[\\small{[edit] }]{style=\"text-align:right\"}\n",
"\\#\\#\\# Graphical Models\n",
"\n",
"- Represent joint distribution through *conditional dependencies*.\n",
"- E.g. Markov chain\n",
"\n",
"$$p(\\dataVector) = p(\\dataScalar_\\numData | \\dataScalar_{\\numData-1}) p(\\dataScalar_{\\numData-1}|\\dataScalar_{\\numData-2}) \\dots p(\\dataScalar_{2} | \\dataScalar_{1})$$"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import daft\n",
"from matplotlib import rc\n",
"\n",
"rc(\"font\", **{'family':'sans-serif','sans-serif':['Helvetica']}, size=30)\n",
"rc(\"text\", usetex=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pgm = daft.PGM(shape=[3, 1],\n",
" origin=[0, 0], \n",
" grid_unit=5, \n",
" node_unit=1.9, \n",
" observed_style='shaded',\n",
" line_width=3)\n",
"\n",
"\n",
"pgm.add_node(daft.Node(\"y_1\", r\"$y_1$\", 0.5, 0.5, fixed=False))\n",
"pgm.add_node(daft.Node(\"y_2\", r\"$y_2$\", 1.5, 0.5, fixed=False))\n",
"pgm.add_node(daft.Node(\"y_3\", r\"$y_3$\", 2.5, 0.5, fixed=False))\n",
"pgm.add_edge(\"y_1\", \"y_2\")\n",
"pgm.add_edge(\"y_2\", \"y_3\")\n",
"\n",
"pgm.render().figure.savefig(\"../slides/diagrams/ml/markov.svg\", transparent=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" \n",
"\n",
"### \n",
"\n",
"Predict Perioperative Risk of Clostridium Difficile Infection Following\n",
"Colon Surgery [@Steele:predictive12]\n",
"\n",
"\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"### Performing Inference\n",
"\n",
"- Easy to write in probabilities\n",
"\n",
"- But underlying this is a wealth of computational challenges.\n",
"\n",
"- High dimensional integrals typically require approximation.\n",
"\n",
"### Linear Models\n",
"\n",
"- In statistics, focussed more on *linear* model implied by $$\n",
" \\mappingFunction(\\inputVector) = \\left.\\mappingVector^{(2)}\\right.^\\top \\activationVector(\\mappingMatrix_1, \\inputVector)\n",
" $$\n",
"\n",
"- Hold $\\mappingMatrix_1$ fixed for given analysis.\n",
"\n",
"- Gaussian prior for $\\mappingMatrix$, $$\n",
" \\mappingVector^{(2)} \\sim \\gaussianSamp{\\zerosVector}{\\covarianceMatrix}.\n",
" $$ $$\n",
" \\dataScalar_i = \\mappingFunction(\\inputVector_i) + \\noiseScalar_i,\n",
" $$ where $$\n",
" \\noiseScalar_i \\sim \\gaussianSamp{0}{\\dataStd^2}\n",
" $$\n",
"\n",
"\\newslides{Linear Gaussian Models}\n",
"\n",
"- Normally integrals are complex but for this Gaussian linear case\n",
" they are trivial.\n",
"\n",
"### Multivariate Gaussian Properties\n",
"\n",
"[\\small{[edit] }]{style=\"text-align:right\"}\n",
"\\#\\#\\# Recall Univariate Gaussian Properties\n",
"\n",
". . .\n",
"\n",
"1. Sum of Gaussian variables is also Gaussian.\n",
"\n",
"$$\\dataScalar_i \\sim \\gaussianSamp{\\meanScalar_i}{\\dataStd_i^2}$$\n",
"\n",
". . .\n",
"\n",
"$$\\sum_{i=1}^{\\numData} \\dataScalar_i \\sim \\gaussianSamp{\\sum_{i=1}^\\numData \\meanScalar_i}{\\sum_{i=1}^\\numData\\dataStd_i^2}$$\n",
"\n",
"### Recall Univariate Gaussian Properties\n",
"\n",
"2. Scaling a Gaussian leads to a Gaussian.\n",
"\n",
". . .\n",
"\n",
"$$\\dataScalar \\sim \\gaussianSamp{\\meanScalar}{\\dataStd^2}$$\n",
"\n",
". . .\n",
"\n",
"$$\\mappingScalar\\dataScalar\\sim \\gaussianSamp{\\mappingScalar\\meanScalar}{\\mappingScalar^2 \\dataStd^2}$$\n",
"\n",
"### Multivariate Consequence\n",
"\n",
"[If]{style=\"text-align:left\"}\n",
"$$\\inputVector \\sim \\gaussianSamp{\\meanVector}{\\covarianceMatrix}$$\n",
"\n",
". . .\n",
"\n",
"[And]{style=\"text-align:left\"}\n",
"$$\\dataVector= \\mappingMatrix\\inputVector$$\n",
"\n",
". . .\n",
"\n",
"[Then]{style=\"text-align:left\"}\n",
"$$\\dataVector \\sim \\gaussianSamp{\\mappingMatrix\\meanVector}{\\mappingMatrix\\covarianceMatrix\\mappingMatrix^\\top}$$\n",
"\n",
"### Linear Gaussian Models\n",
"\n",
"1. linear Gaussian models are easier to deal with\n",
"2. Even the parameters *within* the process can be handled, by\n",
" considering a particular limit.\n",
"\n",
"[\\small{[edit] }]{style=\"text-align:right\"}\n",
"\\#\\#\\# Multivariate Gaussian Properties\n",
"\n",
"- If $$\n",
" \\dataVector = \\mappingMatrix \\inputVector + \\noiseVector,\n",
" $$\n",
"\n",
"- Assume $$\n",
" \\begin{align}\n",
" \\inputVector & \\sim \\gaussianSamp{\\meanVector}{\\covarianceMatrix}\\\\\n",
" \\noiseVector & \\sim \\gaussianSamp{\\zerosVector}{\\covarianceMatrixTwo}\n",
" \\end{align}\n",
" $$\n",
"- Then $$\n",
" \\dataVector \\sim \\gaussianSamp{\\mappingMatrix\\meanVector}{\\mappingMatrix\\covarianceMatrix\\mappingMatrix^\\top + \\covarianceMatrixTwo}.\n",
" $$ If $\\covarianceMatrixTwo=\\dataStd^2\\eye$, this is Probabilistic\n",
" Principal Component Analysis [@Tipping:probpca99], because we\n",
" integrated out the inputs (or *latent* variables they would be\n",
" called in that case).\n",
"\n",
"### Non linear on Inputs\n",
"\n",
"- Set each activation function computed at each data point to be\n",
"\n",
"$$\n",
"\\activationScalar_{i,j} = \\activationScalar(\\mappingVector^{(1)}_{j}, \\inputVector_{i})\n",
"$$ Define *design matrix* $$\n",
"\\activationMatrix = \n",
"\\begin{bmatrix}\n",
"\\activationScalar_{1, 1} & \\activationScalar_{1, 2} & \\dots & \\activationScalar_{1, \\numHidden} \\\\\n",
"\\activationScalar_{1, 2} & \\activationScalar_{1, 2} & \\dots & \\activationScalar_{1, \\numData} \\\\\n",
"\\vdots & \\vdots & \\ddots & \\vdots \\\\\n",
"\\activationScalar_{\\numData, 1} & \\activationScalar_{\\numData, 2} & \\dots & \\activationScalar_{\\numData, \\numHidden}\n",
"\\end{bmatrix}.\n",
"$$\n",
"\n",
"### Matrix Representation of a Neural Network\n",
"\n",
"$$\\dataScalar\\left(\\inputVector\\right) = \\activationVector\\left(\\inputVector\\right)^\\top \\mappingVector + \\noiseScalar$$\n",
"\n",
". . .\n",
"\n",
"$$\\dataVector = \\activationMatrix\\mappingVector + \\noiseVector$$\n",
"\n",
". . .\n",
"\n",
"$$\\noiseVector \\sim \\gaussianSamp{\\zerosVector}{\\dataStd^2\\eye}$$\n",
"\n",
"### Prior Density\n",
"\n",
"- Define\n",
"\n",
"$$\n",
"\\mappingVector \\sim \\gaussianSamp{\\zerosVector}{\\alpha\\eye},\n",
"$$\n",
"\n",
"- Rules of multivariate Gaussians to see that,\n",
"\n",
"$$\n",
"\\dataVector \\sim \\gaussianSamp{\\zerosVector}{\\alpha \\activationMatrix \\activationMatrix^\\top + \\dataStd^2 \\eye}.\n",
"$$\n",
"\n",
"$$\n",
"\\kernelMatrix = \\alpha \\activationMatrix \\activationMatrix^\\top + \\dataStd^2 \\eye.\n",
"$$\n",
"\n",
"### Joint Gaussian Density\n",
"\n",
"- Elements are a function\n",
" $\\kernel_{i,j} = \\kernel\\left(\\inputVector_i, \\inputVector_j\\right)$\n",
"\n",
"$$\n",
"\\kernelMatrix = \\alpha \\activationMatrix \\activationMatrix^\\top + \\dataStd^2 \\eye.\n",
"$$\n",
"\n",
"### Covariance Function\n",
"\n",
"$$\n",
"\\kernel_\\mappingFunction\\left(\\inputVector_i, \\inputVector_j\\right) = \\alpha \\activationVector\\left(\\mappingMatrix_1, \\inputVector_i\\right)^\\top \\activationVector\\left(\\mappingMatrix_1, \\inputVector_j\\right)\n",
"$$\n",
"\n",
"- formed by inner products of the rows of the *design matrix*.\n",
"\n",
"### Gaussian Process\n",
"\n",
"- Instead of making assumptions about our density over each data\n",
" point, $\\dataScalar_i$ as i.i.d.\n",
"\n",
"- make a joint Gaussian assumption over our data.\n",
"\n",
"- covariance matrix is now a function of both the parameters of the\n",
" activation function, $\\mappingMatrix_1$, and the input variables,\n",
" $\\inputMatrix$.\n",
"\n",
"- Arises from integrating out $\\mappingVector^{(2)}$.\n",
"\n",
"### Basis Functions\n",
"\n",
"- Can be very complex, such as deep kernels, [@Cho:deep09] or could\n",
" even put a convolutional neural network inside.\n",
"- Viewing a neural network in this way is also what allows us to\n",
" beform sensible *batch* normalizations [@Ioffe:batch15].\n",
"\n",
"[\\small{[edit] }]{style=\"text-align:right\"}\n",
"\\#\\#\\# Non-degenerate Gaussian Processes\n",
"\n",
"- This process is *degenerate*.\n",
"- Covariance function is of rank at most $\\numHidden$.\n",
"- As $\\numData \\rightarrow \\infty$, covariance matrix is not full\n",
" rank.\n",
"- Leading to $\\det{\\kernelMatrix} = 0$\n",
"\n",
"### Infinite Networks\n",
"\n",
"- In ML Radford Neal [@Neal:bayesian94] asked \"what would happen if\n",
" you took $\\numHidden \\rightarrow \\infty$?\"\n",
"\n",
"\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"### Roughly Speaking\n",
"\n",
"- Instead of\n",
"\n",
"$$\n",
" \\begin{align*}\n",
" \\kernel_\\mappingFunction\\left(\\inputVector_i, \\inputVector_j\\right) & = \\alpha \\activationVector\\left(\\mappingMatrix_1, \\inputVector_i\\right)^\\top \\activationVector\\left(\\mappingMatrix_1, \\inputVector_j\\right)\\\\\n",
" & = \\alpha \\sum_k \\activationScalar\\left(\\mappingVector^{(1)}_k, \\inputVector_i\\right) \\activationScalar\\left(\\mappingVector^{(1)}_k, \\inputVector_j\\right)\n",
" \\end{align*}\n",
" $$\n",
"\n",
"- Sample infinitely many from a prior density,\n",
" $p(\\mappingVector^{(1)})$,\n",
"\n",
"$$\n",
"\\kernel_\\mappingFunction\\left(\\inputVector_i, \\inputVector_j\\right) = \\alpha \\int \\activationScalar\\left(\\mappingVector^{(1)}, \\inputVector_i\\right) \\activationScalar\\left(\\mappingVector^{(1)}, \\inputVector_j\\right) p(\\mappingVector^{(1)}) \\text{d}\\mappingVector^{(1)}\n",
"$$\n",
"\n",
"- Also applies for non-Gaussian $p(\\mappingVector^{(1)})$ because of\n",
" the *central limit theorem*.\n",
"\n",
"### Simple Probabilistic Program\n",
"\n",
"- If $$\n",
" \\begin{align*} \n",
" \\mappingVector^{(1)} & \\sim p(\\cdot)\\\\ \\phi_i & = \\activationScalar\\left(\\mappingVector^{(1)}, \\inputVector_i\\right), \n",
" \\end{align*}\n",
" $$ has finite variance.\n",
"\n",
"- Then taking number of hidden units to infinity, is also a Gaussian\n",
" process.\n",
"\n",
"### Further Reading\n",
"\n",
"- Chapter 2 of Neal's thesis [@Neal:bayesian94]\n",
"\n",
"- Rest of Neal's thesis. [@Neal:bayesian94]\n",
"\n",
"- David MacKay's PhD thesis [@MacKay:bayesian92]\n",
"\n",
"[\\small{[edit] }]{style=\"text-align:right\"}"
]
},
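{
"cell_type": "markdown",
"metadata": {},
"source": [
"The degeneracy of the finite model and the stability of the infinite limit can be checked numerically. The sketch below is plain `numpy`; the `tanh` activation and standard normal weight prior are illustrative assumptions, not the notebook's `mlai` code. With few hidden units the covariance matrix is rank deficient, so its determinant vanishes; with many hidden units the matrix stabilises towards the integral kernel."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def finite_network_cov(X, num_hidden, alpha=1.0, seed=0):\n",
"    # k(x_i, x_j) = alpha/H sum_k phi(w_k, x_i) phi(w_k, x_j)\n",
"    rng = np.random.default_rng(seed)\n",
"    W = rng.standard_normal((num_hidden, X.shape[1]))\n",
"    Phi = np.tanh(X @ W.T)  # one column of activations per hidden unit\n",
"    return alpha * Phi @ Phi.T / num_hidden\n",
"\n",
"X = np.linspace(-2, 2, 30)[:, None]\n",
"K10 = finite_network_cov(X, num_hidden=10)       # rank at most 10: degenerate\n",
"K_big = finite_network_cov(X, num_hidden=50000)  # stabilises towards the integral kernel\n",
"print(np.linalg.matrix_rank(K10), np.linalg.det(K10))"
]
},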
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from mlai import Kernel"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from mlai import eq_cov"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"kernel = Kernel(function=eq_cov,\n",
" name='Exponentiated Quadratic',\n",
" shortname='eq', \n",
" lengthscale=0.25)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"np.random.seed(10)\n",
"import teaching_plots as plot"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"plot.rejection_samples(kernel=kernel, \n",
" diagrams='../slides/diagrams/gp')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pods\n",
"from ipywidgets import IntSlider"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pods.notebook.display_plots('gp_rejection_sample{sample:0>3}.png', \n",
" directory='../slides/diagrams/gp', \n",
" sample=IntSlider(1,1,5,1))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### \n",
"\n",
"\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"### \n",
"\n",
"\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"### \n",
"\n",
"\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"### \n",
"\n",
"\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"### \n",
"\n",
"\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"\n",
"\n",
"### Distributions over Functions\n",
"\n",
"[\\small{[edit] }]{style=\"text-align:right\"}"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"np.random.seed(4949)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sampling a Function\n",
"\n",
"**Multi-variate Gaussians**\n",
"\n",
"- We will consider a Gaussian with a particular structure of\n",
" covariance matrix.\n",
"- Generate a single sample from this 25 dimensional Gaussian density,\n",
" $$\n",
" \\mappingFunctionVector=\\left[\\mappingFunction_{1},\\mappingFunction_{2}\\dots \\mappingFunction_{25}\\right].\n",
" $$\n",
"- We will plot these points against their index."
]
},
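{
"cell_type": "markdown",
"metadata": {},
"source": [
"A self-contained version of this sampling step in plain `numpy` (the exponentiated quadratic covariance is written out directly here rather than taken from `mlai`, and a small jitter is added for numerical stability):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"\n",
"np.random.seed(4949)\n",
"x = np.linspace(-1, 1, 25)[:, None]\n",
"# exponentiated quadratic covariance with lengthscale 0.5\n",
"K = np.exp(-(x - x.T)**2 / (2 * 0.5**2))\n",
"# one sample from the 25-dimensional Gaussian, plotted against index\n",
"f = np.random.multivariate_normal(np.zeros(25), K + 1e-8*np.eye(25))\n",
"_ = plt.plot(f, '.-')"
]
},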
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import teaching_plots as plot\n",
"from mlai import Kernel, exponentiated_quadratic"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"kernel=Kernel(function=exponentiated_quadratic, lengthscale=0.5)\n",
"plot.two_point_sample(kernel.K, diagrams='../slides/diagrams/gp')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pods\n",
"from ipywidgets import IntSlider"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pods.notebook.display_plots('two_point_sample{sample:0>3}.svg', '../slides/diagrams/gp', sample=IntSlider(0, 0, 8, 1))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Gaussian Distribution Sample\n",
"\n",
"\n",
" \n",
" \n",
"\n",
"❮\n",
" \n",
"\n",
"❯\n",
" \n",
"\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"[\\small{[edit] }]{style=\"text-align:right\"}"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pods\n",
"from ipywidgets import IntSlider"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pods.notebook.display_plots('two_point_sample{sample:0>3}.svg', '../slides/diagrams/gp', sample=IntSlider(9, 9, 12, 1))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Prediction of $\\mappingFunction_{2}$ from $\\mappingFunction_{1}$\n",
"\n",
"\n",
" \n",
" \n",
"\n",
"❮\n",
" \n",
"\n",
"❯\n",
" \n",
"\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"### Uluru\n",
"\n",
"\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"### Prediction with Correlated Gaussians\n",
"\n",
"- Prediction of $\\mappingFunction_2$ from $\\mappingFunction_1$\n",
" requires *conditional density*.\n",
"- Conditional density is *also* Gaussian. $$\n",
" p(\\mappingFunction_2|\\mappingFunction_1) = \\gaussianDist{\\mappingFunction_2}{\\frac{\\kernelScalar_{1, 2}}{\\kernelScalar_{1, 1}}\\mappingFunction_1}{ \\kernelScalar_{2, 2} - \\frac{\\kernelScalar_{1,2}^2}{\\kernelScalar_{1,1}}}\n",
" $$ where covariance of joint density is given by $$\n",
" \\kernelMatrix = \\begin{bmatrix} \\kernelScalar_{1, 1} & \\kernelScalar_{1, 2}\\\\ \\kernelScalar_{2, 1} & \\kernelScalar_{2, 2}.\\end{bmatrix}\n",
" $$\n",
"\n",
"[\\small{[edit] }]{style=\"text-align:right\"}"
]
},
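{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick numerical check of this conditional density (the covariance entries and the observed value of $\\mappingFunction_1$ are made-up illustrative values); a Monte Carlo estimate from the joint density agrees with the closed-form conditional mean and variance:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# illustrative joint covariance entries and observed value of f_1\n",
"k11, k12, k22 = 1.0, 0.8, 1.0\n",
"f1 = 0.5\n",
"\n",
"cond_mean = (k12 / k11) * f1\n",
"cond_var = k22 - k12**2 / k11\n",
"\n",
"# Monte Carlo check against samples from the joint Gaussian\n",
"K = np.array([[k11, k12], [k12, k22]])\n",
"samples = np.random.default_rng(0).multivariate_normal(np.zeros(2), K, 200000)\n",
"near = samples[np.abs(samples[:, 0] - f1) < 0.02, 1]\n",
"print(cond_mean, near.mean(), cond_var, near.var())"
]
},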
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pods\n",
"from ipywidgets import IntSlider"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pods.notebook.display_plots('two_point_sample{sample:0>3}.svg', '../slides/diagrams/gp', sample=IntSlider(13, 13, 17, 1))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Prediction of $\\mappingFunction_{8}$ from $\\mappingFunction_{1}$\n",
"\n",
"\n",
" \n",
" \n",
"\n",
"❮\n",
" \n",
"\n",
"❯\n",
" \n",
"\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"### Key Object\n",
"\n",
"- Covariance function, $\\kernelMatrix$\n",
"- Determines properties of samples.\n",
"- Function of $\\inputMatrix$,\n",
" $$\\kernelScalar_{i,j} = \\kernelScalar(\\inputVector_i, \\inputVector_j)$$\n",
"\n",
"### Linear Algebra\n",
"\n",
"- Posterior mean\n",
" $$\\mappingFunction_D(\\inputVector_*) = \\kernelVector(\\inputVector_*, \\inputMatrix) \\kernelMatrix^{-1}\n",
" \\mathbf{y}$$\n",
"\n",
"- Posterior covariance\n",
" $$\\mathbf{C}_* = \\kernelMatrix_{*,*} - \\kernelMatrix_{*,\\mappingFunctionVector}\n",
" \\kernelMatrix^{-1} \\kernelMatrix_{\\mappingFunctionVector, *}$$\n",
"\n",
"### Linear Algebra\n",
"\n",
"- Posterior mean"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"$$\\mappingFunction_D(\\inputVector_*) = \\kernelVector(\\inputVector_*, \\inputMatrix) \\boldsymbol{\\alpha}$$"
]
},
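{
"cell_type": "markdown",
"metadata": {},
"source": [
"The posterior mean and covariance can be sketched end-to-end in `numpy`. The `eq_cov` below is a simple stand-in for the covariance function (not `mlai`'s implementation) and the noise level is an assumed nugget."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def eq_cov(X, X2, alpha=1.0, lengthscale=1.0):\n",
"    # exponentiated quadratic covariance (stand-in implementation)\n",
"    d2 = ((X[:, None, :] - X2[None, :, :])**2).sum(-1)\n",
"    return alpha * np.exp(-d2 / (2 * lengthscale**2))\n",
"\n",
"def gp_posterior(X, y, Xstar, noise=0.01):\n",
"    K = eq_cov(X, X) + noise * np.eye(len(X))\n",
"    Kstar = eq_cov(Xstar, X)\n",
"    alpha_vec = np.linalg.solve(K, y)    # K^{-1} y\n",
"    mean = Kstar @ alpha_vec             # k(x_*, X) K^{-1} y\n",
"    cov = eq_cov(Xstar, Xstar) - Kstar @ np.linalg.solve(K, Kstar.T)\n",
"    return mean, cov\n",
"\n",
"X = np.array([[0.0], [1.0]])\n",
"y = np.array([1.0, -0.5])\n",
"mean, cov = gp_posterior(X, y, np.array([[0.5]]))"
]
},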
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- Posterior covariance\n",
" $$\\covarianceMatrix_* = \\kernelMatrix_{*,*} - \\kernelMatrix_{*,\\mappingFunctionVector}\n",
" \\kernelMatrix^{-1} \\kernelMatrix_{\\mappingFunctionVector, *}$$\n",
"\n",
"### \n",
"\n",
"\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"### \n",
"\n",
"\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"### \n",
"\n",
"\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"### \n",
"\n",
"\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"### \n",
"\n",
"\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"[\\small{[edit] }]{style=\"text-align:right\"}\n",
"\\#\\#\\# Exponentiated Quadratic Covariance"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from mlai import Kernel"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from mlai import eq_cov"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"kernel = Kernel(function=eq_cov,\n",
" name='Exponentiated Quadratic',\n",
" shortname='eq', \n",
" formula='\\kernelScalar(\\inputVector, \\inputVector^\\prime) = \\alpha \\exp\\left(-\\frac{\\ltwoNorm{\\inputVector-\\inputVector^\\prime}^2}{2\\lengthScale^2}\\right)',\n",
" lengthscale=0.2)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import teaching_plots as plot"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"plot.covariance_func(kernel=kernel, diagrams='../slides/diagrams/kern/')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"$$\\kernelScalar(\\inputVector, \\inputVector^\\prime) = \\alpha \\exp\\left(-\\frac{\\ltwoNorm{\\inputVector-\\inputVector^\\prime}^2}{2\\lengthScale^2}\\right)$$\n",
" \n",
"\n",
"\n",
"\n",
"\\includesvgclass{../slides/diagrams/kern/eq_covariance.svg}\n",
" \n",
"\n",
"\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
" \n",
" \n",
"
\n",
"[\\small{[edit] }]{style=\"text-align:right\"}\n",
"\n",
"### Olympic Marathon Data\n",
"\n",
"\n",
"\n",
"\n",
"- Gold medal times for Olympic Marathon since 1896.\n",
"- Marathons before 1924 didn’t have a standardised distance.\n",
"- Present results using pace per km.\n",
"- In 1904 Marathon was badly organised leading to very slow times.\n",
"\n",
" \n",
"\n",
"\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"Image from Wikimedia Commons \n",
" \n",
" \n",
"
"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"import teaching_plots as plot\n",
"import mlai"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\n",
"xlim = (1875,2030)\n",
"ylim = (2.5, 6.5)\n",
"yhat = (y-offset)/scale\n",
"\n",
"fig, ax = plt.subplots(figsize=plot.big_wide_figsize)\n",
"_ = ax.plot(x, y, 'r.',markersize=10)\n",
"ax.set_xlabel('year', fontsize=20)\n",
"ax.set_ylabel('pace min/km', fontsize=20)\n",
"ax.set_xlim(xlim)\n",
"ax.set_ylim(ylim)\n",
"\n",
"mlai.write_figure(figure=fig, \n",
" filename='../slides/diagrams/datasets/olympic-marathon.svg', \n",
" transparent=True, \n",
" frameon=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Olympic Marathon Data\n",
"\n",
"\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"[\\small{[edit] }]{style=\"text-align:right\"}\n",
"\n",
"### Alan Turing\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
" \n",
"\n",
"\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
" \n",
" \n",
"
\n",
"### Probability Winning Olympics?\n",
"\n",
"- He was a formidable Marathon runner.\n",
"- In 1946 he ran a time 2 hours 46 minutes.\n",
" - That's a pace of 3.95 min/km.\n",
"- What is the probability he would have won an Olympics if one had\n",
" been held in 1946?\n",
"\n",
""
]
},
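{
"cell_type": "markdown",
"metadata": {},
"source": [
"The quoted pace can be checked with a little arithmetic; rounding the marathon distance to 42 km reproduces the 3.95 min/km figure, while the full 42.195 km distance gives a slightly faster pace."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"minutes = 2 * 60 + 46         # Turing's 1946 marathon time in minutes\n",
"pace_42 = minutes / 42.0      # rounded distance: roughly 3.95 min/km\n",
"pace_full = minutes / 42.195  # full marathon distance: roughly 3.93 min/km\n",
"print(pace_42, pace_full)"
]
},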
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import teaching_plots as plot"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fig, ax = plt.subplots(figsize=plot.big_wide_figsize)\n",
"plot.model_output(m_full, scale=scale, offset=offset, ax=ax, xlabel='year', ylabel='pace min/km', fontsize=20, portion=0.2)\n",
"ax.set_xlim(xlim)\n",
"ax.set_ylim(ylim)\n",
"mlai.write_figure(figure=fig,\n",
" filename='../slides/diagrams/gp/olympic-marathon-gp.svg', \n",
" transparent=True, frameon=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Olympic Marathon Data GP\n",
"\n",
" \n",
"\n",
"[\\small{[edit] }]{style=\"text-align:right\"}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Learning Covariance Parameters\n",
"\n",
"Can we determine covariance parameters from the data?\n",
"\n",
"### \n",
"\n",
"$$\\gaussianDist{\\dataVector}{\\mathbf{0}}{\\kernelMatrix}=\\frac{1}{(2\\pi)^\\frac{\\numData}{2}{\\det{\\kernelMatrix}^{\\frac{1}{2}}}}{\\exp\\left(-\\frac{\\dataVector^{\\top}\\kernelMatrix^{-1}\\dataVector}{2}\\right)}$$\n",
"\n",
"### \n",
"\n",
"$$\\begin{aligned}\n",
" \\gaussianDist{\\dataVector}{\\mathbf{0}}{\\kernelMatrix}=\\frac{1}{(2\\pi)^\\frac{\\numData}{2}{\\color{white} \\det{\\kernelMatrix}^{\\frac{1}{2}}}}{\\color{white}\\exp\\left(-\\frac{\\dataVector^{\\top}\\kernelMatrix^{-1}\\dataVector}{2}\\right)}\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"### \n",
"\n",
"$$\n",
"\\begin{aligned}\n",
" \\log \\gaussianDist{\\dataVector}{\\mathbf{0}}{\\kernelMatrix}=&{\\color{white}-\\frac{1}{2}\\log\\det{\\kernelMatrix}}{\\color{white}-\\frac{\\dataVector^{\\top}\\kernelMatrix^{-1}\\dataVector}{2}} \\\\ &-\\frac{\\numData}{2}\\log2\\pi\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"$$\n",
"\\errorFunction(\\parameterVector) = {\\color{white} \\frac{1}{2}\\log\\det{\\kernelMatrix}} + {\\color{white} \\frac{\\dataVector^{\\top}\\kernelMatrix^{-1}\\dataVector}{2}}\n",
"$$\n",
"\n",
"### \n",
"\n",
"The parameters are *inside* the covariance function (matrix).\n",
"\\normalsize\n",
"$$\\kernelScalar_{i, j} = \\kernelScalar(\\inputVals_i, \\inputVals_j; \\parameterVector)$$\n",
"\n",
"### Eigendecomposition of Covariance\n",
"\n",
"[\\Large\n",
"$$\\kernelMatrix = \\rotationMatrix \\eigenvalueMatrix^2 \\rotationMatrix^\\top$$]{}\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
" \n",
"\n",
"$\\eigenvalueMatrix$ represents distance on axes. $\\rotationMatrix$ gives\n",
"rotation.\n",
" \n",
" \n",
"
\n",
"### Eigendecomposition of Covariance\n",
"\n",
"- $\\eigenvalueMatrix$ is *diagonal*,\n",
" $\\rotationMatrix^\\top\\rotationMatrix = \\eye$.\n",
"- Useful representation since\n",
" $\\det{\\kernelMatrix} = \\det{\\eigenvalueMatrix^2} = \\det{\\eigenvalueMatrix}^2$.\n",
"\n",
"### Capacity control: ${\\color{white} \\log \\det{\\kernelMatrix}}$"
]
},
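{
"cell_type": "markdown",
"metadata": {},
"source": [
"The error function $\\errorFunction(\\parameterVector)$ above can be coded directly. This is a plain `numpy` sketch (not `GPy`), using `slogdet` for numerical stability; the first term is the capacity-control (determinant) term and the second the data-fit term."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def gp_nll(K, y):\n",
"    # E(theta) = 0.5 log det K + 0.5 y^T K^{-1} y  (dropping the constant)\n",
"    sign, logdet = np.linalg.slogdet(K)\n",
"    data_fit = float(y @ np.linalg.solve(K, y))\n",
"    return 0.5 * logdet + 0.5 * data_fit\n",
"\n",
"K = np.diag([2.0, 2.0])\n",
"y = np.array([1.0, 1.0])\n",
"E = gp_nll(K, y)  # 0.5*log(4) + 0.5"
]
},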
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"fill_color = [1., 1., 0.]\n",
"black_color = [0., 0., 0.]\n",
"blue_color = [0., 0., 1.]\n",
"magenta_color = [1., 0., 1.]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fig, ax = plt.subplots(figsize=plot.big_figsize)\n",
"ax.set_axis_off()\n",
"cax = fig.add_axes([0., 0., 1., 1.])\n",
"cax.set_axis_off()\n",
"\n",
"counter = 1\n",
"lambda1 = 0.5\n",
"lambda2 = 0.3\n",
"\n",
"cax.set_xlim([0., 1.])\n",
"cax.set_ylim([0., 1.])\n",
"\n",
"# Matrix label axes\n",
"tax2 = fig.add_axes([0, 0.47, 0.1, 0.1])\n",
"tax2.set_xlim([0, 1.])\n",
"tax2.set_ylim([0, 1.])\n",
"tax2.set_axis_off()\n",
"label_eigenvalue = tax2.text(0.5, 0.5, \"\\Large $\\eigenvalueMatrix=$\")\n",
"\n",
"ax = fig.add_axes([0.5, 0.25, 0.5, 0.5])\n",
"ax.set_xlim([-0.25, 0.6])\n",
"ax.set_ylim([-0.25, 0.6])\n",
"from matplotlib.patches import Polygon\n",
"pat_hand = ax.add_patch(Polygon(np.column_stack(([0, 0, lambda1, lambda1], \n",
" [0, lambda2, lambda2, 0])), \n",
" facecolor=fill_color, \n",
" edgecolor=black_color, \n",
" visible=False))\n",
"data = pat_hand.get_path().vertices\n",
"rotation_matrix = np.asarray([[np.sqrt(2)/2, -np.sqrt(2)/2], \n",
" [np.sqrt(2)/2, np.sqrt(2)/2]])\n",
"new = np.dot(rotation_matrix,data.T)\n",
"pat_hand = ax.add_patch(Polygon(np.column_stack(([0, 0, lambda1, lambda1], \n",
" [0, lambda2, lambda2, 0])), \n",
" facecolor=fill_color, \n",
" edgecolor=black_color, \n",
" visible=False))\n",
"pat_hand_rot = ax.add_patch(Polygon(new.T, \n",
" facecolor=fill_color, \n",
" edgecolor=black_color))\n",
"pat_hand_rot.set(visible=False)\n",
"\n",
"# 3D box\n",
"pat_hand3 = [ax.add_patch(Polygon(np.column_stack(([0, -0.2*lambda1, 0.8*lambda1, lambda1], \n",
" [0, -0.2*lambda2, -0.2*lambda2, 0])), \n",
" facecolor=fill_color, \n",
" edgecolor=black_color))]\n",
" \n",
"pat_hand3.append(ax.add_patch(Polygon(np.column_stack(([0, -0.2*lambda1, -0.2*lambda1, 0], \n",
" [0, -0.2*lambda2, 0.8*lambda2, lambda2])), \n",
" facecolor=fill_color, \n",
" edgecolor=black_color)))\n",
"\n",
"pat_hand3.append(ax.add_patch(Polygon(np.column_stack(([-0.2*lambda1, 0, lambda1, 0.8*lambda1], \n",
" [0.8*lambda2, lambda2, lambda2, 0.8*lambda2])), \n",
" facecolor=fill_color,\n",
" edgecolor=black_color)))\n",
"pat_hand3.append(ax.add_patch(Polygon(np.column_stack(([lambda1, 0.8*lambda1, 0.8*lambda1, lambda1], \n",
" [lambda2, 0.8*lambda2, -0.2*lambda2, 0])), \n",
" facecolor=fill_color, \n",
" edgecolor=black_color)))\n",
"\n",
"for hand in pat_hand3:\n",
" hand.set(visible=False)\n",
"\n",
"ax.set_aspect('equal')\n",
"ax.set_axis_off()\n",
"xlim = ax.get_xlim()\n",
"ylim = ax.get_ylim()\n",
"xspan = xlim[1] - xlim[0]\n",
"yspan = ylim[1] - ylim[0]\n",
"#ar_one = ax.arrow([0, lambda1], [0, 0])\n",
"ar_one = ax.arrow(x=0, y=0, dx=lambda1, dy=0)\n",
"#ar_two = ax.arrow([0, 0], [0, lambda2])\n",
"ar_two = ax.arrow(x=0, y=0, dx=0, dy=lambda2)\n",
"#ar_three = ax.arrow([0, -0.2*lambda1], [0, -0.2*lambda2])\n",
"ar_three = ax.arrow(x=0, y=0, dx=-0.2*lambda1, dy=-0.2*lambda2)\n",
"ar_one_text = ax.text(0.5*lambda1, -0.05*yspan, \n",
" '$\\eigenvalue_1$', \n",
" horizontalalignment='center')\n",
"ar_two_text = ax.text(-0.05*xspan, 0.5*lambda2, \n",
" '$\\eigenvalue_2$', \n",
" horizontalalignment='center')\n",
"ar_three_text = ax.text(-0.05*xspan-0.1*lambda1, -0.1*lambda2+0.05*yspan, \n",
" '$\\eigenvalue_3$', \n",
" horizontalalignment='center')\n",
"ar_one.set(linewidth=3, visible=False, color=blue_color)\n",
"ar_one_text.set(visible=False)\n",
"\n",
"ar_two.set(linewidth=3, visible=False, color=blue_color)\n",
"ar_two_text.set(visible=False)\n",
"\n",
"ar_three.set(linewidth=3, visible=False, color=blue_color)\n",
"ar_three_text.set(visible=False)\n",
"\n",
"\n",
"matrix_ax = fig.add_axes([0.2, 0.35, 0.3, 0.3])\n",
"matrix_ax.set_aspect('equal')\n",
"matrix_ax.set_axis_off()\n",
"eigenvals = [['$\\eigenvalue_1$', '$0$'],['$0$', '$\\eigenvalue_2$']]\n",
"plot.matrix(eigenvals, \n",
" matrix_ax, \n",
" bracket_style='square', \n",
" type='entries', \n",
" bracket_color=black_color)\n",
"\n",
"\n",
"# First arrow\n",
"matrix_ax.cla()\n",
"plot.matrix(eigenvals, \n",
" matrix_ax, \n",
" bracket_style='square', \n",
" type='entries',\n",
" highlight=True,\n",
" highlight_row=[0, 0],\n",
" highlight_col=':',\n",
" highlight_color=magenta_color,\n",
" bracket_color=black_color)\n",
"\n",
"ar_one.set(visible=True)\n",
"ar_one_text.set(visible=True)\n",
"\n",
"file_name = 'gp-optimise-determinant{counter:>3}'.format(counter=counter)\n",
"mlai.write_figure(os.path.join(diagrams, file_name), transparent=True)\n",
"counter += 1\n",
"\n",
"# Second arrow\n",
"matrix_ax.cla()\n",
"plot.matrix(eigenvals, \n",
" matrix_ax, \n",
" bracket_style='square', \n",
" type='entries', \n",
" highlight=True,\n",
" highlight_row=[1,1],\n",
" highlight_col=':',\n",
" highlight_color=magenta_color,\n",
" bracket_color=black_color)\n",
"\n",
"ar_two.set(visible=True)\n",
"ar_two_text.set(visible=True)\n",
"\n",
"file_name = 'gp-optimise-determinant{counter:>3}'.format(counter=counter)\n",
"mlai.write_figure(os.path.join(diagrams, file_name), transparent=True)\n",
"counter += 1\n",
"\n",
"matrix_ax.cla()\n",
"plot.matrix(eigenvals, matrix_ax, \n",
" bracket_style='square', \n",
" type='entries', \n",
" bracket_color=black_color)\n",
"\n",
"file_name = 'gp-optimise-determinant{counter:>3}'.format(counter=counter)\n",
"mlai.write_figure(os.path.join(diagrams, file_name), transparent=True)\n",
"counter += 1\n",
"\n",
"\n",
"tax = fig.add_axes([0.1, 0.1, 0.8, 0.1])\n",
"tax.set_axis_off()\n",
"tax.set_xlim([0, 1])\n",
"tax.set_ylim([0, 1])\n",
"det_text = text(0.5, 0.5,\n",
" '\\Large $\\det{\\eigenvalueMatrix} = \\eigenvalue_1 \\eigenvalue_2$', \n",
" horizontalalignment='center')\n",
"file_name = 'gp-optimise-determinant{counter:>3}'.format(counter=counter)\n",
"mlai.write_figure(os.path.join(diagrams, file_name), transparent=True)\n",
"counter += 1\n",
"\n",
"axes(ax)\n",
"pat_hand.set(visible=True)\n",
"file_name = 'gp-optimise-determinant{counter:>3}'.format(counter=counter)\n",
"mlai.write_figure(os.path.join(diagrams, file_name), transparent=True)\n",
"counter += 1\n",
"\n",
"det_text_plot = text(0.5*lambda1, \n",
" 0.5*lambda2, \n",
" '\\Large $\\det{\\eigenvalueMatrix}$', \n",
" horizontalalignment='center')\n",
" \n",
"file_name = 'gp-optimise-determinant{counter:>3}'.format(counter=counter)\n",
"mlai.write_figure(os.path.join(diagrams, file_name), transparent=True)\n",
"counter += 1\n",
"\n",
"\n",
"eigenvals2 = {'$\\eigenvalue_1$', '$0$' '$0$'; '$0$', '$\\eigenvalue_2$' '$0$'; '$0$', '$0$' '$\\eigenvalue_3$'}\n",
"axes(matrix_ax)\n",
"matrix_ax.cla()\n",
"plot.matrix(eigenvals2, matrix_ax, \n",
" bracket_style='square', \n",
" type='entries',\n",
" highlight=True,\n",
" highlight_row=[2,2],\n",
" highlight_col=':',\n",
" highlight_color=magenta_color)\n",
"\n",
"file_name = 'gp-optimise-determinant{counter:>3}'.format(counter=counter)\n",
"mlai.write_figure(os.path.join(diagrams, file_name), transparent=True)\n",
"counter += 1\n",
"\n",
"\n",
"ar_three.set(visible=True)\n",
"ar_three_text.set(visible=True)\n",
"pat_hand3.set(visible=True)\n",
"det_text.set(string='\\Large $\\det{\\eigenvalueMatrix} = \\eigenvalue_1 \\eigenvalue_2\\eigenvalue_3$')\n",
"\n",
"file_name = 'gp-optimise-determinant{counter:>3}'.format(counter=counter)\n",
"mlai.write_figure(os.path.join(diagrams, file_name), transparent=True)\n",
"counter += 1\n",
"\n",
"matrix_ax.cla()\n",
"plot.matrix(eigenvals, \n",
" matrix_ax, \n",
" bracket_style='square', \n",
" type='entries', \n",
" bracket_color=black_color)\n",
" \n",
"ar_three.set(visible=False)\n",
"ar_three_text.set(visible=False)\n",
"pat_hand3.set(visible=False)\n",
"det_text.set(string='\\Large $\\det{\\eigenvalueMatrix} = \\eigenvalue_1 \\eigenvalue_2$')\n",
"\n",
"file_name = 'gp-optimise-determinant{counter:>3}'.format(counter=counter)\n",
"mlai.write_figure(os.path.join(diagrams, file_name), transparent=True)\n",
"counter += 1\n",
"\n",
"\n",
"\n",
"det_text.set(string='\\Large $\\det{\\rotationMatrix\\eigenvalueMatrix} = \\eigenvalue_1 \\eigenvalue_2$')\n",
"label_eigenvalue.set(string='\\Large $\\rotationMatrix\\eigenvalueMatrix=$')\n",
"\n",
"\n",
"\n",
"rotate_object(rotation_matrix, ar_one)\n",
"rotate_object(rotation_matrix, ar_one_text)\n",
"rotate_object(rotation_matrix, ar_two)\n",
"rotate_object(rotation_matrix, ar_two_text)\n",
"rotate_object(rotation_matrix, det_textPlot)\n",
"pat_hand_rot.set(visible=True)\n",
"pat_hand.set(visible=False)\n",
"\n",
"W = [['$\\mappingScalar_{1, 1}$', '$\\mappingScalar_{1, 2}$'],[ '$\\mappingScalar_{2, 1}$', '$\\mappingScalar_{2, 2}$']]\n",
"plot.matrix(W, \n",
" matrix_ax, \n",
" bracket_style='square', \n",
" type='entries', \n",
" bracket_color=black_color)\n",
"\n",
"\n",
"file_name = 'gp-optimise-determinant{counter:>3}'.format(counter=counter)\n",
"mlai.write_figure(os.path.join(diagrams, file_name), transparent=True)\n",
"counter += 1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"### Data Fit: ${\\color{white} \\frac{\\dataVector^\\top\\kernelMatrix^{-1}\\dataVector}{2}}$\n",
"\n",
"\n",
"### $$\\errorFunction(\\parameterVector) = {\\color{white}\\frac{1}{2}\\log\\det{\\kernelMatrix}}+{\\color{white}\\frac{\\dataVector^{\\top}\\kernelMatrix^{-1}\\dataVector}{2}}$$"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"import os"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import GPy\n",
"import teaching_plots as plot\n",
"import mlai\n",
"import gp_tutorial"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(125)\n",
"diagrams = '../slides/diagrams/gp'\n",
"\n",
"black_color=[0., 0., 0.]\n",
"red_color=[1., 0., 0.]\n",
"blue_color=[0., 0., 1.]\n",
"magenta_color=[1., 0., 1.]\n",
"fontsize=18"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"y_lim = [-2.2, 2.2]\n",
"y_ticks = [-2, -1, 0, 1, 2]\n",
"x_lim = [-2, 2]\n",
"x_ticks = [-2, -1, 0, 1, 2]\n",
"err_y_lim = [-12, 20]\n",
"\n",
"linewidth=3\n",
"markersize=15\n",
"markertype='.'"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x = np.linspace(-1, 1, 6)[:, np.newaxis]\n",
"xtest = np.linspace(x_lim[0], x_lim[1], 200)[:, np.newaxis]\n",
"\n",
"# True data\n",
"true_kern = GPy.kern.RBF(1) + GPy.kern.White(1)\n",
"true_kern.rbf.lengthscale = 1.0\n",
"true_kern.white.variance = 0.01\n",
"K = true_kern.K(x) \n",
"y = np.random.multivariate_normal(np.zeros((6,)), K, 1).T"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\n",
"# Fitted model\n",
"kern = GPy.kern.RBF(1) + GPy.kern.White(1)\n",
"kern.rbf.lengthscale = 1.0\n",
"kern.white.variance = 0.01\n",
"\n",
"lengthscales = np.asarray([0.01, 0.05, 0.1, 0.25, 0.5, 1, 2, 4, 8, 16, 100])\n",
"\n",
"fig1, ax1 = plt.subplots(figsize=plot.one_figsize) \n",
"fig2, ax2 = plt.subplots(figsize=plot.one_figsize) \n",
"line = ax2.semilogx(np.NaN, np.NaN, 'x-', \n",
" color=black_color)\n",
"ax.set_ylim(err_y_lim)\n",
"ax.set_xlim([0.025, 32])\n",
"ax.grid(True)\n",
"ax.set_xticks([0.01, 0.1, 1, 10, 100])\n",
"ax.set_xticklabels(['$10^{-2}$', '$10^{-1}$', '$10^0$', '$10^1$', '$10^2$'])\n",
"\n",
"\n",
"err = np.zeros_like(lengthscales)\n",
"err_log_det = np.zeros_like(lengthscales)\n",
"err_fit = np.zeros_like(lengthscales)\n",
"\n",
"counter = 0\n",
"for i, ls in enumerate(lengthscales):\n",
" kern.rbf.lengthscale=ls\n",
" K = kern.K(x) \n",
" invK, L, Li, log_det_K = GPy.util.linalg.pdinv(K)\n",
" err[i] = 0.5*(log_det_K + np.dot(np.dot(y.T,invK),y))\n",
" err_log_det[i] = 0.5*log_det_K\n",
" err_fit[i] = 0.5*np.dot(np.dot(y.T,invK), y)\n",
" Kx = kern.K(x, xtest)\n",
" ypred_mean = np.dot(np.dot(Kx.T, invK), y)\n",
" ypred_var = kern.Kdiag(xtest) - np.sum((np.dot(Kx.T,invK))*Kx.T, 1)\n",
" ypred_sd = np.sqrt(ypred_var)\n",
" ax1.clear()\n",
" _ = gp_tutorial.gpplot(xtest.flatten(),\n",
" ypred_mean.flatten(),\n",
" ypred_mean.flatten()-2*ypred_sd.flatten(),\n",
" ypred_mean.flatten()+2*ypred_sd.flatten(), \n",
" ax=ax1)\n",
" x_lim = ax1.get_xlim()\n",
" ax1.set_ylabel('$f(x)$', fontsize=fontsize)\n",
" ax1.set_xlabel('$x$', fontsize=fontsize)\n",
"\n",
" p = ax1.plot(x, y, markertype, color=black_color, markersize=markersize, linewidth=linewidth)\n",
" ax1.set_ylim(y_lim)\n",
" ax1.set_xlim(x_lim) \n",
" ax1.set_xticks(x_ticks)\n",
" #ax.set(box=False)\n",
" \n",
" ax1.plot([x_lim[0], x_lim[0]], y_lim, color=black_color)\n",
" ax1.plot(x_lim, [y_lim[0], y_lim[0]], color=black_color)\n",
"\n",
" file_name = 'gp-optimise{counter:0>3}.svg'.format(counter=counter)\n",
" mlai.write_figure(os.path.join(diagrams, file_name),\n",
" figure=fig1,\n",
" transparent=True)\n",
" counter += 1\n",
"\n",
" ax2.clear()\n",
" t = ax2.semilogx(lengthscales[0:i+1], err[0:i+1], 'x-', \n",
" color=magenta_color, \n",
" markersize=markersize,\n",
" linewidth=linewidth)\n",
" t2 = ax2.semilogx(lengthscales[0:i+1], err_log_det[0:i+1], 'x-', \n",
" color=blue_color, \n",
" markersize=markersize,\n",
" linewidth=linewidth)\n",
" t3 = ax2.semilogx(lengthscales[0:i+1], err_fit[0:i+1], 'x-', \n",
" color=red_color, \n",
" markersize=markersize,\n",
" linewidth=linewidth)\n",
" ax2.set_ylim(err_y_lim)\n",
" ax2.set_xlim([0.025, 32])\n",
" ax2.set_xticks([0.01, 0.1, 1, 10, 100])\n",
" ax2.set_xticklabels(['$10^{-2}$', '$10^{-1}$', '$10^0$', '$10^1$', '$10^2$'])\n",
"\n",
" ax2.grid(True)\n",
"\n",
" ax2.set_ylabel('negative log likelihood', fontsize=fontsize)\n",
" ax2.set_xlabel('length scale, $\\ell$', fontsize=fontsize)\n",
" file_name = 'gp-optimise{counter:0>3}.svg'.format(counter=counter)\n",
" mlai.write_figure(os.path.join(diagrams, file_name),\n",
" figure=fig2,\n",
" transparent=True)\n",
" counter += 1\n",
" #ax.set_box(False)\n",
" xlim = ax2.get_xlim()\n",
" ax2.plot([xlim[0], xlim[0]], err_y_lim, color=black_color)\n",
" ax2.plot(xlim, [err_y_lim[0], err_y_lim[0]], color=black_color)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
" \n",
" \n",
"\n",
"❮\n",
" \n",
"\n",
"❯\n",
" \n",
"\n",
"\n",
"
\n",
"\n",
"\n",
" \n",
" \n",
"\n",
"\\includesvg{../slides/diagrams/gp/gp-optimise001}\n",
" \n",
" \n",
"
\n",
"\n",
"
\n",
"\n",
"\n",
"\n",
"
\n",
"\n",
"\n",
" \n",
" \n",
"\n",
"\\includesvg{../slides/diagrams/gp/gp-optimise003}\n",
" \n",
" \n",
"
\n",
"\n",
"
\n",
"\n",
"\n",
"\n",
"
\n",
"\n",
"\n",
" \n",
" \n",
"\n",
"\\includesvg{../slides/diagrams/gp/gp-optimise005}\n",
" \n",
" \n",
"
\n",
"\n",
"
\n",
"\n",
"\n",
"\n",
"
\n",
"\n",
"\n",
" \n",
" \n",
"\n",
"\\includesvg{../slides/diagrams/gp/gp-optimise007}\n",
" \n",
" \n",
"
\n",
"\n",
"
\n",
"\n",
"\n",
"\n",
"
\n",
"\n",
"\n",
" \n",
" \n",
"\n",
"\\includesvg{../slides/diagrams/gp/gp-optimise009}\n",
" \n",
" \n",
"
\n",
"\n",
"
\n",
"\n",
"\n",
"\n",
"
\n",
"\n",
"\n",
" \n",
" \n",
"\n",
"\\includesvg{../slides/diagrams/gp/gp-optimise011}\n",
" \n",
" \n",
"
\n",
"\n",
"
\n",
"\n",
"\n",
"\n",
"
\n",
"\n",
"\n",
" \n",
" \n",
"\n",
"\\includesvg{../slides/diagrams/gp/gp-optimise013}\n",
" \n",
" \n",
"
\n",
"\n",
"
\n",
"\n",
"\n",
"\n",
"
\n",
"\n",
"\n",
" \n",
" \n",
"\n",
"\\includesvg{../slides/diagrams/gp/gp-optimise015}\n",
" \n",
" \n",
"
\n",
"\n",
"
\n",
"\n",
"\n",
"\n",
"
\n",
"\n",
"\n",
" \n",
" \n",
"\n",
"\\includesvg{../slides/diagrams/gp/gp-optimise017}\n",
" \n",
" \n",
"
\n",
"\n",
"
\n",
"\n",
"\n",
"\n",
"
\n",
"\n",
"\n",
" \n",
" \n",
"\n",
"\\includesvg{../slides/diagrams/gp/gp-optimise019}\n",
" \n",
" \n",
"
\n",
"\n",
"
\n",
"\n",
"\n",
"\n",
"
\n",
"\n",
"\n",
" \n",
" \n",
"\n",
"\\includesvg{../slides/diagrams/gp/gp-optimise021}\n",
" \n",
" \n",
"
\n",
"\n",
"
\n",
"\n",
"[\\small{[edit] }]{style=\"text-align:right\"}\n",
"\n",
"### Della Gatta Gene Data\n",
"\n",
"- Given given expression levels in the form of a time series from\n",
" @DellaGatta:direct08."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"import teaching_plots as plot\n",
"import mlai"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\n",
"xlim = (-20,260)\n",
"ylim = (5, 7.5)\n",
"yhat = (y-offset)/scale\n",
"\n",
"fig, ax = plt.subplots(figsize=plot.big_wide_figsize)\n",
"_ = ax.plot(x, y, 'r.',markersize=10)\n",
"ax.set_xlabel('time/min', fontsize=20)\n",
"ax.set_ylabel('expression', fontsize=20)\n",
"ax.set_xlim(xlim)\n",
"ax.set_ylim(ylim)\n",
"\n",
"mlai.write_figure(figure=fig, \n",
" filename='../slides/diagrams/datasets/della-gatta-gene.svg', \n",
" transparent=True, \n",
" frameon=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Della Gatta Gene Data\n",
"\n",
"\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"### Gene Expression Example\n",
"\n",
"- Want to detect if a gene is expressed or not, fit a GP to each gene\n",
" @Kalaitzis:simple11.\n",
"\n",
"### \n",
"\n",
"\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"\n",
"\n",
" \n",
"###"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import teaching_plots as plot"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fig, ax = plt.subplots(figsize=plot.big_wide_figsize)\n",
"plot.model_output(m_full, scale=scale, offset=offset, ax=ax, xlabel='time/min', ylabel='expression', fontsize=20, portion=0.2)\n",
"ax.set_xlim(xlim)\n",
"ax.set_ylim(ylim)\n",
"ax.set_title('log likelihood: {ll:.3}'.format(ll=m_full.log_likelihood()), fontsize=20)\n",
"mlai.write_figure(figure=fig,\n",
" filename='../slides/diagrams/gp/della-gatta-gene-gp.svg', \n",
" transparent=True, frameon=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### TP53 Gene Data GP\n",
"\n",
" "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import teaching_plots as plot"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fig, ax = plt.subplots(figsize=plot.big_wide_figsize)\n",
"plot.model_output(m_full2, scale=scale, offset=offset, ax=ax, xlabel='time/min', ylabel='expression', fontsize=20, portion=0.2)\n",
"ax.set_xlim(xlim)\n",
"ax.set_ylim(ylim)\n",
"ax.set_title('log likelihood: {ll:.3}'.format(ll=m_full2.log_likelihood()), fontsize=20)\n",
"mlai.write_figure(figure=fig,\n",
" filename='../slides/diagrams/gp/della-gatta-gene-gp2.svg', \n",
" transparent=True, frameon=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### TP53 Gene Data GP\n",
"\n",
" "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import teaching_plots as plot"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fig, ax = plt.subplots(figsize=plot.big_wide_figsize)\n",
"plot.model_output(m_full3, scale=scale, offset=offset, ax=ax, xlabel='time/min', ylabel='expression', fontsize=20, portion=0.2)\n",
"ax.set_xlim(xlim)\n",
"ax.set_ylim(ylim)\n",
"ax.set_title('log likelihood: {ll:.3}'.format(ll=m_full3.log_likelihood()), fontsize=20)\n",
"mlai.write_figure(figure=fig,\n",
" filename='../slides/diagrams/gp/della-gatta-gene-gp3.svg', \n",
" transparent=True, frameon=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### TP53 Gene Data GP\n",
"\n",
" "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import teaching_plots as plot"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"plot.multiple_optima(diagrams='../slides/diagrams/gp')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Multiple Optima\n",
"\n",
" \n",
"\n",
"\n",
"### Example: Prediction of Malaria Incidence in Uganda\n",
"\n",
"- Work with Ricardo Andrade Pacheco, John Quinn and Martin Mubaganzi\n",
" (Makerere University, Uganda)\n",
"- See [AI-DEV Group](http://air.ug/research.html).\n",
"\n",
"### Malaria Prediction in Uganda\n",
"\n",
"\n",
"\n",
"\n",
"[[@Andrade:consistent14,@Mubangizi:malaria14]]{style=\"text-align:right\"}\n",
"\n",
"### Kapchorwa District\n",
"\n",
" \n",
"\n",
"### Tororo District\n",
"\n",
" \n",
"\n",
"### Malaria Prediction in Nagongera (Sentinel Site)\n",
"\n",
"\n",
"### Mubende District\n",
"\n",
" \n",
"\n",
"### Malaria Prediction in Uganda\n",
"\n",
"\n",
"### GP School at Makerere\n",
"\n",
"\n",
"### Kabarole District\n",
"\n",
" \n",
"\n",
"### Early Warning Systems\n",
"\n",
"\n",
"\n",
"\n",
"### Additive Covariance"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from mlai import Kernel"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from mlai import linear_cov"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from mlai import eq_cov"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from mlai import add_cov"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"kernel = Kernel(function=add_cov,\n",
" name='Additive',\n",
" shortname='add', \n",
" formula='\\kernelScalar_f(\\inputVector, \\inputVector^\\prime) = \\kernelScalar_g(\\inputVector, \\inputVector^\\prime) + \\kernelScalar_h(\\inputVector, \\inputVector^\\prime)', \n",
" kerns=[linear_cov, eq_cov], \n",
" kern_args=[{'variance': 25}, {'lengthscale' : 0.2}])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import teaching_plots as plot"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"plot.covariance_func(kernel=kernel, diagrams='../slides/diagrams/kern/')"
]
},
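{
"cell_type": "markdown",
"metadata": {},
"source": [
"The additive construction can be checked directly: the sum of two valid covariance functions is itself symmetric and positive semi-definite. Below is a small numpy sketch with illustrative, assumed parameterisations of the two components (not the mlai code used above).\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def linear_cov_demo(X, Xp, variance=1.0):\n",
"    # Linear covariance k(x, x') = variance * x x'.\n",
"    return variance * (X @ Xp.T)\n",
"\n",
"def eq_cov_demo(X, Xp, variance=1.0, lengthscale=1.0):\n",
"    # Exponentiated quadratic covariance.\n",
"    d2 = (X[:, None, 0] - Xp[None, :, 0])**2\n",
"    return variance * np.exp(-0.5 * d2 / lengthscale**2)\n",
"\n",
"def add_cov_demo(X, Xp, kerns, kern_args):\n",
"    # Additive covariance: sum the component covariance matrices.\n",
"    return sum(k(X, Xp, **a) for k, a in zip(kerns, kern_args))\n",
"\n",
"X = np.linspace(-1, 1, 25)[:, None]\n",
"K = add_cov_demo(X, X, [linear_cov_demo, eq_cov_demo],\n",
"                 [{'variance': 25}, {'lengthscale': 0.2}])\n",
"```"
]
},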
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"$$\\kernelScalar_f(\\inputVector, \\inputVector^\\prime) = \\kernelScalar_g(\\inputVector, \\inputVector^\\prime) + \\kernelScalar_h(\\inputVector, \\inputVector^\\prime)$$\n",
" \n",
"\n",
"\n",
"\n",
"\\includesvgclass{../slides/diagrams/kern/add_covariance.svg}\n",
" \n",
"\n",
"### Analysis of US Birth Rates\n",
"\n",
"\n",
"\n",
"### Gelman Book\n",
"\n",
"@Gelman:bayesian13\n",
"\n",
"\n",
"### Basis Function Covariance"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import teaching_plots as plot\n",
"import mlai\n",
"import numpy as np"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\n",
"basis = mlai.Basis(function=radial, \n",
" number=3,\n",
" data_limits=[-0.5, 0.5], \n",
" width=0.125)\n",
"kernel = mlai.Kernel(function=basis_cov,\n",
" name='Basis',\n",
" shortname='basis', \n",
" formula='\\kernel(\\inputVector, \\inputVector^\\prime) = \\basisVector(\\inputVector)^\\top \\basisVector(\\inputVector^\\prime)',\n",
" basis=basis)\n",
" \n",
"plot.covariance_func(kernel, diagrams='../slides/diagrams/kern/')"
]
},
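{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because this covariance comes from a finite set of basis functions, the resulting covariance matrix has rank at most the number of basis functions. The numpy sketch below assumes a Gaussian-bump radial basis for illustration (mirroring, but not reproducing, the mlai radial basis above).\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def radial_demo(X, num_basis=3, data_limits=(-0.5, 0.5), width=0.125):\n",
"    # Gaussian-bump basis functions on evenly spaced centres.\n",
"    centres = np.linspace(data_limits[0], data_limits[1], num_basis)\n",
"    return np.exp(-0.5 * ((X - centres[None, :]) / width)**2)\n",
"\n",
"def basis_cov_demo(X, Xp, basis):\n",
"    # Finite-basis covariance: k(x, x') = phi(x)^T phi(x').\n",
"    return basis(X) @ basis(Xp).T\n",
"\n",
"X = np.linspace(-0.5, 0.5, 40)[:, None]\n",
"K = basis_cov_demo(X, X, radial_demo)\n",
"# Rank is bounded by the number of basis functions (3 here).\n",
"rank = np.linalg.matrix_rank(K)\n",
"```"
]
},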
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"$$\\kernel(\\inputVector, \\inputVector^\\prime) = \\basisVector(\\inputVector)^\\top \\basisVector(\\inputVector^\\prime)$$\n",
" \n",
"\n",
"\n",
"\n",
"\\includesvgclass{../slides/diagrams/kern/basis_covariance.svg}\n",
"\n",
"\n",
"### Brownian Covariance"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import teaching_plots as plot\n",
"import mlai\n",
"import numpy as np"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"t=np.linspace(0, 2, 200)[:, np.newaxis]\n",
"kernel = mlai.Kernel(function=brownian_cov,\n",
" name='Brownian',\n",
" formula='\\kernelScalar(t, t^\\prime)=\\alpha \\min(t, t^\\prime)',\n",
" shortname='brownian')\n",
"plot.covariance_func(kernel, t, diagrams='../slides/diagrams/kern/')"
]
},
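{
"cell_type": "markdown",
"metadata": {},
"source": [
"Draws from a Gaussian process with this covariance are Brownian motion paths, and the marginal variance grows linearly with t. Below is a numpy sketch; the jitter added before the Cholesky factorisation is an assumption for numerical stability, not part of the model.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def brownian_cov_demo(t, tp, alpha=1.0):\n",
"    # Brownian motion covariance k(t, t') = alpha * min(t, t').\n",
"    return alpha * np.minimum(t, tp.T)\n",
"\n",
"t = np.linspace(0.01, 2, 200)[:, None]\n",
"K = brownian_cov_demo(t, t)\n",
"# Sample three paths from N(0, K): random-walk-like curves started near zero.\n",
"rng = np.random.default_rng(0)\n",
"L = np.linalg.cholesky(K + 1e-10 * np.eye(len(t)))\n",
"paths = L @ rng.standard_normal((len(t), 3))\n",
"```"
]
},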
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"$$\\kernelScalar(t, t^\\prime)=\\alpha \\min(t, t^\\prime)$$\n",
" \n",
"\n",
"\n",
"\n",
"\\includesvgclass{../slides/diagrams/kern/brownian_covariance.svg}\n",
"\n",
"\n",
"### MLP Covariance"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import teaching_plots as plot\n",
"import mlai\n",
"import numpy as np"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"kernel = mlai.Kernel(function=mlp_cov,\n",
" name='Multilayer Perceptron',\n",
" shortname='mlp', \n",
" formula='\\kernelScalar(\\inputVector, \\inputVector^\\prime) = \\alpha \\arcsin\\left(\\frac{w \\inputVector^\\top \\inputVector^\\prime + b}{\\sqrt{\\left(w \\inputVector^\\top \\inputVector + b + 1\\right)\\left(w \\left.\\inputVector^\\prime\\right.^\\top \\inputVector^\\prime + b + 1\\right)}}\\right)',\n",
" w=5, b=0.5)\n",
" \n",
"plot.covariance_func(kernel, diagrams='../slides/diagrams/kern/')"
]
},
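{
"cell_type": "markdown",
"metadata": {},
"source": [
"The arcsin form takes only a few lines of numpy. By the Cauchy-Schwarz inequality the argument of arcsin always lies in [-1, 1], so the expression is well defined. This is an illustrative sketch rather than the mlai implementation.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def mlp_cov_demo(X, Xp, alpha=1.0, w=5.0, b=0.5):\n",
"    # MLP (arcsine) covariance of an infinite-width one-hidden-layer network.\n",
"    inner = w * (X @ Xp.T) + b\n",
"    scale_x = w * np.sum(X * X, axis=1) + b + 1\n",
"    scale_xp = w * np.sum(Xp * Xp, axis=1) + b + 1\n",
"    return alpha * np.arcsin(inner / np.sqrt(scale_x[:, None] * scale_xp[None, :]))\n",
"\n",
"X = np.linspace(-1, 1, 30)[:, None]\n",
"K = mlp_cov_demo(X, X)\n",
"```"
]
},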
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"$$\\kernelScalar(\\inputVector, \\inputVector^\\prime) = \\alpha \\arcsin\\left(\\frac{w \\inputVector^\\top \\inputVector^\\prime + b}{\\sqrt{\\left(w \\inputVector^\\top \\inputVector + b + 1\\right)\\left(w \\left.\\inputVector^\\prime\\right.^\\top \\inputVector^\\prime + b + 1\\right)}}\\right)$$\n",
" \n",
"\n",
"\n",
"\n",
"\\includesvgclass{../slides/diagrams/kern/mlp_covariance.svg}\n",
"\n",
"### GPSS: Gaussian Process Summer School\n",
"\n",
"\n",
"\n",
"\n",
"- Next one is in Sheffield in *September 2019*.\n",
"- Many lectures from past meetings are available online.\n",
"\n",
" \n",
"\n",
"\n",
"\n",
"\\includesvgclass{../slides/diagrams/logo/gpss-logo.svg}\n",
"\n",
"### GPy: A Gaussian Process Framework in Python\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
" \n",
"### GPy: A Gaussian Process Framework in Python\n",
"\n",
"- BSD licensed software base.\n",
"- Wide availability of libraries, 'modern' scripting language.\n",
"- Allows us to set undergraduate projects in Computer Science that use\n",
"  GPs.\n",
"- Available through GitHub.\n",
"- Reproducible research with Jupyter notebooks.\n",
"\n",
"### Features\n",
"\n",
"- Probabilistic-style programming (specify the model, not the\n",
" algorithm).\n",
"- Non-Gaussian likelihoods.\n",
"- Multivariate outputs.\n",
"- Dimensionality reduction.\n",
"- Approximations for large data sets.\n",
"\n",
"\n",
"### Other Software\n",
"\n",
"- [GPflow](https://github.com/GPflow/GPflow)\n",
"- [GPyTorch](https://github.com/cornellius-gp/gpytorch)\n",
"\n",
"\n",
"### MXFusion: Modular Probabilistic Programming on MXNet\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
" \n",
"### MXFusion\n",
"\n",
"\n",
"\n",
"\n",
"- Work by Eric Meissner and Zhenwen Dai.\n",
"- Probabilistic programming.\n",
"- Available on [GitHub](https://github.com/amzn/mxfusion)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Acknowledgments\n",
"\n",
"Stefanos Eleftheriadis, John Bronskill, Hugh Salimbeni, Rich Turner,\n",
"Zhenwen Dai, Javier Gonzalez, Andreas Damianou, Mark Pullin, Michael\n",
"Smith, James Hensman, John Quinn, Martin Mubangizi.\n",
"\n",
"### Thanks!\n",
"\n",
"- twitter: @lawrennd\n",
"- blog:\n",
" [http://inverseprobability.com](http://inverseprobability.com/blog.html)\n",
"\n",
"### References {#references .unnumbered}"
]
}
],
"metadata": {},
"nbformat": 4,
"nbformat_minor": 2
}