{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Gaussian Processes\n", "### [Neil D. Lawrence](http://inverseprobability.com), Amazon Cambridge and University of Sheffield\n", "### 2019-01-09\n", "\n", "**Abstract**: Classical machine learning and statistical approaches to learning, such\n", "as neural networks and linear regression, assume a parametric form for\n", "functions. Gaussian process models are an alternative approach that\n", "assumes a probabilistic prior over functions. This brings benefits, in\n", "that uncertainty of function estimation is sustained throughout\n", "inference, and some challenges: algorithms for fitting Gaussian\n", "processes tend to be more complex than parametric models. In this\n", "sessions I will introduce Gaussian processes and explain why sustaining\n", "uncertainty is important.\n", "\n", "$$\n", "\\newcommand{\\Amatrix}{\\mathbf{A}}\n", "\\newcommand{\\KL}[2]{\\text{KL}\\left( #1\\,\\|\\,#2 \\right)}\n", "\\newcommand{\\Kaast}{\\kernelMatrix_{\\mathbf{ \\ast}\\mathbf{ \\ast}}}\n", "\\newcommand{\\Kastu}{\\kernelMatrix_{\\mathbf{ \\ast} \\inducingVector}}\n", "\\newcommand{\\Kff}{\\kernelMatrix_{\\mappingFunctionVector \\mappingFunctionVector}}\n", "\\newcommand{\\Kfu}{\\kernelMatrix_{\\mappingFunctionVector \\inducingVector}}\n", "\\newcommand{\\Kuast}{\\kernelMatrix_{\\inducingVector \\bf\\ast}}\n", "\\newcommand{\\Kuf}{\\kernelMatrix_{\\inducingVector \\mappingFunctionVector}}\n", "\\newcommand{\\Kuu}{\\kernelMatrix_{\\inducingVector \\inducingVector}}\n", "\\newcommand{\\Kuui}{\\Kuu^{-1}}\n", "\\newcommand{\\Qaast}{\\mathbf{Q}_{\\bf \\ast \\ast}}\n", "\\newcommand{\\Qastf}{\\mathbf{Q}_{\\ast \\mappingFunction}}\n", "\\newcommand{\\Qfast}{\\mathbf{Q}_{\\mappingFunctionVector \\bf \\ast}}\n", "\\newcommand{\\Qff}{\\mathbf{Q}_{\\mappingFunctionVector \\mappingFunctionVector}}\n", "\\newcommand{\\aMatrix}{\\mathbf{A}}\n", "\\newcommand{\\aScalar}{a}\n", "\\newcommand{\\aVector}{\\mathbf{a}}\n", "\\newcommand{\\acceleration}{a}\n", "\\newcommand{\\bMatrix}{\\mathbf{B}}\n", "\\newcommand{\\bScalar}{b}\n", "\\newcommand{\\bVector}{\\mathbf{b}}\n", "\\newcommand{\\basisFunc}{\\phi}\n", "\\newcommand{\\basisFuncVector}{\\boldsymbol{ \\basisFunc}}\n", "\\newcommand{\\basisFunction}{\\phi}\n", "\\newcommand{\\basisLocation}{\\mu}\n", "\\newcommand{\\basisMatrix}{\\boldsymbol{ \\Phi}}\n", "\\newcommand{\\basisScalar}{\\basisFunction}\n", "\\newcommand{\\basisVector}{\\boldsymbol{ \\basisFunction}}\n", "\\newcommand{\\activationFunction}{\\phi}\n", "\\newcommand{\\activationMatrix}{\\boldsymbol{ \\Phi}}\n", "\\newcommand{\\activationScalar}{\\basisFunction}\n", "\\newcommand{\\activationVector}{\\boldsymbol{ \\basisFunction}}\n", "\\newcommand{\\bigO}{\\mathcal{O}}\n", "\\newcommand{\\binomProb}{\\pi}\n", "\\newcommand{\\cMatrix}{\\mathbf{C}}\n", "\\newcommand{\\cbasisMatrix}{\\hat{\\boldsymbol{ \\Phi}}}\n", "\\newcommand{\\cdataMatrix}{\\hat{\\dataMatrix}}\n", "\\newcommand{\\cdataScalar}{\\hat{\\dataScalar}}\n", "\\newcommand{\\cdataVector}{\\hat{\\dataVector}}\n", "\\newcommand{\\centeredKernelMatrix}{\\mathbf{ \\MakeUppercase{\\centeredKernelScalar}}}\n", "\\newcommand{\\centeredKernelScalar}{b}\n", "\\newcommand{\\centeredKernelVector}{\\centeredKernelScalar}\n", "\\newcommand{\\centeringMatrix}{\\mathbf{H}}\n", "\\newcommand{\\chiSquaredDist}[2]{\\chi_{#1}^{2}\\left(#2\\right)}\n", "\\newcommand{\\chiSquaredSamp}[1]{\\chi_{#1}^{2}}\n", "\\newcommand{\\conditionalCovariance}{\\boldsymbol{ \\Sigma}}\n", 
"\\newcommand{\\coregionalizationMatrix}{\\mathbf{B}}\n", "\\newcommand{\\coregionalizationScalar}{b}\n", "\\newcommand{\\coregionalizationVector}{\\mathbf{ \\coregionalizationScalar}}\n", "\\newcommand{\\covDist}[2]{\\text{cov}_{#2}\\left(#1\\right)}\n", "\\newcommand{\\covSamp}[1]{\\text{cov}\\left(#1\\right)}\n", "\\newcommand{\\covarianceScalar}{c}\n", "\\newcommand{\\covarianceVector}{\\mathbf{ \\covarianceScalar}}\n", "\\newcommand{\\covarianceMatrix}{\\mathbf{C}}\n", "\\newcommand{\\covarianceMatrixTwo}{\\boldsymbol{ \\Sigma}}\n", "\\newcommand{\\croupierScalar}{s}\n", "\\newcommand{\\croupierVector}{\\mathbf{ \\croupierScalar}}\n", "\\newcommand{\\croupierMatrix}{\\mathbf{ \\MakeUppercase{\\croupierScalar}}}\n", "\\newcommand{\\dataDim}{p}\n", "\\newcommand{\\dataIndex}{i}\n", "\\newcommand{\\dataIndexTwo}{j}\n", "\\newcommand{\\dataMatrix}{\\mathbf{Y}}\n", "\\newcommand{\\dataScalar}{y}\n", "\\newcommand{\\dataSet}{\\mathcal{D}}\n", "\\newcommand{\\dataStd}{\\sigma}\n", "\\newcommand{\\dataVector}{\\mathbf{ \\dataScalar}}\n", "\\newcommand{\\decayRate}{d}\n", "\\newcommand{\\degreeMatrix}{\\mathbf{ \\MakeUppercase{\\degreeScalar}}}\n", "\\newcommand{\\degreeScalar}{d}\n", "\\newcommand{\\degreeVector}{\\mathbf{ \\degreeScalar}}\n", "% Already defined by latex\n", "%\\newcommand{\\det}[1]{\\left|#1\\right|}\n", "\\newcommand{\\diag}[1]{\\text{diag}\\left(#1\\right)}\n", "\\newcommand{\\diagonalMatrix}{\\mathbf{D}}\n", "\\newcommand{\\diff}[2]{\\frac{\\text{d}#1}{\\text{d}#2}}\n", "\\newcommand{\\diffTwo}[2]{\\frac{\\text{d}^2#1}{\\text{d}#2^2}}\n", "\\newcommand{\\displacement}{x}\n", "\\newcommand{\\displacementVector}{\\textbf{\\displacement}}\n", "\\newcommand{\\distanceMatrix}{\\mathbf{ \\MakeUppercase{\\distanceScalar}}}\n", "\\newcommand{\\distanceScalar}{d}\n", "\\newcommand{\\distanceVector}{\\mathbf{ \\distanceScalar}}\n", "\\newcommand{\\eigenvaltwo}{\\ell}\n", "\\newcommand{\\eigenvaltwoMatrix}{\\mathbf{L}}\n", "\\newcommand{\\eigenvaltwoVector}{\\mathbf{l}}\n", "\\newcommand{\\eigenvalue}{\\lambda}\n", "\\newcommand{\\eigenvalueMatrix}{\\boldsymbol{ \\Lambda}}\n", "\\newcommand{\\eigenvalueVector}{\\boldsymbol{ \\lambda}}\n", "\\newcommand{\\eigenvector}{\\mathbf{ \\eigenvectorScalar}}\n", "\\newcommand{\\eigenvectorMatrix}{\\mathbf{U}}\n", "\\newcommand{\\eigenvectorScalar}{u}\n", "\\newcommand{\\eigenvectwo}{\\mathbf{v}}\n", "\\newcommand{\\eigenvectwoMatrix}{\\mathbf{V}}\n", "\\newcommand{\\eigenvectwoScalar}{v}\n", "\\newcommand{\\entropy}[1]{\\mathcal{H}\\left(#1\\right)}\n", "\\newcommand{\\errorFunction}{E}\n", "\\newcommand{\\expDist}[2]{\\left<#1\\right>_{#2}}\n", "\\newcommand{\\expSamp}[1]{\\left<#1\\right>}\n", "\\newcommand{\\expectation}[1]{\\left\\langle #1 \\right\\rangle }\n", "\\newcommand{\\expectationDist}[2]{\\left\\langle #1 \\right\\rangle _{#2}}\n", "\\newcommand{\\expectedDistanceMatrix}{\\mathcal{D}}\n", "\\newcommand{\\eye}{\\mathbf{I}}\n", "\\newcommand{\\fantasyDim}{r}\n", "\\newcommand{\\fantasyMatrix}{\\mathbf{ \\MakeUppercase{\\fantasyScalar}}}\n", "\\newcommand{\\fantasyScalar}{z}\n", "\\newcommand{\\fantasyVector}{\\mathbf{ \\fantasyScalar}}\n", "\\newcommand{\\featureStd}{\\varsigma}\n", "\\newcommand{\\gammaCdf}[3]{\\mathcal{GAMMA CDF}\\left(#1|#2,#3\\right)}\n", "\\newcommand{\\gammaDist}[3]{\\mathcal{G}\\left(#1|#2,#3\\right)}\n", "\\newcommand{\\gammaSamp}[2]{\\mathcal{G}\\left(#1,#2\\right)}\n", "\\newcommand{\\gaussianDist}[3]{\\mathcal{N}\\left(#1|#2,#3\\right)}\n", 
"\\newcommand{\\gaussianSamp}[2]{\\mathcal{N}\\left(#1,#2\\right)}\n", "\\newcommand{\\given}{|}\n", "\\newcommand{\\half}{\\frac{1}{2}}\n", "\\newcommand{\\heaviside}{H}\n", "\\newcommand{\\hiddenMatrix}{\\mathbf{ \\MakeUppercase{\\hiddenScalar}}}\n", "\\newcommand{\\hiddenScalar}{h}\n", "\\newcommand{\\hiddenVector}{\\mathbf{ \\hiddenScalar}}\n", "\\newcommand{\\identityMatrix}{\\eye}\n", "\\newcommand{\\inducingInputScalar}{z}\n", "\\newcommand{\\inducingInputVector}{\\mathbf{ \\inducingInputScalar}}\n", "\\newcommand{\\inducingInputMatrix}{\\mathbf{Z}}\n", "\\newcommand{\\inducingScalar}{u}\n", "\\newcommand{\\inducingVector}{\\mathbf{ \\inducingScalar}}\n", "\\newcommand{\\inducingMatrix}{\\mathbf{U}}\n", "\\newcommand{\\inlineDiff}[2]{\\text{d}#1/\\text{d}#2}\n", "\\newcommand{\\inputDim}{q}\n", "\\newcommand{\\inputMatrix}{\\mathbf{X}}\n", "\\newcommand{\\inputScalar}{x}\n", "\\newcommand{\\inputSpace}{\\mathcal{X}}\n", "\\newcommand{\\inputVals}{\\inputVector}\n", "\\newcommand{\\inputVector}{\\mathbf{ \\inputScalar}}\n", "\\newcommand{\\iterNum}{k}\n", "\\newcommand{\\kernel}{\\kernelScalar}\n", "\\newcommand{\\kernelMatrix}{\\mathbf{K}}\n", "\\newcommand{\\kernelScalar}{k}\n", "\\newcommand{\\kernelVector}{\\mathbf{ \\kernelScalar}}\n", "\\newcommand{\\kff}{\\kernelScalar_{\\mappingFunction \\mappingFunction}}\n", "\\newcommand{\\kfu}{\\kernelVector_{\\mappingFunction \\inducingScalar}}\n", "\\newcommand{\\kuf}{\\kernelVector_{\\inducingScalar \\mappingFunction}}\n", "\\newcommand{\\kuu}{\\kernelVector_{\\inducingScalar \\inducingScalar}}\n", "\\newcommand{\\lagrangeMultiplier}{\\lambda}\n", "\\newcommand{\\lagrangeMultiplierMatrix}{\\boldsymbol{ \\Lambda}}\n", "\\newcommand{\\lagrangian}{L}\n", "\\newcommand{\\laplacianFactor}{\\mathbf{ \\MakeUppercase{\\laplacianFactorScalar}}}\n", "\\newcommand{\\laplacianFactorScalar}{m}\n", "\\newcommand{\\laplacianFactorVector}{\\mathbf{ \\laplacianFactorScalar}}\n", "\\newcommand{\\laplacianMatrix}{\\mathbf{L}}\n", "\\newcommand{\\laplacianScalar}{\\ell}\n", "\\newcommand{\\laplacianVector}{\\mathbf{ \\ell}}\n", "\\newcommand{\\latentDim}{q}\n", "\\newcommand{\\latentDistanceMatrix}{\\boldsymbol{ \\Delta}}\n", "\\newcommand{\\latentDistanceScalar}{\\delta}\n", "\\newcommand{\\latentDistanceVector}{\\boldsymbol{ \\delta}}\n", "\\newcommand{\\latentForce}{f}\n", "\\newcommand{\\latentFunction}{u}\n", "\\newcommand{\\latentFunctionVector}{\\mathbf{ \\latentFunction}}\n", "\\newcommand{\\latentFunctionMatrix}{\\mathbf{ \\MakeUppercase{\\latentFunction}}}\n", "\\newcommand{\\latentIndex}{j}\n", "\\newcommand{\\latentScalar}{z}\n", "\\newcommand{\\latentVector}{\\mathbf{ \\latentScalar}}\n", "\\newcommand{\\latentMatrix}{\\mathbf{Z}}\n", "\\newcommand{\\learnRate}{\\eta}\n", "\\newcommand{\\lengthScale}{\\ell}\n", "\\newcommand{\\rbfWidth}{\\ell}\n", "\\newcommand{\\likelihoodBound}{\\mathcal{L}}\n", "\\newcommand{\\likelihoodFunction}{L}\n", "\\newcommand{\\locationScalar}{\\mu}\n", "\\newcommand{\\locationVector}{\\boldsymbol{ \\locationScalar}}\n", "\\newcommand{\\locationMatrix}{\\mathbf{M}}\n", "\\newcommand{\\variance}[1]{\\text{var}\\left( #1 \\right)}\n", "\\newcommand{\\mappingFunction}{f}\n", "\\newcommand{\\mappingFunctionMatrix}{\\mathbf{F}}\n", "\\newcommand{\\mappingFunctionTwo}{g}\n", "\\newcommand{\\mappingFunctionTwoMatrix}{\\mathbf{G}}\n", "\\newcommand{\\mappingFunctionTwoVector}{\\mathbf{ \\mappingFunctionTwo}}\n", "\\newcommand{\\mappingFunctionVector}{\\mathbf{ \\mappingFunction}}\n", "\\newcommand{\\scaleScalar}{s}\n", 
"\\newcommand{\\mappingScalar}{w}\n", "\\newcommand{\\mappingVector}{\\mathbf{ \\mappingScalar}}\n", "\\newcommand{\\mappingMatrix}{\\mathbf{W}}\n", "\\newcommand{\\mappingScalarTwo}{v}\n", "\\newcommand{\\mappingVectorTwo}{\\mathbf{ \\mappingScalarTwo}}\n", "\\newcommand{\\mappingMatrixTwo}{\\mathbf{V}}\n", "\\newcommand{\\maxIters}{K}\n", "\\newcommand{\\meanMatrix}{\\mathbf{M}}\n", "\\newcommand{\\meanScalar}{\\mu}\n", "\\newcommand{\\meanTwoMatrix}{\\mathbf{M}}\n", "\\newcommand{\\meanTwoScalar}{m}\n", "\\newcommand{\\meanTwoVector}{\\mathbf{ \\meanTwoScalar}}\n", "\\newcommand{\\meanVector}{\\boldsymbol{ \\meanScalar}}\n", "\\newcommand{\\mrnaConcentration}{m}\n", "\\newcommand{\\naturalFrequency}{\\omega}\n", "\\newcommand{\\neighborhood}[1]{\\mathcal{N}\\left( #1 \\right)}\n", "\\newcommand{\\neilurl}{http://inverseprobability.com/}\n", "\\newcommand{\\noiseMatrix}{\\boldsymbol{ E}}\n", "\\newcommand{\\noiseScalar}{\\epsilon}\n", "\\newcommand{\\noiseVector}{\\boldsymbol{ \\epsilon}}\n", "\\newcommand{\\norm}[1]{\\left\\Vert #1 \\right\\Vert}\n", "\\newcommand{\\normalizedLaplacianMatrix}{\\hat{\\mathbf{L}}}\n", "\\newcommand{\\normalizedLaplacianScalar}{\\hat{\\ell}}\n", "\\newcommand{\\normalizedLaplacianVector}{\\hat{\\mathbf{ \\ell}}}\n", "\\newcommand{\\numActive}{m}\n", "\\newcommand{\\numBasisFunc}{m}\n", "\\newcommand{\\numComponents}{m}\n", "\\newcommand{\\numComps}{K}\n", "\\newcommand{\\numData}{n}\n", "\\newcommand{\\numFeatures}{K}\n", "\\newcommand{\\numHidden}{h}\n", "\\newcommand{\\numInducing}{m}\n", "\\newcommand{\\numLayers}{\\ell}\n", "\\newcommand{\\numNeighbors}{K}\n", "\\newcommand{\\numSequences}{s}\n", "\\newcommand{\\numSuccess}{s}\n", "\\newcommand{\\numTasks}{m}\n", "\\newcommand{\\numTime}{T}\n", "\\newcommand{\\numTrials}{S}\n", "\\newcommand{\\outputIndex}{j}\n", "\\newcommand{\\paramVector}{\\boldsymbol{ \\theta}}\n", "\\newcommand{\\parameterMatrix}{\\boldsymbol{ \\Theta}}\n", "\\newcommand{\\parameterScalar}{\\theta}\n", "\\newcommand{\\parameterVector}{\\boldsymbol{ \\parameterScalar}}\n", "\\newcommand{\\partDiff}[2]{\\frac{\\partial#1}{\\partial#2}}\n", "\\newcommand{\\precisionScalar}{j}\n", "\\newcommand{\\precisionVector}{\\mathbf{ \\precisionScalar}}\n", "\\newcommand{\\precisionMatrix}{\\mathbf{J}}\n", "\\newcommand{\\pseudotargetScalar}{\\widetilde{y}}\n", "\\newcommand{\\pseudotargetVector}{\\mathbf{ \\pseudotargetScalar}}\n", "\\newcommand{\\pseudotargetMatrix}{\\mathbf{ \\widetilde{Y}}}\n", "\\newcommand{\\rank}[1]{\\text{rank}\\left(#1\\right)}\n", "\\newcommand{\\rayleighDist}[2]{\\mathcal{R}\\left(#1|#2\\right)}\n", "\\newcommand{\\rayleighSamp}[1]{\\mathcal{R}\\left(#1\\right)}\n", "\\newcommand{\\responsibility}{r}\n", "\\newcommand{\\rotationScalar}{r}\n", "\\newcommand{\\rotationVector}{\\mathbf{ \\rotationScalar}}\n", "\\newcommand{\\rotationMatrix}{\\mathbf{R}}\n", "\\newcommand{\\sampleCovScalar}{s}\n", "\\newcommand{\\sampleCovVector}{\\mathbf{ \\sampleCovScalar}}\n", "\\newcommand{\\sampleCovMatrix}{\\mathbf{s}}\n", "\\newcommand{\\scalarProduct}[2]{\\left\\langle{#1},{#2}\\right\\rangle}\n", "\\newcommand{\\sign}[1]{\\text{sign}\\left(#1\\right)}\n", "\\newcommand{\\sigmoid}[1]{\\sigma\\left(#1\\right)}\n", "\\newcommand{\\singularvalue}{\\ell}\n", "\\newcommand{\\singularvalueMatrix}{\\mathbf{L}}\n", "\\newcommand{\\singularvalueVector}{\\mathbf{l}}\n", "\\newcommand{\\sorth}{\\mathbf{u}}\n", "\\newcommand{\\spar}{\\lambda}\n", "\\newcommand{\\trace}[1]{\\text{tr}\\left(#1\\right)}\n", "\\newcommand{\\BasalRate}{B}\n", 
"\\newcommand{\\DampingCoefficient}{C}\n", "\\newcommand{\\DecayRate}{D}\n", "\\newcommand{\\Displacement}{X}\n", "\\newcommand{\\LatentForce}{F}\n", "\\newcommand{\\Mass}{M}\n", "\\newcommand{\\Sensitivity}{S}\n", "\\newcommand{\\basalRate}{b}\n", "\\newcommand{\\dampingCoefficient}{c}\n", "\\newcommand{\\mass}{m}\n", "\\newcommand{\\sensitivity}{s}\n", "\\newcommand{\\springScalar}{\\kappa}\n", "\\newcommand{\\springVector}{\\boldsymbol{ \\kappa}}\n", "\\newcommand{\\springMatrix}{\\boldsymbol{ \\mathcal{K}}}\n", "\\newcommand{\\tfConcentration}{p}\n", "\\newcommand{\\tfDecayRate}{\\delta}\n", "\\newcommand{\\tfMrnaConcentration}{f}\n", "\\newcommand{\\tfVector}{\\mathbf{ \\tfConcentration}}\n", "\\newcommand{\\velocity}{v}\n", "\\newcommand{\\sufficientStatsScalar}{g}\n", "\\newcommand{\\sufficientStatsVector}{\\mathbf{ \\sufficientStatsScalar}}\n", "\\newcommand{\\sufficientStatsMatrix}{\\mathbf{G}}\n", "\\newcommand{\\switchScalar}{s}\n", "\\newcommand{\\switchVector}{\\mathbf{ \\switchScalar}}\n", "\\newcommand{\\switchMatrix}{\\mathbf{S}}\n", "\\newcommand{\\tr}[1]{\\text{tr}\\left(#1\\right)}\n", "\\newcommand{\\loneNorm}[1]{\\left\\Vert #1 \\right\\Vert_1}\n", "\\newcommand{\\ltwoNorm}[1]{\\left\\Vert #1 \\right\\Vert_2}\n", "\\newcommand{\\onenorm}[1]{\\left\\vert#1\\right\\vert_1}\n", "\\newcommand{\\twonorm}[1]{\\left\\Vert #1 \\right\\Vert}\n", "\\newcommand{\\vScalar}{v}\n", "\\newcommand{\\vVector}{\\mathbf{v}}\n", "\\newcommand{\\vMatrix}{\\mathbf{V}}\n", "\\newcommand{\\varianceDist}[2]{\\text{var}_{#2}\\left( #1 \\right)}\n", "% Already defined by latex\n", "%\\newcommand{\\vec}{#1:}\n", "\\newcommand{\\vecb}[1]{\\left(#1\\right):}\n", "\\newcommand{\\weightScalar}{w}\n", "\\newcommand{\\weightVector}{\\mathbf{ \\weightScalar}}\n", "\\newcommand{\\weightMatrix}{\\mathbf{W}}\n", "\\newcommand{\\weightedAdjacencyMatrix}{\\mathbf{A}}\n", "\\newcommand{\\weightedAdjacencyScalar}{a}\n", "\\newcommand{\\weightedAdjacencyVector}{\\mathbf{ \\weightedAdjacencyScalar}}\n", "\\newcommand{\\onesVector}{\\mathbf{1}}\n", "\\newcommand{\\zerosVector}{\\mathbf{0}}\n", "$$\n", "\n", "\n", "\n", "\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\\#\\#\\#\n", "\n", "[@Rasmussen:book06]{style=\"text-align:right\"}\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\n", "### What is Machine Learning?\n", "\n", ". . .\n", "\n", "$$ \\text{data} + \\text{model} \\xrightarrow{\\text{compute}} \\text{prediction}$$\n", "\n", ". . .\n", "\n", "- **data** : observations, could be actively or passively acquired\n", " (meta-data).\n", "\n", ". . .\n", "\n", "- **model** : assumptions, based on previous experience (other data!\n", " transfer learning etc), or beliefs about the regularities of the\n", " universe. Inductive bias.\n", "\n", ". . .\n", "\n", "- **prediction** : an action to be taken or a categorization or a\n", " quality score.\n", "\n", ". . .\n", "\n", "- Royal Society Report: [Machine Learning: Power and Promise of\n", " Computers that Learn by\n", " Example](https://royalsociety.org/~/media/policy/projects/machine-learning/publications/machine-learning-report.pdf)\n", "\n", "### What is Machine Learning?\n", "\n", "$$\\text{data} + \\text{model} \\xrightarrow{\\text{compute}} \\text{prediction}$$\n", "\n", "> - To combine data with a model need:\n", "> - **a prediction function** $\\mappingFunction(\\cdot)$ includes our\n", "> beliefs about the regularities of the universe\n", "> - **an objective function** $\\errorFunction(\\cdot)$ defines the cost\n", "> of misprediction.\n", "\n", "### Artificial Intelligence\n", "\n", "- Machine learning is a mainstay because of importance of prediction.\n", "\n", "### Uncertainty\n", "\n", "- Uncertainty in prediction arises from:\n", "- scarcity of training data and\n", "- mismatch between the set of prediction functions we choose and all\n", " possible prediction functions.\n", "- Also uncertainties in objective, leave those for another day.\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\\#\\#\\# Neural Networks and Prediction Functions\n", "\n", "- adaptive non-linear function models inspired by simple neuron models\n", " [@McCulloch:neuron43]\n", "- have become popular because of their ability to model data.\n", "- can be composed to form highly complex functions\n", "- start by focussing on one hidden layer\n", "\n", "### Prediction Function of One Hidden Layer\n", "\n", "$$\n", "\\mappingFunction(\\inputVector) = \\left.\\mappingVector^{(2)}\\right.^\\top \\activationVector(\\mappingMatrix_{1}, \\inputVector)\n", "$$\n", "\n", "$\\mappingFunction(\\cdot)$ is a scalar function with vector inputs,\n", "\n", "$\\activationVector(\\cdot)$ is a vector function with vector inputs.\n", "\n", "- dimensionality of the vector function is known as the number of\n", " hidden units, or the number of neurons.\n", "\n", "- elements of $\\activationVector(\\cdot)$ are the *activation* function\n", " of the neural network\n", "\n", "- elements of $\\mappingMatrix_{1}$ are the parameters of the\n", " activation functions.\n", "\n", "### Relations with Classical Statistics\n", "\n", "- In statistics activation functions are known as *basis functions*.\n", "\n", "- would think of this as a *linear model*: not linear predictions,\n", " linear in the parameters\n", "\n", "- $\\mappingVector_{1}$ are *static* parameters.\n", "\n", "### Adaptive Basis Functions\n", "\n", "- In machine learning we optimize $\\mappingMatrix_{1}$ as well as\n", " $\\mappingMatrix_{2}$ (which would normally be denoted in statistics\n", " by $\\boldsymbol{\\beta}$).\n", "\n", "- Revisit that decision: follow the path of @Neal:bayesian94 and\n", " 
@MacKay:bayesian92.\n", "\n", "- Consider the probabilistic approach.\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\\#\\#\\# Probabilistic Modelling\n", "\n", "- Probabilistically we want, $$\n", " p(\\dataScalar_*|\\dataVector, \\inputMatrix, \\inputVector_*),\n", " $$ $\\dataScalar_*$ is a test output $\\inputVector_*$ is a test input\n", " $\\inputMatrix$ is a training input matrix $\\dataVector$ is training\n", " outputs\n", "\n", "### Joint Model of World\n", "\n", "$$\n", "p(\\dataScalar_*|\\dataVector, \\inputMatrix, \\inputVector_*) = \\int p(\\dataScalar_*|\\inputVector_*, \\mappingMatrix) p(\\mappingMatrix | \\dataVector, \\inputMatrix) \\text{d} \\mappingMatrix\n", "$$\n", "\n", ". . .\n", "\n", "$\\mappingMatrix$ contains $\\mappingMatrix_1$ and $\\mappingMatrix_2$\n", "\n", "$p(\\mappingMatrix | \\dataVector, \\inputMatrix)$ is posterior density\n", "\n", "### Likelihood\n", "\n", "$p(\\dataScalar|\\inputVector, \\mappingMatrix)$ is the *likelihood* of\n", "data point\n", "\n", ". . .\n", "\n", "Normally assume independence: $$\n", "p(\\dataVector|\\inputMatrix, \\mappingMatrix) = \\prod_{i=1}^\\numData p(\\dataScalar_i|\\inputVector_i, \\mappingMatrix),$$\n", "\n", "### Likelihood and Prediction Function\n", "\n", "$$\n", "p(\\dataScalar_i | \\mappingFunction(\\inputVector_i)) = \\frac{1}{\\sqrt{2\\pi \\dataStd^2}} \\exp\\left(-\\frac{\\left(\\dataScalar_i - \\mappingFunction(\\inputVector_i)\\right)^2}{2\\dataStd^2}\\right)\n", "$$\n", "\n", "### Unsupervised Learning\n", "\n", "- Can also consider priors over latents $$\n", " p(\\dataVector_*|\\dataVector) = \\int p(\\dataVector_*|\\inputMatrix_*, \\mappingMatrix) p(\\mappingMatrix | \\dataVector, \\inputMatrix) p(\\inputMatrix) p(\\inputMatrix_*) \\text{d} \\mappingMatrix \\text{d} \\inputMatrix \\text{d}\\inputMatrix_*\n", " $$\n", "\n", "- This gives *unsupervised learning*.\n", "\n", "### Probabilistic Inference\n", "\n", "- Data: $\\dataVector$\n", "\n", "- Model: $p(\\dataVector, \\dataVector^*)$\n", "\n", "- Prediction: $p(\\dataVector^*| \\dataVector)$\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\\#\\#\\# Graphical Models\n", "\n", "- Represent joint distribution through *conditional dependencies*.\n", "- E.g. 
Markov chain\n", "\n", "$$p(\\dataVector) = p(\\dataScalar_\\numData | \\dataScalar_{\\numData-1}) p(\\dataScalar_{\\numData-1}|\\dataScalar_{\\numData-2}) \\dots p(\\dataScalar_{2} | \\dataScalar_{1})$$" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import daft\n", "from matplotlib import rc\n", "\n", "rc(\"font\", **{'family':'sans-serif','sans-serif':['Helvetica']}, size=30)\n", "rc(\"text\", usetex=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pgm = daft.PGM(shape=[3, 1],\n", " origin=[0, 0], \n", " grid_unit=5, \n", " node_unit=1.9, \n", " observed_style='shaded',\n", " line_width=3)\n", "\n", "\n", "pgm.add_node(daft.Node(\"y_1\", r\"$y_1$\", 0.5, 0.5, fixed=False))\n", "pgm.add_node(daft.Node(\"y_2\", r\"$y_2$\", 1.5, 0.5, fixed=False))\n", "pgm.add_node(daft.Node(\"y_3\", r\"$y_3$\", 2.5, 0.5, fixed=False))\n", "pgm.add_edge(\"y_1\", \"y_2\")\n", "pgm.add_edge(\"y_2\", \"y_3\")\n", "\n", "pgm.render().figure.savefig(\"../slides/diagrams/ml/markov.svg\", transparent=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "### \n", "\n", "Predict Perioperative Risk of Clostridium Difficile Infection Following\n", "Colon Surgery [@Steele:predictive12]\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "### Performing Inference\n", "\n", "- Easy to write in probabilities\n", "\n", "- But underlying this is a wealth of computational challenges.\n", "\n", "- High dimensional integrals typically require approximation.\n", "\n", "### Linear Models\n", "\n", "- In statistics, focussed more on *linear* model implied by $$\n", " \\mappingFunction(\\inputVector) = \\left.\\mappingVector^{(2)}\\right.^\\top \\activationVector(\\mappingMatrix_1, \\inputVector)\n", " $$\n", "\n", "- Hold $\\mappingMatrix_1$ fixed for given analysis.\n", "\n", "- Gaussian prior for $\\mappingMatrix$, $$\n", " \\mappingVector^{(2)} \\sim \\gaussianSamp{\\zerosVector}{\\covarianceMatrix}.\n", " $$ $$\n", " \\dataScalar_i = \\mappingFunction(\\inputVector_i) + \\noiseScalar_i,\n", " $$ where $$\n", " \\noiseScalar_i \\sim \\gaussianSamp{0}{\\dataStd^2}\n", " $$\n", "\n", "\\newslides{Linear Gaussian Models}\n", "\n", "- Normally integrals are complex but for this Gaussian linear case\n", " they are trivial.\n", "\n", "### Multivariate Gaussian Properties\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\\#\\#\\# Recall Univariate Gaussian Properties\n", "\n", ". . .\n", "\n", "1. Sum of Gaussian variables is also Gaussian.\n", "\n", "$$\\dataScalar_i \\sim \\gaussianSamp{\\meanScalar_i}{\\dataStd_i^2}$$\n", "\n", ". . .\n", "\n", "$$\\sum_{i=1}^{\\numData} \\dataScalar_i \\sim \\gaussianSamp{\\sum_{i=1}^\\numData \\meanScalar_i}{\\sum_{i=1}^\\numData\\dataStd_i^2}$$\n", "\n", "### Recall Univariate Gaussian Properties\n", "\n", "2. Scaling a Gaussian leads to a Gaussian.\n", "\n", ". . .\n", "\n", "$$\\dataScalar \\sim \\gaussianSamp{\\meanScalar}{\\dataStd^2}$$\n", "\n", ". . .\n", "\n", "$$\\mappingScalar\\dataScalar\\sim \\gaussianSamp{\\mappingScalar\\meanScalar}{\\mappingScalar^2 \\dataStd^2}$$\n", "\n", "### Multivariate Consequence\n", "\n", "[If]{style=\"text-align:left\"}\n", "$$\\inputVector \\sim \\gaussianSamp{\\meanVector}{\\covarianceMatrix}$$\n", "\n", ". . .\n", "\n", "[And]{style=\"text-align:left\"}\n", "$$\\dataVector= \\mappingMatrix\\inputVector$$\n", "\n", ". . .\n", "\n", "[Then]{style=\"text-align:left\"}\n", "$$\\dataVector \\sim \\gaussianSamp{\\mappingMatrix\\meanVector}{\\mappingMatrix\\covarianceMatrix\\mappingMatrix^\\top}$$\n", "\n", "### Linear Gaussian Models\n", "\n", "1. linear Gaussian models are easier to deal with\n", "2. 
Even the parameters *within* the process can be handled, by\n", " considering a particular limit.\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\\#\\#\\# Multivariate Gaussian Properties\n", "\n", "- If $$\n", " \\dataVector = \\mappingMatrix \\inputVector + \\noiseVector,\n", " $$\n", "\n", "- Assume $$\n", " \\begin{align}\n", " \\inputVector & \\sim \\gaussianSamp{\\meanVector}{\\covarianceMatrix}\\\\\n", " \\noiseVector & \\sim \\gaussianSamp{\\zerosVector}{\\covarianceMatrixTwo}\n", " \\end{align}\n", " $$\n", "- Then $$\n", " \\dataVector \\sim \\gaussianSamp{\\mappingMatrix\\meanVector}{\\mappingMatrix\\covarianceMatrix\\mappingMatrix^\\top + \\covarianceMatrixTwo}.\n", " $$ If $\\covarianceMatrixTwo=\\dataStd^2\\eye$, this is Probabilistic\n", " Principal Component Analysis [@Tipping:probpca99], because we\n", " integrated out the inputs (or *latent* variables they would be\n", " called in that case).\n", "\n", "### Non linear on Inputs\n", "\n", "- Set each activation function computed at each data point to be\n", "\n", "$$\n", "\\activationScalar_{i,j} = \\activationScalar(\\mappingVector^{(1)}_{j}, \\inputVector_{i})\n", "$$ Define *design matrix* $$\n", "\\activationMatrix = \n", "\\begin{bmatrix}\n", "\\activationScalar_{1, 1} & \\activationScalar_{1, 2} & \\dots & \\activationScalar_{1, \\numHidden} \\\\\n", "\\activationScalar_{1, 2} & \\activationScalar_{1, 2} & \\dots & \\activationScalar_{1, \\numData} \\\\\n", "\\vdots & \\vdots & \\ddots & \\vdots \\\\\n", "\\activationScalar_{\\numData, 1} & \\activationScalar_{\\numData, 2} & \\dots & \\activationScalar_{\\numData, \\numHidden}\n", "\\end{bmatrix}.\n", "$$\n", "\n", "### Matrix Representation of a Neural Network\n", "\n", "$$\\dataScalar\\left(\\inputVector\\right) = \\activationVector\\left(\\inputVector\\right)^\\top \\mappingVector + \\noiseScalar$$\n", "\n", ". . .\n", "\n", "$$\\dataVector = \\activationMatrix\\mappingVector + \\noiseVector$$\n", "\n", ". . 
.\n", "\n", "$$\\noiseVector \\sim \\gaussianSamp{\\zerosVector}{\\dataStd^2\\eye}$$\n", "\n", "### Prior Density\n", "\n", "- Define\n", "\n", "$$\n", "\\mappingVector \\sim \\gaussianSamp{\\zerosVector}{\\alpha\\eye},\n", "$$\n", "\n", "- Rules of multivariate Gaussians to see that,\n", "\n", "$$\n", "\\dataVector \\sim \\gaussianSamp{\\zerosVector}{\\alpha \\activationMatrix \\activationMatrix^\\top + \\dataStd^2 \\eye}.\n", "$$\n", "\n", "$$\n", "\\kernelMatrix = \\alpha \\activationMatrix \\activationMatrix^\\top + \\dataStd^2 \\eye.\n", "$$\n", "\n", "### Joint Gaussian Density\n", "\n", "- Elements are a function\n", " $\\kernel_{i,j} = \\kernel\\left(\\inputVector_i, \\inputVector_j\\right)$\n", "\n", "$$\n", "\\kernelMatrix = \\alpha \\activationMatrix \\activationMatrix^\\top + \\dataStd^2 \\eye.\n", "$$\n", "\n", "### Covariance Function\n", "\n", "$$\n", "\\kernel_\\mappingFunction\\left(\\inputVector_i, \\inputVector_j\\right) = \\alpha \\activationVector\\left(\\mappingMatrix_1, \\inputVector_i\\right)^\\top \\activationVector\\left(\\mappingMatrix_1, \\inputVector_j\\right)\n", "$$\n", "\n", "- formed by inner products of the rows of the *design matrix*.\n", "\n", "### Gaussian Process\n", "\n", "- Instead of making assumptions about our density over each data\n", " point, $\\dataScalar_i$ as i.i.d.\n", "\n", "- make a joint Gaussian assumption over our data.\n", "\n", "- covariance matrix is now a function of both the parameters of the\n", " activation function, $\\mappingMatrix_1$, and the input variables,\n", " $\\inputMatrix$.\n", "\n", "- Arises from integrating out $\\mappingVector^{(2)}$.\n", "\n", "### Basis Functions\n", "\n", "- Can be very complex, such as deep kernels, [@Cho:deep09] or could\n", " even put a convolutional neural network inside.\n", "- Viewing a neural network in this way is also what allows us to\n", " beform sensible *batch* normalizations [@Ioffe:batch15].\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\\#\\#\\# Non-degenerate Gaussian Processes\n", "\n", "- This process is *degenerate*.\n", "- Covariance function is of rank at most $\\numHidden$.\n", "- As $\\numData \\rightarrow \\infty$, covariance matrix is not full\n", " rank.\n", "- Leading to $\\det{\\kernelMatrix} = 0$\n", "\n", "### Infinite Networks\n", "\n", "- In ML Radford Neal [@Neal:bayesian94] asked \"what would happen if\n", " you took $\\numHidden \\rightarrow \\infty$?\"\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "### Roughly Speaking\n", "\n", "- Instead of\n", "\n", "$$\n", " \\begin{align*}\n", " \\kernel_\\mappingFunction\\left(\\inputVector_i, \\inputVector_j\\right) & = \\alpha \\activationVector\\left(\\mappingMatrix_1, \\inputVector_i\\right)^\\top \\activationVector\\left(\\mappingMatrix_1, \\inputVector_j\\right)\\\\\n", " & = \\alpha \\sum_k \\activationScalar\\left(\\mappingVector^{(1)}_k, \\inputVector_i\\right) \\activationScalar\\left(\\mappingVector^{(1)}_k, \\inputVector_j\\right)\n", " \\end{align*}\n", " $$\n", "\n", "- Sample infinitely many from a prior density,\n", " $p(\\mappingVector^{(1)})$,\n", "\n", "$$\n", "\\kernel_\\mappingFunction\\left(\\inputVector_i, \\inputVector_j\\right) = \\alpha \\int \\activationScalar\\left(\\mappingVector^{(1)}, \\inputVector_i\\right) \\activationScalar\\left(\\mappingVector^{(1)}, \\inputVector_j\\right) p(\\mappingVector^{(1)}) \\text{d}\\mappingVector^{(1)}\n", "$$\n", "\n", "- Also applies for non-Gaussian $p(\\mappingVector^{(1)})$ because of\n", " the *central limit theorem*.\n", "\n", "### Simple Probabilistic Program\n", "\n", "- If $$\n", " \\begin{align*} \n", " \\mappingVector^{(1)} & \\sim p(\\cdot)\\\\ \\phi_i & = \\activationScalar\\left(\\mappingVector^{(1)}, \\inputVector_i\\right), \n", " \\end{align*}\n", " $$ has finite variance.\n", "\n", "- Then taking number of hidden units to infinity, is also a Gaussian\n", " process.\n", "\n", "### Further Reading\n", "\n", "- Chapter 2 of Neal's thesis [@Neal:bayesian94]\n", "\n", "- Rest of Neal's thesis. [@Neal:bayesian94]\n", "\n", "- David MacKay's PhD thesis [@MacKay:bayesian92]\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from mlai import Kernel" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from mlai import eq_cov" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "kernel = Kernel(function=eq_cov,\n", " name='Exponentiated Quadratic',\n", " shortname='eq', \n", " lengthscale=0.25)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "np.random.seed(10)\n", "import teaching_plots as plot" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot.rejection_samples(kernel=kernel, \n", " diagrams='../slides/diagrams/gp')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pods\n", "from ipywidgets import IntSlider" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pods.notebook.display_plots('gp_rejection_sample{sample:0>3}.png', \n", " directory='../slides/diagrams/gp', \n", " sample=IntSlider(1,1,5,1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### \n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "### \n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "### \n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "### \n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "### \n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "\n", "\n", "### Distributions over Functions\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "np.random.seed(4949)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Sampling a Function\n", "\n", "**Multi-variate Gaussians**\n", "\n", "- We will consider a Gaussian with a particular structure of\n", " covariance matrix.\n", "- Generate a single sample from this 25 dimensional Gaussian density,\n", " $$\n", " \\mappingFunctionVector=\\left[\\mappingFunction_{1},\\mappingFunction_{2}\\dots \\mappingFunction_{25}\\right].\n", " $$\n", "- We will plot these points against their index." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import teaching_plots as plot\n", "from mlai import Kernel, exponentiated_quadratic" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "kernel=Kernel(function=exponentiated_quadratic, lengthscale=0.5)\n", "plot.two_point_sample(kernel.K, diagrams='../slides/diagrams/gp')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pods\n", "from ipywidgets import IntSlider" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pods.notebook.display_plots('two_point_sample{sample:0>3}.svg', '../slides/diagrams/gp', sample=IntSlider(0, 0, 8, 1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Gaussian Distribution Sample\n", "\n", "\n", "\n", "\n", "\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pods\n", "from ipywidgets import IntSlider" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pods.notebook.display_plots('two_point_sample{sample:0>3}.svg', '../slides/diagrams/gp', sample=IntSlider(9, 9, 12, 1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Prediction of $\\mappingFunction_{2}$ from $\\mappingFunction_{1}$\n", "\n", "\n", "\n", "\n", "\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "### Uluru\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "### Prediction with Correlated Gaussians\n", "\n", "- Prediction of $\\mappingFunction_2$ from $\\mappingFunction_1$\n", " requires *conditional density*.\n", "- Conditional density is *also* Gaussian. $$\n", " p(\\mappingFunction_2|\\mappingFunction_1) = \\gaussianDist{\\mappingFunction_2}{\\frac{\\kernelScalar_{1, 2}}{\\kernelScalar_{1, 1}}\\mappingFunction_1}{ \\kernelScalar_{2, 2} - \\frac{\\kernelScalar_{1,2}^2}{\\kernelScalar_{1,1}}}\n", " $$ where covariance of joint density is given by $$\n", " \\kernelMatrix = \\begin{bmatrix} \\kernelScalar_{1, 1} & \\kernelScalar_{1, 2}\\\\ \\kernelScalar_{2, 1} & \\kernelScalar_{2, 2}.\\end{bmatrix}\n", " $$\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pods\n", "from ipywidgets import IntSlider" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pods.notebook.display_plots('two_point_sample{sample:0>3}.svg', '../slides/diagrams/gp', sample=IntSlider(13, 13, 17, 1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Prediction of $\\mappingFunction_{8}$ from $\\mappingFunction_{1}$\n", "\n", "\n", "\n", "\n", "\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "### Key Object\n", "\n", "- Covariance function, $\\kernelMatrix$\n", "- Determines properties of samples.\n", "- Function of $\\inputMatrix$,\n", " $$\\kernelScalar_{i,j} = \\kernelScalar(\\inputVector_i, \\inputVector_j)$$\n", "\n", "### Linear Algebra\n", "\n", "- Posterior mean\n", " $$\\mappingFunction_D(\\inputVector_*) = \\kernelVector(\\inputVector_*, \\inputMatrix) \\kernelMatrix^{-1}\n", " \\mathbf{y}$$\n", "\n", "- Posterior covariance\n", " $$\\mathbf{C}_* = \\kernelMatrix_{*,*} - \\kernelMatrix_{*,\\mappingFunctionVector}\n", " \\kernelMatrix^{-1} \\kernelMatrix_{\\mappingFunctionVector, *}$$\n", "\n", "### Linear Algebra\n", "\n", "- Posterior mean" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "$$\\mappingFunction_D(\\inputVector_*) = \\kernelVector(\\inputVector_*, \\inputMatrix) \\boldsymbol{\\alpha}$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Posterior covariance\n", " $$\\covarianceMatrix_* = \\kernelMatrix_{*,*} - \\kernelMatrix_{*,\\mappingFunctionVector}\n", " \\kernelMatrix^{-1} \\kernelMatrix_{\\mappingFunctionVector, *}$$\n", "\n", "### \n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "### \n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "### \n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "### \n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "### \n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\\#\\#\\# Exponentiated Quadratic Covariance" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from mlai import Kernel" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from mlai import eq_cov" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "kernel = Kernel(function=eq_cov,\n", " name='Exponentiated Quadratic',\n", " shortname='eq', \n", " formula='\\kernelScalar(\\inputVector, \\inputVector^\\prime) = \\alpha \\exp\\left(-\\frac{\\ltwoNorm{\\inputVector-\\inputVector^\\prime}^2}{2\\lengthScale^2}\\right)',\n", " lengthscale=0.2)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import teaching_plots as plot" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot.covariance_func(kernel=kernel, diagrams='../slides/diagrams/kern/')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "$$\\kernelScalar(\\inputVector, \\inputVector^\\prime) = \\alpha \\exp\\left(-\\frac{\\ltwoNorm{\\inputVector-\\inputVector^\\prime}^2}{2\\lengthScale^2}\\right)$$\n", "
\n", "\n", "\n", "\n", "\n", "\n", "
\n", "\\includesvgclass{../slides/diagrams/kern/eq_covariance.svg}\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "
\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\n", "### Olympic Marathon Data\n", "\n", "\n", "\n", "\n", "\n", "\n", "
\n", "- Gold medal times for Olympic Marathon since 1896.\n", "- Marathons before 1924 didn’t have a standardised distance.\n", "- Present results using pace per km.\n", "- In 1904 Marathon was badly organised leading to very slow times.\n", "\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "Image from Wikimedia Commons \n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import teaching_plots as plot\n", "import mlai" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "xlim = (1875,2030)\n", "ylim = (2.5, 6.5)\n", "yhat = (y-offset)/scale\n", "\n", "fig, ax = plt.subplots(figsize=plot.big_wide_figsize)\n", "_ = ax.plot(x, y, 'r.',markersize=10)\n", "ax.set_xlabel('year', fontsize=20)\n", "ax.set_ylabel('pace min/km', fontsize=20)\n", "ax.set_xlim(xlim)\n", "ax.set_ylim(ylim)\n", "\n", "mlai.write_figure(figure=fig, \n", " filename='../slides/diagrams/datasets/olympic-marathon.svg', \n", " transparent=True, \n", " frameon=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Olympic Marathon Data\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\n", "### Alan Turing\n", "\n", "\n", "\n", "\n", "\n", "\n", "
\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "
\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "
\n", "### Probability Winning Olympics?\n", "\n", "- He was a formidable Marathon runner.\n", "- In 1946 he ran a time 2 hours 46 minutes.\n", " - That's a pace of 3.95 min/km.\n", "- What is the probability he would have won an Olympics if one had\n", " been held in 1946?\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import teaching_plots as plot" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(figsize=plot.big_wide_figsize)\n", "plot.model_output(m_full, scale=scale, offset=offset, ax=ax, xlabel='year', ylabel='pace min/km', fontsize=20, portion=0.2)\n", "ax.set_xlim(xlim)\n", "ax.set_ylim(ylim)\n", "mlai.write_figure(figure=fig,\n", " filename='../slides/diagrams/gp/olympic-marathon-gp.svg', \n", " transparent=True, frameon=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Olympic Marathon Data GP\n", "\n", "\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "gpoptimizeInit" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Learning Covariance Parameters\n", "\n", "Can we determine covariance parameters from the data?\n", "\n", "### \n", "\n", "$$\\gaussianDist{\\dataVector}{\\mathbf{0}}{\\kernelMatrix}=\\frac{1}{(2\\pi)^\\frac{\\numData}{2}{\\det{\\kernelMatrix}^{\\frac{1}{2}}}}{\\exp\\left(-\\frac{\\dataVector^{\\top}\\kernelMatrix^{-1}\\dataVector}{2}\\right)}$$\n", "\n", "### \n", "\n", "$$\\begin{aligned}\n", " \\gaussianDist{\\dataVector}{\\mathbf{0}}{\\kernelMatrix}=\\frac{1}{(2\\pi)^\\frac{\\numData}{2}{\\color{white} \\det{\\kernelMatrix}^{\\frac{1}{2}}}}{\\color{white}\\exp\\left(-\\frac{\\dataVector^{\\top}\\kernelMatrix^{-1}\\dataVector}{2}\\right)}\n", "\\end{aligned}\n", "$$\n", "\n", "### \n", "\n", "$$\n", "\\begin{aligned}\n", " \\log \\gaussianDist{\\dataVector}{\\mathbf{0}}{\\kernelMatrix}=&{\\color{white}-\\frac{1}{2}\\log\\det{\\kernelMatrix}}{\\color{white}-\\frac{\\dataVector^{\\top}\\kernelMatrix^{-1}\\dataVector}{2}} \\\\ &-\\frac{\\numData}{2}\\log2\\pi\n", "\\end{aligned}\n", "$$\n", "\n", "$$\n", "\\errorFunction(\\parameterVector) = {\\color{white} \\frac{1}{2}\\log\\det{\\kernelMatrix}} + {\\color{white} \\frac{\\dataVector^{\\top}\\kernelMatrix^{-1}\\dataVector}{2}}\n", "$$\n", "\n", "### \n", "\n", "The parameters are *inside* the covariance function (matrix).\n", "\\normalsize\n", "$$\\kernelScalar_{i, j} = \\kernelScalar(\\inputVals_i, \\inputVals_j; \\parameterVector)$$\n", "\n", "### Eigendecomposition of Covariance\n", "\n", "[\\Large\n", "$$\\kernelMatrix = \\rotationMatrix \\eigenvalueMatrix^2 \\rotationMatrix^\\top$$]{}\n", "\n", "\n", "\n", "\n", "\n", "\n", "
\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "
\n", "$\\eigenvalueMatrix$ represents distance on axes. $\\rotationMatrix$ gives\n", "rotation.\n", "
\n", "### Eigendecomposition of Covariance\n", "\n", "- $\\eigenvalueMatrix$ is *diagonal*,\n", " $\\rotationMatrix^\\top\\rotationMatrix = \\eye$.\n", "- Useful representation since\n", " $\\det{\\kernelMatrix} = \\det{\\eigenvalueMatrix^2} = \\det{\\eigenvalueMatrix}^2$.\n", "\n", "### Capacity control: ${\\color{white} \\log \\det{\\kernelMatrix}}$" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import numpy as np\n", "fill_color = [1., 1., 0.]\n", "black_color = [0., 0., 0.]\n", "blue_color = [0., 0., 1.]\n", "magenta_color = [1., 0., 1.]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(figsize=plot.big_figsize)\n", "ax.set_axis_off()\n", "cax = fig.add_axes([0., 0., 1., 1.])\n", "cax.set_axis_off()\n", "\n", "counter = 1\n", "lambda1 = 0.5\n", "lambda2 = 0.3\n", "\n", "cax.set_xlim([0., 1.])\n", "cax.set_ylim([0., 1.])\n", "\n", "# Matrix label axes\n", "tax2 = fig.add_axes([0, 0.47, 0.1, 0.1])\n", "tax2.set_xlim([0, 1.])\n", "tax2.set_ylim([0, 1.])\n", "tax2.set_axis_off()\n", "label_eigenvalue = tax2.text(0.5, 0.5, \"\\Large $\\eigenvalueMatrix=$\")\n", "\n", "ax = fig.add_axes([0.5, 0.25, 0.5, 0.5])\n", "ax.set_xlim([-0.25, 0.6])\n", "ax.set_ylim([-0.25, 0.6])\n", "from matplotlib.patches import Polygon\n", "pat_hand = ax.add_patch(Polygon(np.column_stack(([0, 0, lambda1, lambda1], \n", " [0, lambda2, lambda2, 0])), \n", " facecolor=fill_color, \n", " edgecolor=black_color, \n", " visible=False))\n", "data = pat_hand.get_path().vertices\n", "rotation_matrix = np.asarray([[np.sqrt(2)/2, -np.sqrt(2)/2], \n", " [np.sqrt(2)/2, np.sqrt(2)/2]])\n", "new = np.dot(rotation_matrix,data.T)\n", "pat_hand = ax.add_patch(Polygon(np.column_stack(([0, 0, lambda1, lambda1], \n", " [0, lambda2, lambda2, 0])), \n", " facecolor=fill_color, \n", " edgecolor=black_color, \n", " visible=False))\n", "pat_hand_rot = ax.add_patch(Polygon(new.T, \n", " facecolor=fill_color, \n", " edgecolor=black_color))\n", "pat_hand_rot.set(visible=False)\n", "\n", "# 3D box\n", "pat_hand3 = [ax.add_patch(Polygon(np.column_stack(([0, -0.2*lambda1, 0.8*lambda1, lambda1], \n", " [0, -0.2*lambda2, -0.2*lambda2, 0])), \n", " facecolor=fill_color, \n", " edgecolor=black_color))]\n", " \n", "pat_hand3.append(ax.add_patch(Polygon(np.column_stack(([0, -0.2*lambda1, -0.2*lambda1, 0], \n", " [0, -0.2*lambda2, 0.8*lambda2, lambda2])), \n", " facecolor=fill_color, \n", " edgecolor=black_color)))\n", "\n", "pat_hand3.append(ax.add_patch(Polygon(np.column_stack(([-0.2*lambda1, 0, lambda1, 0.8*lambda1], \n", " [0.8*lambda2, lambda2, lambda2, 0.8*lambda2])), \n", " facecolor=fill_color,\n", " edgecolor=black_color)))\n", "pat_hand3.append(ax.add_patch(Polygon(np.column_stack(([lambda1, 0.8*lambda1, 0.8*lambda1, lambda1], \n", " [lambda2, 0.8*lambda2, -0.2*lambda2, 0])), \n", " facecolor=fill_color, \n", " edgecolor=black_color)))\n", "\n", "for hand in pat_hand3:\n", " hand.set(visible=False)\n", "\n", "ax.set_aspect('equal')\n", "ax.set_axis_off()\n", "xlim = ax.get_xlim()\n", "ylim = ax.get_ylim()\n", "xspan = xlim[1] - xlim[0]\n", "yspan = ylim[1] - ylim[0]\n", "#ar_one = ax.arrow([0, lambda1], [0, 0])\n", "ar_one = ax.arrow(x=0, y=0, dx=lambda1, dy=0)\n", "#ar_two = ax.arrow([0, 0], [0, lambda2])\n", "ar_two = ax.arrow(x=0, y=0, dx=0, dy=lambda2)\n", "#ar_three = ax.arrow([0, -0.2*lambda1], [0, -0.2*lambda2])\n", "ar_three = ax.arrow(x=0, y=0, dx=-0.2*lambda1, 
dy=-0.2*lambda2)\n", "ar_one_text = ax.text(0.5*lambda1, -0.05*yspan, \n", " '$\\eigenvalue_1$', \n", " horizontalalignment='center')\n", "ar_two_text = ax.text(-0.05*xspan, 0.5*lambda2, \n", " '$\\eigenvalue_2$', \n", " horizontalalignment='center')\n", "ar_three_text = ax.text(-0.05*xspan-0.1*lambda1, -0.1*lambda2+0.05*yspan, \n", " '$\\eigenvalue_3$', \n", " horizontalalignment='center')\n", "ar_one.set(linewidth=3, visible=False, color=blue_color)\n", "ar_one_text.set(visible=False)\n", "\n", "ar_two.set(linewidth=3, visible=False, color=blue_color)\n", "ar_two_text.set(visible=False)\n", "\n", "ar_three.set(linewidth=3, visible=False, color=blue_color)\n", "ar_three_text.set(visible=False)\n", "\n", "\n", "matrix_ax = fig.add_axes([0.2, 0.35, 0.3, 0.3])\n", "matrix_ax.set_aspect('equal')\n", "matrix_ax.set_axis_off()\n", "eigenvals = [['$\\eigenvalue_1$', '$0$'],['$0$', '$\\eigenvalue_2$']]\n", "plot.matrix(eigenvals, \n", " matrix_ax, \n", " bracket_style='square', \n", " type='entries', \n", " bracket_color=black_color)\n", "\n", "\n", "# First arrow\n", "matrix_ax.cla()\n", "plot.matrix(eigenvals, \n", " matrix_ax, \n", " bracket_style='square', \n", " type='entries',\n", " highlight=True,\n", " highlight_row=[0, 0],\n", " highlight_col=':',\n", " highlight_color=magenta_color,\n", " bracket_color=black_color)\n", "\n", "ar_one.set(visible=True)\n", "ar_one_text.set(visible=True)\n", "\n", "file_name = 'gp-optimise-determinant{counter:>3}'.format(counter=counter)\n", "mlai.write_figure(os.path.join(diagrams, file_name), transparent=True)\n", "counter += 1\n", "\n", "# Second arrow\n", "matrix_ax.cla()\n", "plot.matrix(eigenvals, \n", " matrix_ax, \n", " bracket_style='square', \n", " type='entries', \n", " highlight=True,\n", " highlight_row=[1,1],\n", " highlight_col=':',\n", " highlight_color=magenta_color,\n", " bracket_color=black_color)\n", "\n", "ar_two.set(visible=True)\n", "ar_two_text.set(visible=True)\n", "\n", "file_name = 'gp-optimise-determinant{counter:>3}'.format(counter=counter)\n", "mlai.write_figure(os.path.join(diagrams, file_name), transparent=True)\n", "counter += 1\n", "\n", "matrix_ax.cla()\n", "plot.matrix(eigenvals, matrix_ax, \n", " bracket_style='square', \n", " type='entries', \n", " bracket_color=black_color)\n", "\n", "file_name = 'gp-optimise-determinant{counter:>3}'.format(counter=counter)\n", "mlai.write_figure(os.path.join(diagrams, file_name), transparent=True)\n", "counter += 1\n", "\n", "\n", "tax = fig.add_axes([0.1, 0.1, 0.8, 0.1])\n", "tax.set_axis_off()\n", "tax.set_xlim([0, 1])\n", "tax.set_ylim([0, 1])\n", "det_text = text(0.5, 0.5,\n", " '\\Large $\\det{\\eigenvalueMatrix} = \\eigenvalue_1 \\eigenvalue_2$', \n", " horizontalalignment='center')\n", "file_name = 'gp-optimise-determinant{counter:>3}'.format(counter=counter)\n", "mlai.write_figure(os.path.join(diagrams, file_name), transparent=True)\n", "counter += 1\n", "\n", "axes(ax)\n", "pat_hand.set(visible=True)\n", "file_name = 'gp-optimise-determinant{counter:>3}'.format(counter=counter)\n", "mlai.write_figure(os.path.join(diagrams, file_name), transparent=True)\n", "counter += 1\n", "\n", "det_text_plot = text(0.5*lambda1, \n", " 0.5*lambda2, \n", " '\\Large $\\det{\\eigenvalueMatrix}$', \n", " horizontalalignment='center')\n", " \n", "file_name = 'gp-optimise-determinant{counter:>3}'.format(counter=counter)\n", "mlai.write_figure(os.path.join(diagrams, file_name), transparent=True)\n", "counter += 1\n", "\n", "\n", "eigenvals2 = {'$\\eigenvalue_1$', '$0$' '$0$'; '$0$', 
'$\\eigenvalue_2$' '$0$'; '$0$', '$0$' '$\\eigenvalue_3$'}\n", "axes(matrix_ax)\n", "matrix_ax.cla()\n", "plot.matrix(eigenvals2, matrix_ax, \n", " bracket_style='square', \n", " type='entries',\n", " highlight=True,\n", " highlight_row=[2,2],\n", " highlight_col=':',\n", " highlight_color=magenta_color)\n", "\n", "file_name = 'gp-optimise-determinant{counter:>3}'.format(counter=counter)\n", "mlai.write_figure(os.path.join(diagrams, file_name), transparent=True)\n", "counter += 1\n", "\n", "\n", "ar_three.set(visible=True)\n", "ar_three_text.set(visible=True)\n", "pat_hand3.set(visible=True)\n", "det_text.set(string='\\Large $\\det{\\eigenvalueMatrix} = \\eigenvalue_1 \\eigenvalue_2\\eigenvalue_3$')\n", "\n", "file_name = 'gp-optimise-determinant{counter:>3}'.format(counter=counter)\n", "mlai.write_figure(os.path.join(diagrams, file_name), transparent=True)\n", "counter += 1\n", "\n", "matrix_ax.cla()\n", "plot.matrix(eigenvals, \n", " matrix_ax, \n", " bracket_style='square', \n", " type='entries', \n", " bracket_color=black_color)\n", " \n", "ar_three.set(visible=False)\n", "ar_three_text.set(visible=False)\n", "pat_hand3.set(visible=False)\n", "det_text.set(string='\\Large $\\det{\\eigenvalueMatrix} = \\eigenvalue_1 \\eigenvalue_2$')\n", "\n", "file_name = 'gp-optimise-determinant{counter:>3}'.format(counter=counter)\n", "mlai.write_figure(os.path.join(diagrams, file_name), transparent=True)\n", "counter += 1\n", "\n", "\n", "\n", "det_text.set(string='\\Large $\\det{\\rotationMatrix\\eigenvalueMatrix} = \\eigenvalue_1 \\eigenvalue_2$')\n", "label_eigenvalue.set(string='\\Large $\\rotationMatrix\\eigenvalueMatrix=$')\n", "\n", "\n", "\n", "rotate_object(rotation_matrix, ar_one)\n", "rotate_object(rotation_matrix, ar_one_text)\n", "rotate_object(rotation_matrix, ar_two)\n", "rotate_object(rotation_matrix, ar_two_text)\n", "rotate_object(rotation_matrix, det_textPlot)\n", "pat_hand_rot.set(visible=True)\n", "pat_hand.set(visible=False)\n", "\n", "W = [['$\\mappingScalar_{1, 1}$', '$\\mappingScalar_{1, 2}$'],[ '$\\mappingScalar_{2, 1}$', '$\\mappingScalar_{2, 2}$']]\n", "plot.matrix(W, \n", " matrix_ax, \n", " bracket_style='square', \n", " type='entries', \n", " bracket_color=black_color)\n", "\n", "\n", "file_name = 'gp-optimise-determinant{counter:>3}'.format(counter=counter)\n", "mlai.write_figure(os.path.join(diagrams, file_name), transparent=True)\n", "counter += 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### Data Fit: ${\\color{white} \\frac{\\dataVector^\\top\\kernelMatrix^{-1}\\dataVector}{2}}$\n", "\n", "\n", "### $$\\errorFunction(\\parameterVector) = {\\color{white}\\frac{1}{2}\\log\\det{\\kernelMatrix}}+{\\color{white}\\frac{\\dataVector^{\\top}\\kernelMatrix^{-1}\\dataVector}{2}}$$" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import os" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import GPy\n", "import teaching_plots as plot\n", "import mlai\n", "import gp_tutorial" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.random.seed(125)\n", "diagrams = '../slides/diagrams/gp'\n", "\n", "black_color=[0., 0., 0.]\n", "red_color=[1., 0., 0.]\n", "blue_color=[0., 0., 1.]\n", "magenta_color=[1., 0., 1.]\n", "fontsize=18" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "y_lim = [-2.2, 2.2]\n", 
"y_ticks = [-2, -1, 0, 1, 2]\n", "x_lim = [-2, 2]\n", "x_ticks = [-2, -1, 0, 1, 2]\n", "err_y_lim = [-12, 20]\n", "\n", "linewidth=3\n", "markersize=15\n", "markertype='.'" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x = np.linspace(-1, 1, 6)[:, np.newaxis]\n", "xtest = np.linspace(x_lim[0], x_lim[1], 200)[:, np.newaxis]\n", "\n", "# True data\n", "true_kern = GPy.kern.RBF(1) + GPy.kern.White(1)\n", "true_kern.rbf.lengthscale = 1.0\n", "true_kern.white.variance = 0.01\n", "K = true_kern.K(x) \n", "y = np.random.multivariate_normal(np.zeros((6,)), K, 1).T" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "# Fitted model\n", "kern = GPy.kern.RBF(1) + GPy.kern.White(1)\n", "kern.rbf.lengthscale = 1.0\n", "kern.white.variance = 0.01\n", "\n", "lengthscales = np.asarray([0.01, 0.05, 0.1, 0.25, 0.5, 1, 2, 4, 8, 16, 100])\n", "\n", "fig1, ax1 = plt.subplots(figsize=plot.one_figsize) \n", "fig2, ax2 = plt.subplots(figsize=plot.one_figsize) \n", "line = ax2.semilogx(np.NaN, np.NaN, 'x-', \n", " color=black_color)\n", "ax.set_ylim(err_y_lim)\n", "ax.set_xlim([0.025, 32])\n", "ax.grid(True)\n", "ax.set_xticks([0.01, 0.1, 1, 10, 100])\n", "ax.set_xticklabels(['$10^{-2}$', '$10^{-1}$', '$10^0$', '$10^1$', '$10^2$'])\n", "\n", "\n", "err = np.zeros_like(lengthscales)\n", "err_log_det = np.zeros_like(lengthscales)\n", "err_fit = np.zeros_like(lengthscales)\n", "\n", "counter = 0\n", "for i, ls in enumerate(lengthscales):\n", " kern.rbf.lengthscale=ls\n", " K = kern.K(x) \n", " invK, L, Li, log_det_K = GPy.util.linalg.pdinv(K)\n", " err[i] = 0.5*(log_det_K + np.dot(np.dot(y.T,invK),y))\n", " err_log_det[i] = 0.5*log_det_K\n", " err_fit[i] = 0.5*np.dot(np.dot(y.T,invK), y)\n", " Kx = kern.K(x, xtest)\n", " ypred_mean = np.dot(np.dot(Kx.T, invK), y)\n", " ypred_var = kern.Kdiag(xtest) - np.sum((np.dot(Kx.T,invK))*Kx.T, 1)\n", " ypred_sd = np.sqrt(ypred_var)\n", " ax1.clear()\n", " _ = gp_tutorial.gpplot(xtest.flatten(),\n", " ypred_mean.flatten(),\n", " ypred_mean.flatten()-2*ypred_sd.flatten(),\n", " ypred_mean.flatten()+2*ypred_sd.flatten(), \n", " ax=ax1)\n", " x_lim = ax1.get_xlim()\n", " ax1.set_ylabel('$f(x)$', fontsize=fontsize)\n", " ax1.set_xlabel('$x$', fontsize=fontsize)\n", "\n", " p = ax1.plot(x, y, markertype, color=black_color, markersize=markersize, linewidth=linewidth)\n", " ax1.set_ylim(y_lim)\n", " ax1.set_xlim(x_lim) \n", " ax1.set_xticks(x_ticks)\n", " #ax.set(box=False)\n", " \n", " ax1.plot([x_lim[0], x_lim[0]], y_lim, color=black_color)\n", " ax1.plot(x_lim, [y_lim[0], y_lim[0]], color=black_color)\n", "\n", " file_name = 'gp-optimise{counter:0>3}.svg'.format(counter=counter)\n", " mlai.write_figure(os.path.join(diagrams, file_name),\n", " figure=fig1,\n", " transparent=True)\n", " counter += 1\n", "\n", " ax2.clear()\n", " t = ax2.semilogx(lengthscales[0:i+1], err[0:i+1], 'x-', \n", " color=magenta_color, \n", " markersize=markersize,\n", " linewidth=linewidth)\n", " t2 = ax2.semilogx(lengthscales[0:i+1], err_log_det[0:i+1], 'x-', \n", " color=blue_color, \n", " markersize=markersize,\n", " linewidth=linewidth)\n", " t3 = ax2.semilogx(lengthscales[0:i+1], err_fit[0:i+1], 'x-', \n", " color=red_color, \n", " markersize=markersize,\n", " linewidth=linewidth)\n", " ax2.set_ylim(err_y_lim)\n", " ax2.set_xlim([0.025, 32])\n", " ax2.set_xticks([0.01, 0.1, 1, 10, 100])\n", " ax2.set_xticklabels(['$10^{-2}$', '$10^{-1}$', '$10^0$', '$10^1$', '$10^2$'])\n", "\n", " ax2.grid(True)\n", 
"\n", " ax2.set_ylabel('negative log likelihood', fontsize=fontsize)\n", " ax2.set_xlabel('length scale, $\\ell$', fontsize=fontsize)\n", " file_name = 'gp-optimise{counter:0>3}.svg'.format(counter=counter)\n", " mlai.write_figure(os.path.join(diagrams, file_name),\n", " figure=fig2,\n", " transparent=True)\n", " counter += 1\n", " #ax.set_box(False)\n", " xlim = ax2.get_xlim()\n", " ax2.plot([xlim[0], xlim[0]], err_y_lim, color=black_color)\n", " ax2.plot(xlim, [err_y_lim[0], err_y_lim[0]], color=black_color)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "\n", "\n", "\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
\n", "\n", "\n", "\\includesvg{../slides/diagrams/gp/gp-optimise001}\n", "
\n", "\n", "
\n", "\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
\n", "\n", "\n", "\\includesvg{../slides/diagrams/gp/gp-optimise003}\n", "
\n", "\n", "
\n", "\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
\n", "\n", "\n", "\\includesvg{../slides/diagrams/gp/gp-optimise005}\n", "
\n", "\n", "
\n", "\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
\n", "\n", "\n", "\\includesvg{../slides/diagrams/gp/gp-optimise007}\n", "
\n", "\n", "
\n", "\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
\n", "\n", "\n", "\\includesvg{../slides/diagrams/gp/gp-optimise009}\n", "
\n", "\n", "
\n", "\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
\n", "\n", "\n", "\\includesvg{../slides/diagrams/gp/gp-optimise011}\n", "
\n", "\n", "
\n", "\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
\n", "\n", "\n", "\\includesvg{../slides/diagrams/gp/gp-optimise013}\n", "
\n", "\n", "
\n", "\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
\n", "\n", "\n", "\\includesvg{../slides/diagrams/gp/gp-optimise015}\n", "
\n", "\n", "
\n", "\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
\n", "\n", "\n", "\\includesvg{../slides/diagrams/gp/gp-optimise017}\n", "
\n", "\n", "
\n", "\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
\n", "\n", "\n", "\\includesvg{../slides/diagrams/gp/gp-optimise019}\n", "
\n", "\n", "
\n", "\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
\n", "\n", "\n", "\\includesvg{../slides/diagrams/gp/gp-optimise021}\n", "
\n", "\n", "
\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\n", "### Della Gatta Gene Data\n", "\n", "- Given given expression levels in the form of a time series from\n", " @DellaGatta:direct08." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import teaching_plots as plot\n", "import mlai" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "xlim = (-20,260)\n", "ylim = (5, 7.5)\n", "yhat = (y-offset)/scale\n", "\n", "fig, ax = plt.subplots(figsize=plot.big_wide_figsize)\n", "_ = ax.plot(x, y, 'r.',markersize=10)\n", "ax.set_xlabel('time/min', fontsize=20)\n", "ax.set_ylabel('expression', fontsize=20)\n", "ax.set_xlim(xlim)\n", "ax.set_ylim(ylim)\n", "\n", "mlai.write_figure(figure=fig, \n", " filename='../slides/diagrams/datasets/della-gatta-gene.svg', \n", " transparent=True, \n", " frameon=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Della Gatta Gene Data\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "### Gene Expression Example\n", "\n", "- Want to detect if a gene is expressed or not, fit a GP to each gene\n", " @Kalaitzis:simple11.\n", "\n", "### \n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", "###" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import teaching_plots as plot" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(figsize=plot.big_wide_figsize)\n", "plot.model_output(m_full, scale=scale, offset=offset, ax=ax, xlabel='time/min', ylabel='expression', fontsize=20, portion=0.2)\n", "ax.set_xlim(xlim)\n", "ax.set_ylim(ylim)\n", "ax.set_title('log likelihood: {ll:.3}'.format(ll=m_full.log_likelihood()), fontsize=20)\n", "mlai.write_figure(figure=fig,\n", " filename='../slides/diagrams/gp/della-gatta-gene-gp.svg', \n", " transparent=True, frameon=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### TP53 Gene Data GP\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import teaching_plots as plot" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(figsize=plot.big_wide_figsize)\n", "plot.model_output(m_full2, scale=scale, offset=offset, ax=ax, xlabel='time/min', ylabel='expression', fontsize=20, portion=0.2)\n", "ax.set_xlim(xlim)\n", "ax.set_ylim(ylim)\n", "ax.set_title('log likelihood: {ll:.3}'.format(ll=m_full2.log_likelihood()), fontsize=20)\n", "mlai.write_figure(figure=fig,\n", " filename='../slides/diagrams/gp/della-gatta-gene-gp2.svg', \n", " transparent=True, frameon=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### TP53 Gene Data GP\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import teaching_plots as plot" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(figsize=plot.big_wide_figsize)\n", "plot.model_output(m_full3, scale=scale, offset=offset, ax=ax, xlabel='time/min', ylabel='expression', fontsize=20, portion=0.2)\n", "ax.set_xlim(xlim)\n", "ax.set_ylim(ylim)\n", "ax.set_title('log likelihood: {ll:.3}'.format(ll=m_full3.log_likelihood()), fontsize=20)\n", "mlai.write_figure(figure=fig,\n", " filename='../slides/diagrams/gp/della-gatta-gene-gp3.svg', \n", " transparent=True, frameon=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### TP53 Gene Data GP\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import teaching_plots as plot" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot.multiple_optima(diagrams='../slides/diagrams/gp')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Multiple Optima\n", "\n", "\n", "\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\n", "### Example: Prediction of Malaria Incidence in Uganda\n", "\n", "[]{style=\"text-align:right\"}\n", "\n", "- Work with Ricardo Andrade Pacheco, John Quinn and Martin Mubaganzi\n", " (Makerere University, Uganda)\n", "- See [AI-DEV Group](http://air.ug/research.html).\n", "\n", "### Malaria Prediction in Uganda\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "[[@Andrade:consistent14,@Mubangizi:malaria14]]{style=\"text-align:right\"}\n", "\n", "### Kapchorwa District\n", "\n", "\n", "\n", "### Tororo District\n", "\n", "\n", "\n", "### Malaria Prediction in Nagongera (Sentinel Site)\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "### Mubende District\n", "\n", "\n", "\n", "### Malaria Prediction in Uganda\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "### GP School at Makerere\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "### Kabarole District\n", "\n", "\n", "\n", "### Early Warning Systems\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "### Early Warning Systems\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\n", "### Additive Covariance" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from mlai import Kernel" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from mlai import linear_cov" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from mlai import eq_cov" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from mlai import add_cov" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "kernel = Kernel(function=add_cov,\n", " name='Additive',\n", " shortname='add', \n", " formula='\\kernelScalar_f(\\inputVector, \\inputVector^\\prime) = \\kernelScalar_g(\\inputVector, \\inputVector^\\prime) + \\kernelScalar_h(\\inputVector, \\inputVector^\\prime)', \n", " kerns=[linear_cov, eq_cov], \n", " kern_args=[{'variance': 25}, {'lengthscale' : 0.2}])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import teaching_plots as plot" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot.covariance_func(kernel=kernel, diagrams='../slides/diagrams/kern/')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "$$\\kernelScalar_f(\\inputVector, \\inputVector^\\prime) = \\kernelScalar_g(\\inputVector, \\inputVector^\\prime) + \\kernelScalar_h(\\inputVector, \\inputVector^\\prime)$$\n", "
\n", "\n", "\n", "\n", "\n", "\n", "
\n", "\\includesvgclass{../slides/diagrams/kern/add_covariance.svg}\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "
\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\n", "{\\#\\#\\# Analysis of US Birth Rates \\#\\#\\#\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "### Gelman Book\n", "\n", "\n", "\n", "\n", "\n", "\n", "
\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "
\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "
\n", "@Gelman:bayesian13\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\n", "### Basis Function Covariance" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import teaching_plots as plot\n", "import mlai\n", "import numpy as np" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "basis = mlai.Basis(function=radial, \n", " number=3,\n", " data_limits=[-0.5, 0.5], \n", " width=0.125)\n", "kernel = mlai.Kernel(function=basis_cov,\n", " name='Basis',\n", " shortname='basis', \n", " formula='\\kernel(\\inputVector, \\inputVector^\\prime) = \\basisVector(\\inputVector)^\\top \\basisVector(\\inputVector^\\prime)',\n", " basis=basis)\n", " \n", "plot.covariance_func(kernel, diagrams='../slides/diagrams/kern/')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "$$\\kernel(\\inputVector, \\inputVector^\\prime) = \\basisVector(\\inputVector)^\\top \\basisVector(\\inputVector^\\prime)$$\n", "
\n", "\n", "\n", "\n", "\n", "\n", "
\n", "\\includesvgclass{../slides/diagrams/kern/basis_covariance.svg}\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "
\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\\#\\#\\# Brownian Covariance" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import teaching_plots as plot\n", "import mlai\n", "import numpy as np" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "t=np.linspace(0, 2, 200)[:, np.newaxis]\n", "kernel = mlai.Kernel(function=brownian_cov,\n", " name='Brownian',\n", " formula='\\kernelScalar(t, t^\\prime)=\\alpha \\min(t, t^\\prime)',\n", " shortname='brownian')\n", "plot.covariance_func(kernel, t, diagrams='../slides/diagrams/kern/')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "$$\\kernelScalar(t, t^\\prime)=\\alpha \\min(t, t^\\prime)$$\n", "
\n", "\n", "\n", "\n", "\n", "\n", "
\n", "\\includesvgclass{../slides/diagrams/kern/brownian_covariance.svg}\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "
\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\\#\\#\\# MLP Covariance" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import teaching_plots as plot\n", "import mlai\n", "import numpy as np" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "kernel = mlai.Kernel(function=mlp_cov,\n", " name='Multilayer Perceptron',\n", " shortname='mlp', \n", " formula='\\kernelScalar(\\inputVector, \\inputVector^\\prime) = \\alpha \\arcsin\\left(\\frac{w \\inputVector^\\top \\inputVector^\\prime + b}{\\sqrt{\\left(w \\inputVector^\\top \\inputVector + b + 1\\right)\\left(w \\left.\\inputVector^\\prime\\right.^\\top \\inputVector^\\prime + b + 1\\right)}}\\right)',\n", " w=5, b=0.5)\n", " \n", "plot.covariance_func(kernel, diagrams='../slides/diagrams/kern/')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "$$\\kernelScalar(\\inputVector, \\inputVector^\\prime) = \\alpha \\arcsin\\left(\\frac{w \\inputVector^\\top \\inputVector^\\prime + b}{\\sqrt{\\left(w \\inputVector^\\top \\inputVector + b + 1\\right)\\left(w \\left.\\inputVector^\\prime\\right.^\\top \\inputVector^\\prime + b + 1\\right)}}\\right)$$\n", "
\n", "\n", "\n", "\n", "\n", "\n", "
\n", "\\includesvgclass{../slides/diagrams/kern/mlp_covariance.svg}\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "
\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\n", "### GPSS: Gaussian Process Summer School\n", "\n", "\n", "\n", "\n", "\n", "\n", "
\n", "- \n", "- Next one is in Sheffield in *September 2019*.\n", "- Many lectures from past meetings available online\n", "\n", "\n", "
\n", "\n", "\\includesvgclass{../slides/diagrams/logo/gpss-logo.svg}\n", "\n", "
\n", "\n", "
\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\n", "### GPy: A Gaussian Process Framework in Python\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", "### GPy: A Gaussian Process Framework in Python\n", "\n", "- BSD Licensed software base.\n", "- Wide availability of libraries, 'modern' scripting language.\n", "- Allows us to set projects to undergraduates in Comp Sci that use\n", " GPs.\n", "- Available through GitHub \n", "- Reproducible Research with Jupyter Notebook.\n", "\n", "### Features\n", "\n", "- Probabilistic-style programming (specify the model, not the\n", " algorithm).\n", "- Non-Gaussian likelihoods.\n", "- Multivariate outputs.\n", "- Dimensionality reduction.\n", "- Approximations for large data sets.\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\n", "### Other Software\n", "\n", "- [GPflow](https://github.com/GPflow/GPflow)\n", "- [GPyTorch](https://github.com/cornellius-gp/gpytorch)\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\n", "### MXFusion: Modular Probabilistic Programming on MXNet\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", "### MxFusion\n", "\n", "\n", "\n", "\n", " \n", "\n", "
\n", "- Work by Eric Meissner and Zhenwen Dai.\n", "- Probabilistic programming.\n", "- Available on [Github](https://github.com/amzn/mxfusion)\n", " \n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "\n", "
\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Acknowledgments\n", "\n", "Stefanos Eleftheriadis, John Bronskill, Hugh Salimbeni, Rich Turner,\n", "Zhenwen Dai, Javier Gonzalez, Andreas Damianou, Mark Pullin, Michael\n", "Smith, James Hensman, John Quinn, Martin Mubangizi.\n", "\n", "### Thanks!\n", "\n", "- twitter: @lawrennd\n", "- blog:\n", " [http://inverseprobability.com](http://inverseprobability.com/blog.html)\n", "\n", "### References {#references .unnumbered}" ] } ], "metadata": {}, "nbformat": 4, "nbformat_minor": 2 }