{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Deep Gaussian Processes\n", "### [Neil D. Lawrence](http://inverseprobability.com), Amazon Cambridge and University of Sheffield\n", "### 2019-01-11\n", "\n", "**Abstract**: Gaussian process models provide a flexible, non-parametric approach to\n", "modelling that sustains uncertainty about the function. However,\n", "computational demands and the joint Gaussian assumption make them\n", "inappropriate for some applications. In this talk we review low rank\n", "approximations for Gaussian processes and use stochastic process\n", "composition to create non-Gaussian processes. We illustrate the models\n", "on simple regression tasks to give a sense of how uncertainty propagates\n", "through the model. We end will demonstrations on unsupervised learning\n", "of digits and motion capture data.\n", "\n", "$$\n", "\\newcommand{\\Amatrix}{\\mathbf{A}}\n", "\\newcommand{\\KL}[2]{\\text{KL}\\left( #1\\,\\|\\,#2 \\right)}\n", "\\newcommand{\\Kaast}{\\kernelMatrix_{\\mathbf{ \\ast}\\mathbf{ \\ast}}}\n", "\\newcommand{\\Kastu}{\\kernelMatrix_{\\mathbf{ \\ast} \\inducingVector}}\n", "\\newcommand{\\Kff}{\\kernelMatrix_{\\mappingFunctionVector \\mappingFunctionVector}}\n", "\\newcommand{\\Kfu}{\\kernelMatrix_{\\mappingFunctionVector \\inducingVector}}\n", "\\newcommand{\\Kuast}{\\kernelMatrix_{\\inducingVector \\bf\\ast}}\n", "\\newcommand{\\Kuf}{\\kernelMatrix_{\\inducingVector \\mappingFunctionVector}}\n", "\\newcommand{\\Kuu}{\\kernelMatrix_{\\inducingVector \\inducingVector}}\n", "\\newcommand{\\Kuui}{\\Kuu^{-1}}\n", "\\newcommand{\\Qaast}{\\mathbf{Q}_{\\bf \\ast \\ast}}\n", "\\newcommand{\\Qastf}{\\mathbf{Q}_{\\ast \\mappingFunction}}\n", "\\newcommand{\\Qfast}{\\mathbf{Q}_{\\mappingFunctionVector \\bf \\ast}}\n", "\\newcommand{\\Qff}{\\mathbf{Q}_{\\mappingFunctionVector \\mappingFunctionVector}}\n", "\\newcommand{\\aMatrix}{\\mathbf{A}}\n", "\\newcommand{\\aScalar}{a}\n", "\\newcommand{\\aVector}{\\mathbf{a}}\n", "\\newcommand{\\acceleration}{a}\n", "\\newcommand{\\bMatrix}{\\mathbf{B}}\n", "\\newcommand{\\bScalar}{b}\n", "\\newcommand{\\bVector}{\\mathbf{b}}\n", "\\newcommand{\\basisFunc}{\\phi}\n", "\\newcommand{\\basisFuncVector}{\\boldsymbol{ \\basisFunc}}\n", "\\newcommand{\\basisFunction}{\\phi}\n", "\\newcommand{\\basisLocation}{\\mu}\n", "\\newcommand{\\basisMatrix}{\\boldsymbol{ \\Phi}}\n", "\\newcommand{\\basisScalar}{\\basisFunction}\n", "\\newcommand{\\basisVector}{\\boldsymbol{ \\basisFunction}}\n", "\\newcommand{\\activationFunction}{\\phi}\n", "\\newcommand{\\activationMatrix}{\\boldsymbol{ \\Phi}}\n", "\\newcommand{\\activationScalar}{\\basisFunction}\n", "\\newcommand{\\activationVector}{\\boldsymbol{ \\basisFunction}}\n", "\\newcommand{\\bigO}{\\mathcal{O}}\n", "\\newcommand{\\binomProb}{\\pi}\n", "\\newcommand{\\cMatrix}{\\mathbf{C}}\n", "\\newcommand{\\cbasisMatrix}{\\hat{\\boldsymbol{ \\Phi}}}\n", "\\newcommand{\\cdataMatrix}{\\hat{\\dataMatrix}}\n", "\\newcommand{\\cdataScalar}{\\hat{\\dataScalar}}\n", "\\newcommand{\\cdataVector}{\\hat{\\dataVector}}\n", "\\newcommand{\\centeredKernelMatrix}{\\mathbf{ \\MakeUppercase{\\centeredKernelScalar}}}\n", "\\newcommand{\\centeredKernelScalar}{b}\n", "\\newcommand{\\centeredKernelVector}{\\centeredKernelScalar}\n", "\\newcommand{\\centeringMatrix}{\\mathbf{H}}\n", "\\newcommand{\\chiSquaredDist}[2]{\\chi_{#1}^{2}\\left(#2\\right)}\n", "\\newcommand{\\chiSquaredSamp}[1]{\\chi_{#1}^{2}}\n", "\\newcommand{\\conditionalCovariance}{\\boldsymbol{ \\Sigma}}\n", 
"\\newcommand{\\coregionalizationMatrix}{\\mathbf{B}}\n", "\\newcommand{\\coregionalizationScalar}{b}\n", "\\newcommand{\\coregionalizationVector}{\\mathbf{ \\coregionalizationScalar}}\n", "\\newcommand{\\covDist}[2]{\\text{cov}_{#2}\\left(#1\\right)}\n", "\\newcommand{\\covSamp}[1]{\\text{cov}\\left(#1\\right)}\n", "\\newcommand{\\covarianceScalar}{c}\n", "\\newcommand{\\covarianceVector}{\\mathbf{ \\covarianceScalar}}\n", "\\newcommand{\\covarianceMatrix}{\\mathbf{C}}\n", "\\newcommand{\\covarianceMatrixTwo}{\\boldsymbol{ \\Sigma}}\n", "\\newcommand{\\croupierScalar}{s}\n", "\\newcommand{\\croupierVector}{\\mathbf{ \\croupierScalar}}\n", "\\newcommand{\\croupierMatrix}{\\mathbf{ \\MakeUppercase{\\croupierScalar}}}\n", "\\newcommand{\\dataDim}{p}\n", "\\newcommand{\\dataIndex}{i}\n", "\\newcommand{\\dataIndexTwo}{j}\n", "\\newcommand{\\dataMatrix}{\\mathbf{Y}}\n", "\\newcommand{\\dataScalar}{y}\n", "\\newcommand{\\dataSet}{\\mathcal{D}}\n", "\\newcommand{\\dataStd}{\\sigma}\n", "\\newcommand{\\dataVector}{\\mathbf{ \\dataScalar}}\n", "\\newcommand{\\decayRate}{d}\n", "\\newcommand{\\degreeMatrix}{\\mathbf{ \\MakeUppercase{\\degreeScalar}}}\n", "\\newcommand{\\degreeScalar}{d}\n", "\\newcommand{\\degreeVector}{\\mathbf{ \\degreeScalar}}\n", "% Already defined by latex\n", "%\\newcommand{\\det}[1]{\\left|#1\\right|}\n", "\\newcommand{\\diag}[1]{\\text{diag}\\left(#1\\right)}\n", "\\newcommand{\\diagonalMatrix}{\\mathbf{D}}\n", "\\newcommand{\\diff}[2]{\\frac{\\text{d}#1}{\\text{d}#2}}\n", "\\newcommand{\\diffTwo}[2]{\\frac{\\text{d}^2#1}{\\text{d}#2^2}}\n", "\\newcommand{\\displacement}{x}\n", "\\newcommand{\\displacementVector}{\\textbf{\\displacement}}\n", "\\newcommand{\\distanceMatrix}{\\mathbf{ \\MakeUppercase{\\distanceScalar}}}\n", "\\newcommand{\\distanceScalar}{d}\n", "\\newcommand{\\distanceVector}{\\mathbf{ \\distanceScalar}}\n", "\\newcommand{\\eigenvaltwo}{\\ell}\n", "\\newcommand{\\eigenvaltwoMatrix}{\\mathbf{L}}\n", "\\newcommand{\\eigenvaltwoVector}{\\mathbf{l}}\n", "\\newcommand{\\eigenvalue}{\\lambda}\n", "\\newcommand{\\eigenvalueMatrix}{\\boldsymbol{ \\Lambda}}\n", "\\newcommand{\\eigenvalueVector}{\\boldsymbol{ \\lambda}}\n", "\\newcommand{\\eigenvector}{\\mathbf{ \\eigenvectorScalar}}\n", "\\newcommand{\\eigenvectorMatrix}{\\mathbf{U}}\n", "\\newcommand{\\eigenvectorScalar}{u}\n", "\\newcommand{\\eigenvectwo}{\\mathbf{v}}\n", "\\newcommand{\\eigenvectwoMatrix}{\\mathbf{V}}\n", "\\newcommand{\\eigenvectwoScalar}{v}\n", "\\newcommand{\\entropy}[1]{\\mathcal{H}\\left(#1\\right)}\n", "\\newcommand{\\errorFunction}{E}\n", "\\newcommand{\\expDist}[2]{\\left<#1\\right>_{#2}}\n", "\\newcommand{\\expSamp}[1]{\\left<#1\\right>}\n", "\\newcommand{\\expectation}[1]{\\left\\langle #1 \\right\\rangle }\n", "\\newcommand{\\expectationDist}[2]{\\left\\langle #1 \\right\\rangle _{#2}}\n", "\\newcommand{\\expectedDistanceMatrix}{\\mathcal{D}}\n", "\\newcommand{\\eye}{\\mathbf{I}}\n", "\\newcommand{\\fantasyDim}{r}\n", "\\newcommand{\\fantasyMatrix}{\\mathbf{ \\MakeUppercase{\\fantasyScalar}}}\n", "\\newcommand{\\fantasyScalar}{z}\n", "\\newcommand{\\fantasyVector}{\\mathbf{ \\fantasyScalar}}\n", "\\newcommand{\\featureStd}{\\varsigma}\n", "\\newcommand{\\gammaCdf}[3]{\\mathcal{GAMMA CDF}\\left(#1|#2,#3\\right)}\n", "\\newcommand{\\gammaDist}[3]{\\mathcal{G}\\left(#1|#2,#3\\right)}\n", "\\newcommand{\\gammaSamp}[2]{\\mathcal{G}\\left(#1,#2\\right)}\n", "\\newcommand{\\gaussianDist}[3]{\\mathcal{N}\\left(#1|#2,#3\\right)}\n", 
"\\newcommand{\\gaussianSamp}[2]{\\mathcal{N}\\left(#1,#2\\right)}\n", "\\newcommand{\\given}{|}\n", "\\newcommand{\\half}{\\frac{1}{2}}\n", "\\newcommand{\\heaviside}{H}\n", "\\newcommand{\\hiddenMatrix}{\\mathbf{ \\MakeUppercase{\\hiddenScalar}}}\n", "\\newcommand{\\hiddenScalar}{h}\n", "\\newcommand{\\hiddenVector}{\\mathbf{ \\hiddenScalar}}\n", "\\newcommand{\\identityMatrix}{\\eye}\n", "\\newcommand{\\inducingInputScalar}{z}\n", "\\newcommand{\\inducingInputVector}{\\mathbf{ \\inducingInputScalar}}\n", "\\newcommand{\\inducingInputMatrix}{\\mathbf{Z}}\n", "\\newcommand{\\inducingScalar}{u}\n", "\\newcommand{\\inducingVector}{\\mathbf{ \\inducingScalar}}\n", "\\newcommand{\\inducingMatrix}{\\mathbf{U}}\n", "\\newcommand{\\inlineDiff}[2]{\\text{d}#1/\\text{d}#2}\n", "\\newcommand{\\inputDim}{q}\n", "\\newcommand{\\inputMatrix}{\\mathbf{X}}\n", "\\newcommand{\\inputScalar}{x}\n", "\\newcommand{\\inputSpace}{\\mathcal{X}}\n", "\\newcommand{\\inputVals}{\\inputVector}\n", "\\newcommand{\\inputVector}{\\mathbf{ \\inputScalar}}\n", "\\newcommand{\\iterNum}{k}\n", "\\newcommand{\\kernel}{\\kernelScalar}\n", "\\newcommand{\\kernelMatrix}{\\mathbf{K}}\n", "\\newcommand{\\kernelScalar}{k}\n", "\\newcommand{\\kernelVector}{\\mathbf{ \\kernelScalar}}\n", "\\newcommand{\\kff}{\\kernelScalar_{\\mappingFunction \\mappingFunction}}\n", "\\newcommand{\\kfu}{\\kernelVector_{\\mappingFunction \\inducingScalar}}\n", "\\newcommand{\\kuf}{\\kernelVector_{\\inducingScalar \\mappingFunction}}\n", "\\newcommand{\\kuu}{\\kernelVector_{\\inducingScalar \\inducingScalar}}\n", "\\newcommand{\\lagrangeMultiplier}{\\lambda}\n", "\\newcommand{\\lagrangeMultiplierMatrix}{\\boldsymbol{ \\Lambda}}\n", "\\newcommand{\\lagrangian}{L}\n", "\\newcommand{\\laplacianFactor}{\\mathbf{ \\MakeUppercase{\\laplacianFactorScalar}}}\n", "\\newcommand{\\laplacianFactorScalar}{m}\n", "\\newcommand{\\laplacianFactorVector}{\\mathbf{ \\laplacianFactorScalar}}\n", "\\newcommand{\\laplacianMatrix}{\\mathbf{L}}\n", "\\newcommand{\\laplacianScalar}{\\ell}\n", "\\newcommand{\\laplacianVector}{\\mathbf{ \\ell}}\n", "\\newcommand{\\latentDim}{q}\n", "\\newcommand{\\latentDistanceMatrix}{\\boldsymbol{ \\Delta}}\n", "\\newcommand{\\latentDistanceScalar}{\\delta}\n", "\\newcommand{\\latentDistanceVector}{\\boldsymbol{ \\delta}}\n", "\\newcommand{\\latentForce}{f}\n", "\\newcommand{\\latentFunction}{u}\n", "\\newcommand{\\latentFunctionVector}{\\mathbf{ \\latentFunction}}\n", "\\newcommand{\\latentFunctionMatrix}{\\mathbf{ \\MakeUppercase{\\latentFunction}}}\n", "\\newcommand{\\latentIndex}{j}\n", "\\newcommand{\\latentScalar}{z}\n", "\\newcommand{\\latentVector}{\\mathbf{ \\latentScalar}}\n", "\\newcommand{\\latentMatrix}{\\mathbf{Z}}\n", "\\newcommand{\\learnRate}{\\eta}\n", "\\newcommand{\\lengthScale}{\\ell}\n", "\\newcommand{\\rbfWidth}{\\ell}\n", "\\newcommand{\\likelihoodBound}{\\mathcal{L}}\n", "\\newcommand{\\likelihoodFunction}{L}\n", "\\newcommand{\\locationScalar}{\\mu}\n", "\\newcommand{\\locationVector}{\\boldsymbol{ \\locationScalar}}\n", "\\newcommand{\\locationMatrix}{\\mathbf{M}}\n", "\\newcommand{\\variance}[1]{\\text{var}\\left( #1 \\right)}\n", "\\newcommand{\\mappingFunction}{f}\n", "\\newcommand{\\mappingFunctionMatrix}{\\mathbf{F}}\n", "\\newcommand{\\mappingFunctionTwo}{g}\n", "\\newcommand{\\mappingFunctionTwoMatrix}{\\mathbf{G}}\n", "\\newcommand{\\mappingFunctionTwoVector}{\\mathbf{ \\mappingFunctionTwo}}\n", "\\newcommand{\\mappingFunctionVector}{\\mathbf{ \\mappingFunction}}\n", "\\newcommand{\\scaleScalar}{s}\n", 
"\\newcommand{\\mappingScalar}{w}\n", "\\newcommand{\\mappingVector}{\\mathbf{ \\mappingScalar}}\n", "\\newcommand{\\mappingMatrix}{\\mathbf{W}}\n", "\\newcommand{\\mappingScalarTwo}{v}\n", "\\newcommand{\\mappingVectorTwo}{\\mathbf{ \\mappingScalarTwo}}\n", "\\newcommand{\\mappingMatrixTwo}{\\mathbf{V}}\n", "\\newcommand{\\maxIters}{K}\n", "\\newcommand{\\meanMatrix}{\\mathbf{M}}\n", "\\newcommand{\\meanScalar}{\\mu}\n", "\\newcommand{\\meanTwoMatrix}{\\mathbf{M}}\n", "\\newcommand{\\meanTwoScalar}{m}\n", "\\newcommand{\\meanTwoVector}{\\mathbf{ \\meanTwoScalar}}\n", "\\newcommand{\\meanVector}{\\boldsymbol{ \\meanScalar}}\n", "\\newcommand{\\mrnaConcentration}{m}\n", "\\newcommand{\\naturalFrequency}{\\omega}\n", "\\newcommand{\\neighborhood}[1]{\\mathcal{N}\\left( #1 \\right)}\n", "\\newcommand{\\neilurl}{http://inverseprobability.com/}\n", "\\newcommand{\\noiseMatrix}{\\boldsymbol{ E}}\n", "\\newcommand{\\noiseScalar}{\\epsilon}\n", "\\newcommand{\\noiseVector}{\\boldsymbol{ \\epsilon}}\n", "\\newcommand{\\norm}[1]{\\left\\Vert #1 \\right\\Vert}\n", "\\newcommand{\\normalizedLaplacianMatrix}{\\hat{\\mathbf{L}}}\n", "\\newcommand{\\normalizedLaplacianScalar}{\\hat{\\ell}}\n", "\\newcommand{\\normalizedLaplacianVector}{\\hat{\\mathbf{ \\ell}}}\n", "\\newcommand{\\numActive}{m}\n", "\\newcommand{\\numBasisFunc}{m}\n", "\\newcommand{\\numComponents}{m}\n", "\\newcommand{\\numComps}{K}\n", "\\newcommand{\\numData}{n}\n", "\\newcommand{\\numFeatures}{K}\n", "\\newcommand{\\numHidden}{h}\n", "\\newcommand{\\numInducing}{m}\n", "\\newcommand{\\numLayers}{\\ell}\n", "\\newcommand{\\numNeighbors}{K}\n", "\\newcommand{\\numSequences}{s}\n", "\\newcommand{\\numSuccess}{s}\n", "\\newcommand{\\numTasks}{m}\n", "\\newcommand{\\numTime}{T}\n", "\\newcommand{\\numTrials}{S}\n", "\\newcommand{\\outputIndex}{j}\n", "\\newcommand{\\paramVector}{\\boldsymbol{ \\theta}}\n", "\\newcommand{\\parameterMatrix}{\\boldsymbol{ \\Theta}}\n", "\\newcommand{\\parameterScalar}{\\theta}\n", "\\newcommand{\\parameterVector}{\\boldsymbol{ \\parameterScalar}}\n", "\\newcommand{\\partDiff}[2]{\\frac{\\partial#1}{\\partial#2}}\n", "\\newcommand{\\precisionScalar}{j}\n", "\\newcommand{\\precisionVector}{\\mathbf{ \\precisionScalar}}\n", "\\newcommand{\\precisionMatrix}{\\mathbf{J}}\n", "\\newcommand{\\pseudotargetScalar}{\\widetilde{y}}\n", "\\newcommand{\\pseudotargetVector}{\\mathbf{ \\pseudotargetScalar}}\n", "\\newcommand{\\pseudotargetMatrix}{\\mathbf{ \\widetilde{Y}}}\n", "\\newcommand{\\rank}[1]{\\text{rank}\\left(#1\\right)}\n", "\\newcommand{\\rayleighDist}[2]{\\mathcal{R}\\left(#1|#2\\right)}\n", "\\newcommand{\\rayleighSamp}[1]{\\mathcal{R}\\left(#1\\right)}\n", "\\newcommand{\\responsibility}{r}\n", "\\newcommand{\\rotationScalar}{r}\n", "\\newcommand{\\rotationVector}{\\mathbf{ \\rotationScalar}}\n", "\\newcommand{\\rotationMatrix}{\\mathbf{R}}\n", "\\newcommand{\\sampleCovScalar}{s}\n", "\\newcommand{\\sampleCovVector}{\\mathbf{ \\sampleCovScalar}}\n", "\\newcommand{\\sampleCovMatrix}{\\mathbf{s}}\n", "\\newcommand{\\scalarProduct}[2]{\\left\\langle{#1},{#2}\\right\\rangle}\n", "\\newcommand{\\sign}[1]{\\text{sign}\\left(#1\\right)}\n", "\\newcommand{\\sigmoid}[1]{\\sigma\\left(#1\\right)}\n", "\\newcommand{\\singularvalue}{\\ell}\n", "\\newcommand{\\singularvalueMatrix}{\\mathbf{L}}\n", "\\newcommand{\\singularvalueVector}{\\mathbf{l}}\n", "\\newcommand{\\sorth}{\\mathbf{u}}\n", "\\newcommand{\\spar}{\\lambda}\n", "\\newcommand{\\trace}[1]{\\text{tr}\\left(#1\\right)}\n", "\\newcommand{\\BasalRate}{B}\n", 
"\\newcommand{\\DampingCoefficient}{C}\n", "\\newcommand{\\DecayRate}{D}\n", "\\newcommand{\\Displacement}{X}\n", "\\newcommand{\\LatentForce}{F}\n", "\\newcommand{\\Mass}{M}\n", "\\newcommand{\\Sensitivity}{S}\n", "\\newcommand{\\basalRate}{b}\n", "\\newcommand{\\dampingCoefficient}{c}\n", "\\newcommand{\\mass}{m}\n", "\\newcommand{\\sensitivity}{s}\n", "\\newcommand{\\springScalar}{\\kappa}\n", "\\newcommand{\\springVector}{\\boldsymbol{ \\kappa}}\n", "\\newcommand{\\springMatrix}{\\boldsymbol{ \\mathcal{K}}}\n", "\\newcommand{\\tfConcentration}{p}\n", "\\newcommand{\\tfDecayRate}{\\delta}\n", "\\newcommand{\\tfMrnaConcentration}{f}\n", "\\newcommand{\\tfVector}{\\mathbf{ \\tfConcentration}}\n", "\\newcommand{\\velocity}{v}\n", "\\newcommand{\\sufficientStatsScalar}{g}\n", "\\newcommand{\\sufficientStatsVector}{\\mathbf{ \\sufficientStatsScalar}}\n", "\\newcommand{\\sufficientStatsMatrix}{\\mathbf{G}}\n", "\\newcommand{\\switchScalar}{s}\n", "\\newcommand{\\switchVector}{\\mathbf{ \\switchScalar}}\n", "\\newcommand{\\switchMatrix}{\\mathbf{S}}\n", "\\newcommand{\\tr}[1]{\\text{tr}\\left(#1\\right)}\n", "\\newcommand{\\loneNorm}[1]{\\left\\Vert #1 \\right\\Vert_1}\n", "\\newcommand{\\ltwoNorm}[1]{\\left\\Vert #1 \\right\\Vert_2}\n", "\\newcommand{\\onenorm}[1]{\\left\\vert#1\\right\\vert_1}\n", "\\newcommand{\\twonorm}[1]{\\left\\Vert #1 \\right\\Vert}\n", "\\newcommand{\\vScalar}{v}\n", "\\newcommand{\\vVector}{\\mathbf{v}}\n", "\\newcommand{\\vMatrix}{\\mathbf{V}}\n", "\\newcommand{\\varianceDist}[2]{\\text{var}_{#2}\\left( #1 \\right)}\n", "% Already defined by latex\n", "%\\newcommand{\\vec}{#1:}\n", "\\newcommand{\\vecb}[1]{\\left(#1\\right):}\n", "\\newcommand{\\weightScalar}{w}\n", "\\newcommand{\\weightVector}{\\mathbf{ \\weightScalar}}\n", "\\newcommand{\\weightMatrix}{\\mathbf{W}}\n", "\\newcommand{\\weightedAdjacencyMatrix}{\\mathbf{A}}\n", "\\newcommand{\\weightedAdjacencyScalar}{a}\n", "\\newcommand{\\weightedAdjacencyVector}{\\mathbf{ \\weightedAdjacencyScalar}}\n", "\\newcommand{\\onesVector}{\\mathbf{1}}\n", "\\newcommand{\\zerosVector}{\\mathbf{0}}\n", "$$\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "### \n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "### \n", "\n", "
\n", "\n", "$=f\\Bigg($$\\Bigg)$\n", "\n", "
\n", "\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\n", "### Approximations\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "### Approximations\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "### Approximations\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "### Approximations\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\n", "### Low Rank Motivation\n", "\n", "- Inference in a GP has the following demands:\n", "\n", " ------------- ---------------------\n", " Complexity: $\\bigO(\\numData^3)$\n", " Storage: $\\bigO(\\numData^2)$\n", " ------------- ---------------------\n", "\n", "- Inference in a low rank GP has the following demands:\n", "\n", " ------------- ---------------------------------\n", " Complexity: $\\bigO(\\numData\\numInducing^2)$\n", " Storage: $\\bigO(\\numData\\numInducing)$\n", " ------------- ---------------------------------\n", "\n", "where $\\numInducing$ is a user chosen parameter.\n", "\n", "@Snelson:pseudo05,@Quinonero:unifying05,@Lawrence:larger07,@Titsias:variational09,@Thang:unifying17\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\\#\\#\\# Variational Compression\n", "\n", "- Inducing variables are a compression of the real observations.\n", "- They are like pseudo-data. They can be in space of\n", " $\\mappingFunctionVector$ or a space that is related through a linear\n", " operator [@Alvarez:efficient10] — e.g. a gradient or convolution.\n", "\n", "### Variational Compression II\n", "\n", "- Introduce *inducing* variables.\n", "- Compress information into the inducing variables and avoid the need\n", " to store all the data.\n", "- Allow for scaling e.g. stochastic variational @Hensman:bigdata13 or\n", " parallelization @Gal:distributed14,@Dai:gpu14, @Seeger:auto17\n", "\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\n", "### Nonparametric Gaussian Processes\n", "\n", "- We’ve seen how we go from parametric to non-parametric.\n", "\n", "- The limit implies infinite dimensional $\\mappingVector$.\n", "\n", "- Gaussian processes are generally non-parametric: combine data with\n", " covariance function to get model.\n", "\n", "- This representation *cannot* be summarized by a parameter vector of\n", " a fixed size.\n", "\n", "### The Parametric Bottleneck\n", "\n", "- Parametric models have a representation that does not respond to\n", " increasing training set size.\n", "\n", "- Bayesian posterior distributions over parameters contain the\n", " information about the training data." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "- Use Bayes’ rule from training data,\n", " $p\\left(\\mappingVector|\\dataVector, \\inputMatrix\\right)$,\n", "\n", "- Make predictions on test data\n", " $$p\\left(\\dataScalar_*|\\inputMatrix_*, \\dataVector, \\inputMatrix\\right) = \\int\n", " p\\left(\\dataScalar_*|\\mappingVector,\\inputMatrix_*\\right)p\\left(\\mappingVector|\\dataVector,\n", " \\inputMatrix)\\text{d}\\mappingVector\\right).$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- $\\mappingVector$ becomes a bottleneck for information about the\n", " training set to pass to the test set.\n", "\n", "- Solution: increase $\\numBasisFunc$ so that the bottleneck is so\n", " large that it no longer presents a problem.\n", "\n", "- How big is big enough for $\\numBasisFunc$? Non-parametrics says\n", " $\\numBasisFunc \\rightarrow \\infty$.\n", "\n", "### The Parametric Bottleneck\n", "\n", "- Now no longer possible to manipulate the model through the standard\n", " parametric form.\n", "\n", ". . 
.\n", "\n", "- However, it *is* possible to express *parametric* as GPs:\n", " $$\\kernelScalar\\left(\\inputVector_i,\\inputVector_j\\right)=\\basisFunction_:\\left(\\inputVector_i\\right)^\\top\\basisFunction_:\\left(\\inputVector_j\\right).$$\n", "\n", ". . .\n", "\n", "- These are known as degenerate covariance matrices.\n", "\n", ". . .\n", "\n", "- Their rank is at most $\\numBasisFunc$, non-parametric models have\n", " full rank covariance matrices.\n", "\n", ". . .\n", "\n", "- Most well known is the “linear kernel”,\n", " $\\kernelScalar(\\inputVector_i, \\inputVector_j) = \\inputVector_i^\\top\\inputVector_j$.\n", "\n", "### Making Predictions\n", "\n", "- For non-parametrics prediction at new points\n", " $\\mappingFunctionVector_*$ is made by conditioning on\n", " $\\mappingFunctionVector$ in the joint distribution.\n", "\n", ". . .\n", "\n", "- In GPs this involves combining the training data with the covariance\n", " function and the mean function.\n", "\n", ". . .\n", "\n", "- Parametric is a special case when conditional prediction can be\n", " summarized in a *fixed* number of parameters.\n", "\n", ". . .\n", "\n", "- Complexity of parametric model remains fixed regardless of the size\n", " of our training data set.\n", "\n", ". . .\n", "\n", "- For a non-parametric model the required number of parameters grows\n", " with the size of the training data.\n", "\n", "### Augment Variable Space\n", "\n", "- Augment variable space with inducing observations, $\\inducingVector$\n", " $$\n", " \\begin{bmatrix}\n", " \\mappingFunctionVector\\\\\n", " \\inducingVector\n", " \\end{bmatrix} \\sim \\gaussianSamp{\\zerosVector}{\\kernelMatrix}\n", " $$ with $$\n", " \\kernelMatrix =\n", " \\begin{bmatrix}\n", " \\Kff & \\Kfu \\\\\n", " \\Kuf & \\Kuu\n", " \\end{bmatrix}\n", " $$\n", "\n", "### Joint Density\n", "\n", "$$\n", "p(\\mappingFunctionVector, \\inducingVector) = p(\\mappingFunctionVector| \\inducingVector) p(\\inducingVector)\n", "$$ to augment our model $$\n", "\\dataScalar(\\inputVector) = \\mappingFunction(\\inputVector) + \\noiseScalar,\n", "$$ giving $$\n", "p(\\dataVector) = \\int p(\\dataVector|\\mappingFunctionVector) p(\\mappingFunctionVector) \\text{d}\\mappingFunctionVector,\n", "$$ where for the independent case we have\n", "$p(\\dataVector | \\mappingFunctionVector) = \\prod_{i=1}^\\numData p(\\dataScalar_i|\\mappingFunction_i)$.\n", "\n", "### Auxilliary Variables\n", "\n", "$$\n", "p(\\dataVector) = \\int p(\\dataVector|\\mappingFunctionVector) p(\\mappingFunctionVector|\\inducingVector) p(\\inducingVector) \\text{d}\\inducingVector \\text{d}\\mappingFunctionVector.\n", "$$ Integrating over $\\mappingFunctionVector$ $$\n", "p(\\dataVector) = \\int p(\\dataVector|\\inducingVector) p(\\inducingVector) \\text{d}\\inducingVector.\n", "$$\n", "\n", "### Parametric Comparison\n", "\n", "$$\n", "\\dataScalar(\\inputVector) = \\weightVector^\\top\\basisVector(\\inputVector) + \\noiseScalar\n", "$$\n", "\n", "$$\n", "p(\\dataVector) = \\int p(\\dataVector|\\weightVector) p(\\weightVector) \\text{d} \\weightVector\n", "$$\n", "\n", "$$\n", "p(\\dataVector^*|\\dataVector) = \\int p(\\dataVector^*|\\weightVector) p(\\weightVector|\\dataVector) \\text{d} \\weightVector\n", "$$\n", "\n", "### New Form\n", "\n", "$$\n", "p(\\dataVector^*|\\dataVector) = \\int p(\\dataVector^*|\\inducingVector) p(\\inducingVector|\\dataVector) \\text{d} \\inducingVector\n", "$$\n", "\n", "- but $\\inducingVector$ is not a *parameter*\n", "\n", "- Unfortunately computing 
$p(\\dataVector|\\inducingVector)$ is\n", " intractable\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\n", "### Variational Bound on $p(\\dataVector |\\inducingVector)$\n", "\n", "$$\n", "\\begin{aligned}\n", " \\log p(\\dataVector|\\inducingVector) & = \\log \\int p(\\dataVector|\\mappingFunctionVector) p(\\mappingFunctionVector|\\inducingVector) \\text{d}\\mappingFunctionVector\\\\ & = \\int q(\\mappingFunctionVector) \\log \\frac{p(\\dataVector|\\mappingFunctionVector) p(\\mappingFunctionVector|\\inducingVector)}{q(\\mappingFunctionVector)}\\text{d}\\mappingFunctionVector + \\KL{q(\\mappingFunctionVector)}{p(\\mappingFunctionVector|\\dataVector, \\inducingVector)}.\n", "\\end{aligned}\n", "$$\n", "\n", "### Choose form for $q(\\cdot)$\n", "\n", "- Set\n", " $q(\\mappingFunctionVector)=p(\\mappingFunctionVector|\\inducingVector)$,\n", " $$\n", " \\log p(\\dataVector|\\inducingVector) \\geq \\int p(\\mappingFunctionVector|\\inducingVector) \\log p(\\dataVector|\\mappingFunctionVector)\\text{d}\\mappingFunctionVector.\n", " $$ $$\n", " p(\\dataVector|\\inducingVector) \\geq \\exp \\int p(\\mappingFunctionVector|\\inducingVector) \\log p(\\dataVector|\\mappingFunctionVector)\\text{d}\\mappingFunctionVector.\n", " $$ [\\small [@Titsias:variational09]]{style=\"text-align:right\"}\n", "\n", "### Optimal Compression in Inducing Variables\n", "\n", "- Maximizing the lower bound minimizes the KL divergence (information\n", " gain): $$\n", " \\KL{p(\\mappingFunctionVector|\\inducingVector)}{p(\\mappingFunctionVector|\\dataVector, \\inducingVector)} = \\int p(\\mappingFunctionVector|\\inducingVector) \\log \\frac{p(\\mappingFunctionVector|\\inducingVector)}{p(\\mappingFunctionVector|\\dataVector, \\inducingVector)}\\text{d}\\mappingFunctionVector\n", " $$\n", "\n", "- This is minimized when the information stored about $\\dataVector$ is\n", " already stored in $\\inducingVector$.\n", "\n", "- The bound seeks an *optimal compression* from the *information gain*\n", " perspective.\n", "- If $\\inducingVector = \\mappingFunctionVector$ the bound is exact\n", " ($\\mappingFunctionVector$ $d$-separates $\\dataVector$ from\n", " $\\inducingVector$).\n", "\n", "### Choice of Inducing Variables\n", "\n", "- We are free to choose whatever heuristics we like for the inducing\n", " variables.\n", "- We can quantify which heuristics perform better through checking the\n", " lower bound.\n", "\n", "$$\n", "\\begin{bmatrix}\n", "\\mappingFunctionVector\\\\\n", "\\inducingVector\n", "\\end{bmatrix} \\sim \\gaussianSamp{\\zerosVector}{\\kernelMatrix}\n", "$$ with $$\n", "\\kernelMatrix =\n", "\\begin{bmatrix}\n", "\\Kff & \\Kfu \\\\\n", "\\Kuf & \\Kuu\n", "\\end{bmatrix}\n", "$$\n", "\n", "### Variational Compression\n", "\n", "- Inducing variables are a compression of the real observations.\n", "- They are like pseudo-data. They can be in the space of\n", " $\\mappingFunctionVector$ or a space that is related through a linear\n", " operator [@Alvarez:efficient10] — e.g. a gradient or convolution.\n", "\n", "### Variational Compression II\n", "\n", "- Resulting algorithms reduce computational complexity.\n", "- Also allow deployment of more standard scaling techniques.\n", "- E.g. stochastic variational inference @Hoffman:stochastic12\n", "- Allow for scaling, e.g. 
stochastic variational inference @Hensman:bigdata13 or\n", " parallelization @Gal:distributed14,@Dai:gpu14, @Seeger:auto17\n", "\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import mlai\n", "import teaching_plots as plot \n", "from gp_tutorial import gpplot" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(figsize=plot.big_wide_figsize)\n", "plot.model_output(m_full, ax=ax, xlabel='$x$', ylabel='$y$', fontsize=20, portion=0.2)\n", "xlim = ax.get_xlim()\n", "ylim = ax.get_ylim()\n", "mlai.write_figure(figure=fig,\n", " filename='../slides/diagrams/gp/sparse-demo-full-gp.svg', \n", " transparent=True, frameon=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Full Gaussian Process Fit\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(figsize=plot.big_wide_figsize)\n", "plot.model_output(m, ax=ax, xlabel='$x$', ylabel='$y$', fontsize=20, portion=0.2, xlim=xlim, ylim=ylim)\n", "mlai.write_figure(figure=fig,\n", " filename='../slides/diagrams/gp/sparse-demo-constrained-inducing-6-unlearned-gp.svg', \n", " transparent=True, frameon=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Inducing Variable Fit\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(figsize=plot.big_wide_figsize)\n", "plot.model_output(m, ax=ax, xlabel='$x$', ylabel='$y$', fontsize=20, portion=0.2, xlim=xlim, ylim=ylim)\n", "mlai.write_figure(figure=fig,\n", " filename='../slides/diagrams/gp/sparse-demo-constrained-inducing-6-learned-gp.svg', \n", " transparent=True, frameon=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Inducing Variable Param Optimize\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(figsize=plot.big_wide_figsize)\n", "plot.model_output(m, ax=ax, xlabel='$x$', ylabel='$y$', fontsize=20, portion=0.2, xlim=xlim, ylim=ylim)\n", "mlai.write_figure(figure=fig,\n", " filename='../slides/diagrams/gp/sparse-demo-unconstrained-inducing-6-gp.svg', \n", " transparent=True, frameon=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Inducing Variable Full Optimize\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(figsize=plot.big_wide_figsize)\n", "plot.model_output(m, ax=ax, xlabel='$x$', ylabel='$y$', fontsize=20, portion=0.2, xlim=xlim, ylim=ylim)\n", "mlai.write_figure(figure=fig,\n", " filename='../slides/diagrams/gp/sparse-demo-sparse-inducing-8-gp.svg', \n", " transparent=True, frameon=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Eight Optimized Inducing Variables\n", "\n", "\n", "\n", "### Full Gaussian Process Fit\n", "\n", "\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\n", "### Leads to Other Approximations ...\n", "\n", "- Let’s be explicit about storing the approximate posterior of\n", " $\\inducingVector$, $q(\\inducingVector)$.\n", "- Now we have\n", " $$p(\\dataVector^*|\\dataVector) = \\int p(\\dataVector^*| \\inducingVector) q(\\inducingVector | \\dataVector) \\text{d} \\inducingVector$$\n", "\n", "### Leads to Other Approximations ...\n", "\n", 
"- Inducing variables look a lot like regular parameters.\n", "- *But*: their dimensionality does not need to be set at design time.\n", "- They can be modified arbitrarily at run time without effecting the\n", " model likelihood.\n", "- They only effect the quality of compression and the lower bound.\n", "\n", "### In GPs for Big Data\n", "\n", "- Exploit the resulting factorization ...\n", " $$p(\\dataVector^*|\\dataVector) = \\int p(\\dataVector^*| \\inducingVector) q(\\inducingVector | \\dataVector) \\text{d} \\inducingVector$$\n", "- The distribution now *factorizes*:\n", " $$p(\\dataVector^*|\\dataVector) = \\int \\prod_{i=1}^{\\numData^*}p(\\dataScalar^*_i| \\inducingVector) q(\\inducingVector | \\dataVector) \\text{d} \\inducingVector$$\n", "- This factorization can be exploited for stochastic variational\n", " inference [@Hoffman:stochastic12].\n", "\n", "### Nonparametrics for Very Large Data Sets\n", "\n", "
\n", "Modern data availability\n", "
\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "### Nonparametrics for Very Large Data Sets\n", "\n", "
\n", "Proxy for index of deprivation?\n", "
\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "### Nonparametrics for Very Large Data Sets\n", "\n", "
\n", "Actually index of deprivation is a proxy for this ...\n", "
\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "### \n", "\n", "\\catdoc\n", "\n", "\n", "\n", "\n", "\n", "\n", "
\n", "[[@Hensman:bigdata13]]{style=\"text-align:left\"}\n", "\n", "[]{style=\"text-align:right\"}\n", "
\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", "### \n", "\n", "\n", "\n", "\n", "\n", "\n", "
\n", "[[@Hensman:bigdata13]]{style=\"text-align:left\"}\n", "\n", "[]{style=\"text-align:right\"}\n", "
\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", "### Modern Review\n", "\n", "- *A Unifying Framework for Gaussian Process Pseudo-Point\n", " Approximations using Power Expectation Propagation*\n", " @Thang:unifying17\n", "\n", "- *Deep Gaussian Processes and Variational Propagation of Uncertainty*\n", " @Damianou:thesis2015\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\n", "### \n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "### Structure of Priors\n", "\n", "MacKay: NeurIPS Tutorial 1997 “Have we thrown out the baby with the\n", "bathwater?” [Published as @MacKay:gpintroduction98]\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot.deep_nn(diagrams='../slides/diagrams/deepgp/')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Deep Neural Network\n", "\n", "\n", "\n", "### Deep Neural Network\n", "\n", "\n", "\n", "### Mathematically\n", "\n", "$$\n", "\\begin{align}\n", " \\hiddenVector_{1} &= \\basisFunction\\left(\\mappingMatrix_1 \\inputVector\\right)\\\\\n", " \\hiddenVector_{2} &= \\basisFunction\\left(\\mappingMatrix_2\\hiddenVector_{1}\\right)\\\\\n", " \\hiddenVector_{3} &= \\basisFunction\\left(\\mappingMatrix_3 \\hiddenVector_{2}\\right)\\\\\n", " \\dataVector &= \\mappingVector_4 ^\\top\\hiddenVector_{3}\n", "\\end{align}\n", "$$\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\\#\\#\\# Overfitting\n", "\n", "- Potential problem: if number of nodes in two adjacent layers is big,\n", " corresponding $\\mappingMatrix$ is also very big and there is the\n", " potential to overfit.\n", "\n", "- Proposed solution: “dropout”.\n", "\n", "- Alternative solution: parameterize $\\mappingMatrix$ with its SVD. $$\n", " \\mappingMatrix = \\eigenvectorMatrix\\eigenvalueMatrix\\eigenvectwoMatrix^\\top\n", " $$ or $$\n", " \\mappingMatrix = \\eigenvectorMatrix\\eigenvectwoMatrix^\\top\n", " $$ where if $\\mappingMatrix \\in \\Re^{k_1\\times k_2}$ then\n", " $\\eigenvectorMatrix\\in \\Re^{k_1\\times q}$ and\n", " $\\eigenvectwoMatrix \\in \\Re^{k_2\\times q}$, i.e. we have a low rank\n", " matrix factorization for the weights." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot.low_rank_approximation(diagrams='../slides/diagrams')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Low Rank Approximation\n", "\n", "\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot.deep_nn_bottleneck(diagrams='../slides/diagrams/deepgp')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Deep Neural Network\n", "\n", "\n", "\n", "### Deep Neural Network\n", "\n", "\n", "\n", "### Mathematically\n", "\n", "The network can now be written mathematically as $$\n", "\\begin{align}\n", " \\latentVector_{1} &= \\eigenvectwoMatrix^\\top_1 \\inputVector\\\\\n", " \\hiddenVector_{1} &= \\basisFunction\\left(\\eigenvectorMatrix_1 \\latentVector_{1}\\right)\\\\\n", " \\latentVector_{2} &= \\eigenvectwoMatrix^\\top_2 \\hiddenVector_{1}\\\\\n", " \\hiddenVector_{2} &= \\basisFunction\\left(\\eigenvectorMatrix_2 \\latentVector_{2}\\right)\\\\\n", " \\latentVector_{3} &= \\eigenvectwoMatrix^\\top_3 \\hiddenVector_{2}\\\\\n", " \\hiddenVector_{3} &= \\basisFunction\\left(\\eigenvectorMatrix_3 \\latentVector_{3}\\right)\\\\\n", " \\dataVector &= \\mappingVector_4^\\top\\hiddenVector_{3}.\n", "\\end{align}\n", "$$\n", "\n", "### A Cascade of Neural Networks\n", "\n", "$$\n", "\\begin{align}\n", " \\latentVector_{1} &= \\eigenvectwoMatrix^\\top_1 \\inputVector\\\\\n", " \\latentVector_{2} &= \\eigenvectwoMatrix^\\top_2 \\basisFunction\\left(\\eigenvectorMatrix_1 \\latentVector_{1}\\right)\\\\\n", " \\latentVector_{3} &= \\eigenvectwoMatrix^\\top_3 \\basisFunction\\left(\\eigenvectorMatrix_2 \\latentVector_{2}\\right)\\\\\n", " \\dataVector &= \\mappingVector_4 ^\\top \\latentVector_{3}\n", "\\end{align}\n", "$$\n", "\n", "### Cascade of Gaussian Processes\n", "\n", "- Replace each neural network with a Gaussian process $$\n", " \\begin{align}\n", " \\latentVector_{1} &= \\mappingFunctionVector_1\\left(\\inputVector\\right)\\\\\n", " \\latentVector_{2} &= \\mappingFunctionVector_2\\left(\\latentVector_{1}\\right)\\\\\n", " \\latentVector_{3} &= \\mappingFunctionVector_3\\left(\\latentVector_{2}\\right)\\\\\n", " \\dataVector &= \\mappingFunctionVector_4\\left(\\latentVector_{3}\\right)\n", " \\end{align}\n", " $$\n", "\n", "- Equivalent to a prior over the parameters as we take the width of each\n", " layer to infinity.\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\n", "### Deep Face\n", "\n", "[Outline of the DeepFace architecture. A front-end of a single\n", "convolution-pooling-convolution filtering on the rectified input,\n", "followed by three locally-connected layers and two fully-connected\n", "layers. Color illustrates feature maps produced at each layer. The net\n", "includes more than 120 million parameters, where more than 95% come from\n", "the local and fully-connected layers.]{.fragment .fade-in}\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "[Source: DeepFace\n", "[@Taigman:deepface14]]{style=\"text-align:right\"}\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\n", "### Deep Learning as Pinball\n", "\n", "
\n", "\n", "\n", "\n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pods\n", "from ipywidgets import IntSlider" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pods.notebook.display_plots('pinball{sample:0>3}.svg', \n", " '../slides/diagrams',\n", " sample=IntSlider(1, 1, 2, 1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### \n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "### \n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\n", "### Mathematically\n", "\n", "- Composite *multivariate* function\n", "\n", "$$\n", " \\mathbf{g}(\\inputVector)=\\mappingFunctionVector_5(\\mappingFunctionVector_4(\\mappingFunctionVector_3(\\mappingFunctionVector_2(\\mappingFunctionVector_1(\\inputVector))))).\n", " $$" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from matplotlib import rc\n", "\n", "rc(\"font\", **{'family':'sans-serif','sans-serif':['Helvetica'],'size':30})\n", "rc(\"text\", usetex=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pgm = plot.horizontal_chain(depth=5)\n", "pgm.render().figure.savefig(\"../slides/diagrams/deepgp/deep-markov.svg\", transparent=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Equivalent to Markov Chain\n", "\n", "- Composite *multivariate* function $$\n", " p(\\dataVector|\\inputVector)= p(\\dataVector|\\mappingFunctionVector_5)p(\\mappingFunctionVector_5|\\mappingFunctionVector_4)p(\\mappingFunctionVector_4|\\mappingFunctionVector_3)p(\\mappingFunctionVector_3|\\mappingFunctionVector_2)p(\\mappingFunctionVector_2|\\mappingFunctionVector_1)p(\\mappingFunctionVector_1|\\inputVector)\n", " $$\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from matplotlib import rc\n", "rc(\"font\", **{'family':'sans-serif','sans-serif':['Helvetica'], 'size':15})\n", "rc(\"text\", usetex=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pgm = plot.vertical_chain(depth=5)\n", "pgm.render().figure.savefig(\"../slides/diagrams/deepgp/deep-markov-vertical.svg\", transparent=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### \n", "\n", "\n", "\n", "### Why Deep?\n", "\n", "- Gaussian processes give priors over functions.\n", "\n", "- Elegant properties:\n", "- e.g. *Derivatives* of process are also Gaussian distributed (if they\n", " exist).\n", "\n", "- For particular covariance functions they are ‘universal\n", " approximators’, i.e. all functions can have support under the prior.\n", "\n", "- Gaussian derivatives might ring alarm bells.\n", "\n", "- E.g. a priori they don’t believe in function ‘jumps’.\n", "\n", "### Stochastic Process Composition\n", "\n", "- From a process perspective: *process composition*.\n", "\n", "- A (new?) 
way of constructing more complex *processes* based on\n", " simpler components.\n", "\n", "### \n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import daft\n", "\n", "pgm = plot.vertical_chain(depth=5, shape=[2, 7])\n", "pgm.add_node(daft.Node('y_2', r'$\\mathbf{y}_2$', 1.5, 3.5, observed=True))\n", "pgm.add_edge('f_2', 'y_2')\n", "pgm.render().figure.savefig(\"../slides/diagrams/deepgp/deep-markov-vertical-side.svg\", transparent=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### \n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import teaching_plots as plot" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot.non_linear_difficulty_plot_3(diagrams='../../slides/diagrams/dimred/')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Difficulty for Probabilistic Approaches\n", "\n", "- Propagate a probability distribution through a non-linear mapping.\n", "\n", "- Normalisation of the distribution becomes intractable.\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot.non_linear_difficulty_plot_2(diagrams='../../slides/diagrams/dimred/')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Difficulty for Probabilistic Approaches\n", "\n", "- Propagate a probability distribution through a non-linear mapping.\n", "\n", "- Normalisation of the distribution becomes intractable.\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot.non_linear_difficulty_plot_1(diagrams='../../slides/diagrams/dimred')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Difficulty for Probabilistic Approaches\n", "\n", "- Propagate a probability distribution through a non-linear mapping.\n", "\n", "- Normalisation of the distribution becomes intractable.\n", "\n", "\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\n", "### Deep Gaussian Processes\n", "\n", "- Deep architectures allow abstraction of features\n", " [@Bengio:deep09; @Hinton:fast06; @Salakhutdinov:quantitative08]\n", "- We use a variational approach to stack GP models.\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import teaching_plots as plot" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import GPy\n", "\n", "plot.stack_gp_sample(kernel=GPy.kern.Linear,\n", " diagrams=\"../../slides/diagrams/deepgp\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pods" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pods.notebook.display_plots('stack-gp-sample-Linear-{sample:0>1}.svg', \n", " directory='../../slides/diagrams/deepgp', sample=(0,4))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Stacked PCA\n", "\n", "\n", "\n", "\n", "\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\n", "### Stacked GP" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot.stack_gp_sample(kernel=GPy.kern.RBF,\n", " diagrams=\"../../slides/diagrams/deepgp\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pods.notebook.display_plots('stack-gp-sample-RBF-{sample:0>1}.svg', \n", " directory='../../slides/diagrams/deepgp', sample=(0,4))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "\n", "\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "### Analysis of Deep GPs\n", "\n", "- *Avoiding pathologies in very deep networks* @Duvenaud:pathologies14\n", " show that the derivative distribution of the process becomes more\n", " *heavy tailed* as number of layers increase.\n", "\n", "- *How Deep Are Deep Gaussian Processes?* @Dunlop:deep2017 perform a\n", " theoretical analysis possible through conditional Gaussian Markov\n", " property.\n", "\n", "###" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from IPython.lib.display import YouTubeVideo\n", "YouTubeVideo('XhIvygQYFFQ')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\n", "### GPy: A Gaussian Process Framework in Python\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", "### GPy: A Gaussian Process Framework in Python\n", "\n", "- BSD Licensed software base.\n", "- Wide availability of libraries, 'modern' scripting language.\n", "- Allows us to set projects to undergraduates in Comp Sci that use\n", " GPs.\n", "- Available through GitHub \n", "- Reproducible Research with Jupyter Notebook.\n", "\n", "### Features\n", "\n", "- Probabilistic-style programming (specify the model, not the\n", " algorithm).\n", "- Non-Gaussian likelihoods.\n", "- Multivariate outputs.\n", "- Dimensionality reduction.\n", "- Approximations for large data sets.\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\n", "### Olympic Marathon Data\n", "\n", "\n", "\n", "\n", "\n", "\n", "
\n", "- Gold medal times for Olympic Marathon since 1896.\n", "- Marathons before 1924 didn’t have a standardised distance.\n", "- Present results using pace per km.\n", "- In 1904 Marathon was badly organised leading to very slow times.\n", "\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "Image from Wikimedia Commons \n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import teaching_plots as plot\n", "import mlai" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "xlim = (1875,2030)\n", "ylim = (2.5, 6.5)\n", "yhat = (y-offset)/scale\n", "\n", "fig, ax = plt.subplots(figsize=plot.big_wide_figsize)\n", "_ = ax.plot(x, y, 'r.',markersize=10)\n", "ax.set_xlabel('year', fontsize=20)\n", "ax.set_ylabel('pace min/km', fontsize=20)\n", "ax.set_xlim(xlim)\n", "ax.set_ylim(ylim)\n", "\n", "mlai.write_figure(figure=fig, \n", " filename='../slides/diagrams/datasets/olympic-marathon.svg', \n", " transparent=True, \n", " frameon=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Olympic Marathon Data\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\n", "### Alan Turing\n", "\n", "\n", "\n", "\n", "\n", "\n", "
\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "
\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "
\n", "### Probability Winning Olympics?\n", "\n", "- He was a formidable Marathon runner.\n", "- In 1946 he ran a time 2 hours 46 minutes.\n", " - That's a pace of 3.95 min/km.\n", "- What is the probability he would have won an Olympics if one had\n", " been held in 1946?\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import teaching_plots as plot" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(figsize=plot.big_wide_figsize)\n", "plot.model_output(m_full, scale=scale, offset=offset, ax=ax, xlabel='year', ylabel='pace min/km', fontsize=20, portion=0.2)\n", "ax.set_xlim(xlim)\n", "ax.set_ylim(ylim)\n", "mlai.write_figure(figure=fig,\n", " filename='../slides/diagrams/gp/olympic-marathon-gp.svg', \n", " transparent=True, frameon=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Olympic Marathon Data GP\n", "\n", "\n", "\n", "### Deep GP Fit\n", "\n", "- Can a Deep Gaussian process help?\n", "\n", "- Deep GP is one GP feeding into another." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(figsize=plot.big_wide_figsize)\n", "plot.model_output(m, scale=scale, offset=offset, ax=ax, xlabel='year', ylabel='pace min/km', \n", " fontsize=20, portion=0.2)\n", "ax.set_xlim(xlim)\n", "\n", "ax.set_ylim(ylim)\n", "mlai.write_figure(figure=fig, filename='../slides/diagrams/deepgp/olympic-marathon-deep-gp.svg', \n", " transparent=True, frameon=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Olympic Marathon Data Deep GP\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(figsize=plot.big_wide_figsize)\n", "plot.model_sample(m, scale=scale, offset=offset, samps=10, ax=ax, \n", " xlabel='year', ylabel='pace min/km', portion = 0.225)\n", "ax.set_xlim(xlim)\n", "ax.set_ylim(ylim)\n", "mlai.write_figure(figure=fig, filename='../slides/diagrams/deepgp/olympic-marathon-deep-gp-samples.svg', \n", " transparent=True, frameon=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Olympic Marathon Data Deep GP\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "m.visualize(scale=scale, offset=offset, xlabel='year',\n", " ylabel='pace min/km',xlim=xlim, ylim=ylim,\n", " dataset='olympic-marathon',\n", " diagrams='../slides/diagrams/deepgp')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pods" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pods.notebook.display_plots('olympic-marathon-deep-gp-layer-{sample:0>1}.svg', \n", " '../slides/diagrams/deepgp', sample=(0,1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Olympic Marathon Data Latent 1\n", "\n", "\n", "\n", "### Olympic Marathon Data Latent 2\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(figsize=plot.big_wide_figsize)\n", "m.visualize_pinball(ax=ax, scale=scale, offset=offset, points=30, portion=0.1,\n", " xlabel='year', ylabel='pace km/min', vertical=True)\n", "mlai.write_figure(figure=fig, 
filename='../slides/diagrams/deepgp/olympic-marathon-deep-gp-pinball.svg', \n", " transparent=True, frameon=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Olympic Marathon Pinball Plot\n", "\n", "\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\n", "### Della Gatta Gene Data\n", "\n", "- Given expression levels in the form of a time series from\n", " @DellaGatta:direct08." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import teaching_plots as plot\n", "import mlai" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "xlim = (-20,260)\n", "ylim = (5, 7.5)\n", "yhat = (y-offset)/scale\n", "\n", "fig, ax = plt.subplots(figsize=plot.big_wide_figsize)\n", "_ = ax.plot(x, y, 'r.',markersize=10)\n", "ax.set_xlabel('time/min', fontsize=20)\n", "ax.set_ylabel('expression', fontsize=20)\n", "ax.set_xlim(xlim)\n", "ax.set_ylim(ylim)\n", "\n", "mlai.write_figure(figure=fig, \n", " filename='../slides/diagrams/datasets/della-gatta-gene.svg', \n", " transparent=True, \n", " frameon=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Della Gatta Gene Data\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "### Gene Expression Example\n", "\n", "- Want to detect if a gene is expressed or not, fit a GP to each gene\n", " @Kalaitzis:simple11.\n", "\n", "### \n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", "###" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import teaching_plots as plot" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(figsize=plot.big_wide_figsize)\n", "plot.model_output(m_full, scale=scale, offset=offset, ax=ax, xlabel='time/min', ylabel='expression', fontsize=20, portion=0.2)\n", "ax.set_xlim(xlim)\n", "ax.set_ylim(ylim)\n", "ax.set_title('log likelihood: {ll:.3}'.format(ll=m_full.log_likelihood()), fontsize=20)\n", "mlai.write_figure(figure=fig,\n", " filename='../slides/diagrams/gp/della-gatta-gene-gp.svg', \n", " transparent=True, frameon=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### TP53 Gene Data GP\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import teaching_plots as plot" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(figsize=plot.big_wide_figsize)\n", "plot.model_output(m_full2, scale=scale, offset=offset, ax=ax, xlabel='time/min', ylabel='expression', fontsize=20, portion=0.2)\n", "ax.set_xlim(xlim)\n", "ax.set_ylim(ylim)\n", "ax.set_title('log likelihood: {ll:.3}'.format(ll=m_full2.log_likelihood()), fontsize=20)\n", "mlai.write_figure(figure=fig,\n", " filename='../slides/diagrams/gp/della-gatta-gene-gp2.svg', \n", " transparent=True, frameon=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### TP53 Gene Data GP\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import teaching_plots as plot" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(figsize=plot.big_wide_figsize)\n", "plot.model_output(m_full3, scale=scale, offset=offset, ax=ax, xlabel='time/min', ylabel='expression', fontsize=20, portion=0.2)\n", "ax.set_xlim(xlim)\n", "ax.set_ylim(ylim)\n", "ax.set_title('log likelihood: {ll:.3}'.format(ll=m_full3.log_likelihood()), fontsize=20)\n", "mlai.write_figure(figure=fig,\n", " filename='../slides/diagrams/gp/della-gatta-gene-gp3.svg', \n", " transparent=True, frameon=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### TP53 Gene Data GP\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import teaching_plots as plot" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot.multiple_optima(diagrams='../slides/diagrams/gp')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Multiple Optima\n", "\n", "\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax=plt.subplots(figsize=plot.big_wide_figsize)\n", "plot.model_output(m, scale=scale, offset=offset, ax=ax, fontsize=20, portion=0.5)\n", "ax.set_ylim(ylim)\n", "ax.set_xlim(xlim)\n", "mlai.write_figure(filename='../slides/diagrams/deepgp/della-gatta-gene-deep-gp.svg', \n", " transparent=True, frameon=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### TP53 Gene Data Deep GP\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax=plt.subplots(figsize=plot.big_wide_figsize)\n", "plot.model_sample(m, scale=scale, offset=offset, samps=10, ax=ax, portion = 0.5)\n", "ax.set_ylim(ylim)\n", 
"ax.set_xlim(xlim)\n", "mlai.write_figure(figure=fig, filename='../slides/diagrams/deepgp/della-gatta-gene-deep-gp-samples.svg', \n", " transparent=True, frameon=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### TP53 Gene Data Deep GP\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "m.visualize(offset=offset, scale=scale, xlim=xlim, ylim=ylim,\n", " dataset='della-gatta-gene',\n", " diagrams='../slides/diagrams/deepgp')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### TP53 Gene Data Latent 1\n", "\n", "\n", "\n", "### TP53 Gene Data Latent 2\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax=plt.subplots(figsize=plot.big_wide_figsize)\n", "m.visualize_pinball(offset=offset, ax=ax, scale=scale, xlim=xlim, ylim=ylim, portion=0.1, points=50)\n", "mlai.write_figure(figure=fig, filename='../slides/diagrams/deepgp/della-gatta-gene-deep-gp-pinball.svg', \n", " transparent=True, frameon=True, ax=ax)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### TP53 Gene Pinball Plot\n", "\n", "\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(figsize=plot.big_wide_figsize)\n", "_ = ax.plot(x, y, 'r.',markersize=10)\n", "_ = ax.set_xlabel('$x$', fontsize=20)\n", "_ = ax.set_ylabel('$y$', fontsize=20)\n", "xlim = (-2, 2)\n", "ylim = (-0.6, 1.6)\n", "ax.set_ylim(ylim)\n", "ax.set_xlim(xlim)\n", "mlai.write_figure(figure=fig, filename='../../slides/diagrams/datasets/step-function.svg', \n", " transparent=True, frameon=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step Function Data\n", "\n", "\n", "\n", "### Step Function Data GP" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax=plt.subplots(figsize=plot.big_wide_figsize)\n", "plot.model_output(m_full, scale=scale, offset=offset, ax=ax, fontsize=20, portion=0.5)\n", "ax.set_ylim(ylim)\n", "ax.set_xlim(xlim)\n", "\n", "mlai.write_figure(figure=fig,filename='../slides/diagrams/gp/step-function-gp.svg', \n", " transparent=True, frameon=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "### Step Function Data Deep GP" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax=plt.subplots(figsize=plot.big_wide_figsize)\n", "plot.model_output(m, scale=scale, offset=offset, ax=ax, fontsize=20, portion=0.5)\n", "ax.set_ylim(ylim)\n", "ax.set_xlim(xlim)\n", "mlai.write_figure(filename='../slides/diagrams/deepgp/step-function-deep-gp.svg', \n", " transparent=True, frameon=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "### Step Function Data Deep GP" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax=plt.subplots(figsize=plot.big_wide_figsize)\n", "\n", "plot.model_sample(m, scale=scale, offset=offset, samps=10, ax=ax, portion = 0.5)\n", "ax.set_ylim(ylim)\n", "ax.set_xlim(xlim)\n", "mlai.write_figure(figure=fig, filename='../slides/diagrams/deepgp/step-function-deep-gp-samples.svg', \n", " transparent=True, frameon=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "m.visualize(offset=offset, scale=scale, 
xlim=xlim, ylim=ylim,\n", " dataset='step-function',\n", " diagrams='../slides/diagrams/deepgp')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step Function Data Latent 1\n", "\n", "\n", "\n", "### Step Function Data Latent 2\n", "\n", "\n", "\n", "### Step Function Data Latent 3\n", "\n", "\n", "\n", "### Step Function Data Latent 4\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "### Step Function Pinball Plot" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax=plt.subplots(figsize=plot.big_wide_figsize)\n", "m.visualize_pinball(offset=offset, ax=ax, scale=scale, xlim=xlim, ylim=ylim, portion=0.1, points=50)\n", "mlai.write_figure(figure=fig, filename='../slides/diagrams/deepgp/step-function-deep-gp-pinball.svg', \n", " transparent=True, frameon=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(figsize=plot.big_wide_figsize)\n", "_ = ax.plot(x, y, 'r.',markersize=10)\n", "_ = ax.set_xlabel('time', fontsize=20)\n", "_ = ax.set_ylabel('acceleration', fontsize=20)\n", "xlim = (-20, 80)\n", "ylim = (-175, 125)\n", "ax.set_xlim(xlim)\n", "ax.set_ylim(ylim)\n", "mlai.write_figure(filename='../slides/diagrams/datasets/motorcycle-helmet.svg', \n", " transparent=True, frameon=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Motorcycle Helmet Data\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax=plt.subplots(figsize=plot.big_wide_figsize)\n", "plot.model_output(m_full, scale=scale, offset=offset, ax=ax, xlabel='time', ylabel='acceleration/$g$', fontsize=20, portion=0.5)\n", "xlim=(-20,80)\n", "ylim=(-180,120)\n", "ax.set_ylim(ylim)\n", "ax.set_xlim(xlim)\n", "mlai.write_figure(figure=fig,filename='../../slides/diagrams/gp/motorcycle-helmet-gp.svg', \n", " transparent=True, frameon=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Motorcycle Helmet Data GP\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import teaching_plots as plot\n", "import mlai" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax=plt.subplots(figsize=plot.big_wide_figsize)\n", "plot.model_output(m, scale=scale, offset=offset, ax=ax, xlabel='time', ylabel='acceleration/$g$', fontsize=20, portion=0.5)\n", "ax.set_ylim(ylim)\n", "ax.set_xlim(xlim)\n", "mlai.write_figure(filename='../slides/diagrams/deepgp/motorcycle-helmet-deep-gp.svg', \n", " transparent=True, frameon=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Motorcycle Helmet Data Deep GP\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import teaching_plots as plot\n", "import mlai" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax=plt.subplots(figsize=plot.big_wide_figsize)\n", "plot.model_sample(m, scale=scale, offset=offset, samps=10, ax=ax, xlabel='time', ylabel='acceleration/$g$', portion = 0.5)\n", "ax.set_ylim(ylim)\n", "ax.set_xlim(xlim)\n", "\n",
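"# Save the completed samples figure into the slides diagrams directory.\n",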
"mlai.write_figure(figure=fig, filename='../slides/diagrams/deepgp/motorcycle-helmet-deep-gp-samples.svg', \n", " transparent=True, frameon=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Motorcycle Helmet Data Deep GP\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "m.visualize(xlim=xlim, ylim=ylim, scale=scale,offset=offset, \n", " xlabel=\"time\", ylabel=\"acceleration/$g$\", portion=0.5,\n", " dataset='motorcycle-helmet',\n", " diagrams='../slides/diagrams/deepgp')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Motorcycle Helmet Data Latent 1\n", "\n", "\n", "\n", "### Motorcycle Helmet Data Latent 2\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax=plt.subplots(figsize=plot.big_wide_figsize)\n", "m.visualize_pinball(ax=ax, xlabel='time', ylabel='acceleration/g', \n", " points=50, scale=scale, offset=offset, portion=0.1)\n", "mlai.write_figure(figure=fig, filename='../slides/diagrams/deepgp/motorcycle-helmet-deep-gp-pinball.svg', \n", " transparent=True, frameon=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Motorcycle Helmet Pinball Plot\n", "\n", "\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(figsize=plot.big_figsize)\n", "plt.plot(data['X'][:, 1], data['X'][:, 2], 'r.', markersize=5)\n", "ax.set_xlabel('x position', fontsize=20)\n", "ax.set_ylabel('y position', fontsize=20)\n", "mlai.write_figure(figure=fig, filename='../../slides/diagrams/datasets/robot-wireless-ground-truth.svg', transparent=True, frameon=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Robot Wireless Ground Truth\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "output_dim=1\n", "xlim = (-0.3, 1.3)\n", "fig, ax = plt.subplots(figsize=plot.big_wide_figsize)\n", "_ = ax.plot(x.flatten(), y[:, output_dim], \n", " 'r.', markersize=5)\n", "\n", "ax.set_xlabel('time', fontsize=20)\n", "ax.set_ylabel('signal strength', fontsize=20)\n", "xlim = (-0.2, 1.2)\n", "ylim = (-0.6, 2.0)\n", "ax.set_xlim(xlim)\n", "ax.set_ylim(ylim)\n", "\n", "mlai.write_figure(figure=fig, filename='../slides/diagrams/datasets/robot-wireless-dim-' + str(output_dim) + '.svg', \n", " transparent=True, frameon=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Robot WiFi Data\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax=plt.subplots(figsize=plot.big_wide_figsize)\n", "plot.model_output(m_full, output_dim=output_dim, scale=scale, offset=offset, ax=ax, \n", " xlabel='time', ylabel='signal strength', fontsize=20, portion=0.5)\n", "ax.set_ylim(ylim)\n", "ax.set_xlim(xlim)\n", "mlai.write_figure(filename='../slides/diagrams/gp/robot-wireless-gp-dim-' + str(output_dim)+ '.svg', \n", " transparent=True, frameon=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Robot WiFi Data GP\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax=plt.subplots(figsize=plot.big_wide_figsize)\n", "plot.model_output(m, output_dim=output_dim, scale=scale, offset=offset, ax=ax, \n", 
" xlabel='time', ylabel='signal strength', fontsize=20, portion=0.5)\n", "ax.set_ylim(ylim)\n", "ax.set_xlim(xlim)\n", "mlai.write_figure(figure=fig, filename='../slides/diagrams/deepgp/robot-wireless-deep-gp-dim-' + str(output_dim)+ '.svg', \n", " transparent=True, frameon=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Robot WiFi Data Deep GP\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax=plt.subplots(figsize=plot.big_wide_figsize)\n", "plot.model_sample(m, output_dim=output_dim, scale=scale, offset=offset, samps=10, ax=ax,\n", " xlabel='time', ylabel='signal strength', fontsize=20, portion=0.5)\n", "ax.set_ylim(ylim)\n", "ax.set_xlim(xlim)\n", "mlai.write_figure(figure=fig, filename='../slides/diagrams/deepgp/robot-wireless-deep-gp-samples-dim-' + str(output_dim)+ '.svg', \n", " transparent=True, frameon=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Robot WiFi Data Deep GP\n", "\n", "\n", "\n", "### Robot WiFi Data Latent Space\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(figsize=plot.big_figsize)\n", "ax.plot(m.layers[-2].latent_space.mean[:, 0], \n", " m.layers[-2].latent_space.mean[:, 1], \n", " 'r.-', markersize=5)\n", "\n", "ax.set_xlabel('latent dimension 1', fontsize=20)\n", "ax.set_ylabel('latent dimension 2', fontsize=20)\n", "\n", "mlai.write_figure(figure=fig, filename='../slides/diagrams/deepgp/robot-wireless-latent-space.svg', \n", " transparent=True, frameon=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\n", "### Motion Capture\n", "\n", "- ‘High five’ data.\n", "- Model learns structure between two interacting subjects.\n", "\n", "### Shared LVM\n", "\n", "\n", "\n", "### \n", "\n", "\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\n", "[Thanks to: Zhenwen Dai and Neil D.\n", "Lawrence]{style=\"text-align:right\"}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "from matplotlib import rc\n", "import teaching_plots as plot\n", "import mlai" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "rc(\"font\", **{'family':'sans-serif','sans-serif':['Helvetica'],'size':20})\n", "fig, ax = plt.subplots(figsize=plot.big_figsize)\n", "for d in digits:\n", " ax.plot(m.layer_1.X.mean[labels==d,0],m.layer_1.X.mean[labels==d,1],'.',label=str(d))\n", "_ = plt.legend()\n", "mlai.write_figure(figure=fig, filename=\"../slides/diagrams/deepgp/usps-digits-latent.svg\", transparent=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### \n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import mlai" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(figsize=plot.big_figsize)\n", "for i in range(5):\n", " for j in range(i):\n", " dims=[i, j]\n", " ax.cla()\n", " for d in digits:\n", " ax.plot(m.obslayer.X.mean[labels==d,dims[0]],\n", " m.obslayer.X.mean[labels==d,dims[1]],\n", " '.', label=str(d))\n", " plt.legend()\n", " plt.xlabel('dimension ' + str(dims[0]))\n", " plt.ylabel('dimension ' + str(dims[1]))\n", " mlai.write_figure(figure=fig, 
filename=\"../slides/diagrams/deepgp/usps-digits-hidden-\" + str(dims[0]) + '-' + str(dims[1]) + '.svg', transparent=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### \n", "\n", "\n", "\n", "### \n", "\n", "\n", "\n", "### \n", "\n", "\n", "\n", "### \n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import mlai" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "yt = m.predict(x)\n", "fig, axs = plt.subplots(rows,cols,figsize=(10,6))\n", "for i in range(rows):\n", " for j in range(cols):\n", " #v = np.random.normal(loc=yt[0][i*cols+j, :], scale=np.sqrt(yt[1][i*cols+j, :]))\n", " v = yt[0][i*cols+j, :]\n", " axs[i,j].imshow(v.reshape(28,28), \n", " cmap='gray', interpolation='none',\n", " aspect='equal')\n", " axs[i,j].set_axis_off()\n", "mlai.write_figure(figure=fig, filename=\"../slides/diagrams/deepgp/digit-samples-deep-gp.svg\", transparent=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### \n", "\n", "\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\n", "### Deep Health\n", "\n", "\n", "\n", "### From NIPS 2017\n", "\n", "- *Gaussian process based nonlinear latent structure discovery in\n", " multivariate spike train data* @Anqi:gpspike2017\n", "- *Doubly Stochastic Variational Inference for Deep Gaussian\n", " Processes* @Salimbeni:doubly2017\n", "- *Deep Multi-task Gaussian Processes for Survival Analysis with\n", " Competing Risks* @Alaa:deep2017\n", "- *Counterfactual Gaussian Processes for Reliable Decision-making and\n", " What-if Reasoning* @Schulam:counterfactual17\n", "\n", "### Some Other Works\n", "\n", "- *Deep Survival Analysis* @Ranganath-survival16\n", "- *Recurrent Gaussian Processes* @Mattos:recurrent15\n", "- *Gaussian Process Based Approaches for Survival Analysis*\n", " @Saul:thesis2016\n", "\n", "### Data Driven\n", "\n", "- Machine Learning: Replicate Processes through *direct use of data*.\n", "- Aim to emulate cognitive processes through the use of data.\n", "- Use data to provide new approaches in control and optimization that\n", " should allow for emulation of human motor skills.\n", "\n", "### Process Emulation\n", "\n", "- Key idea: emulate the process as a mathematical function.\n", "- Each function has a set of *parameters* which control its behavior.\n", "- *Learning* is the process of changing these parameters to change the\n", " shape of the function\n", "- Choice of which class of mathematical functions we use is a vital\n", " component of our *model*.\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\n", "### Emukit Playground\n", "\n", "- Work [Adam Hirst](https://twitter.com/_AdamHirst), Software\n", " Engineering Intern and Cliff McCollum.\n", "\n", "- Tutorial on emulation.\n", "\n", "### Emukit Playground\n", "\n", "
\n", "\n", "[](https://amzn.github.io/emukit-playground/)\n", "\n", "
\n", "\n", "### Emukit Playground\n", "\n", "
\n", "\n", "[](https://amzn.github.io/emukit-playground/#!/learn/bayesian_optimization)\n", "\n", "
\n", "\n", "### Uncertainty Quantification\n", "\n", "- Deep nets are powerful approach to images, speech, language.\n", "- Proposal: Deep GPs may also be a great approach, but better to\n", " deploy according to natural strengths.\n", "\n", "### Uncertainty Quantification\n", "\n", "- Probabilistic numerics, surrogate modelling, emulation, and UQ.\n", "- Not a fan of AI as a term.\n", "- But we are faced with increasing amounts of *algorithmic decision\n", " making*.\n", "\n", "### ML and Decision Making\n", "\n", "- When trading off decisions: compute or acquire data?\n", "- There is a critical need for uncertainty.\n", "\n", "### Uncertainty Quantification\n", "\n", "> Uncertainty quantification (UQ) is the science of quantitative\n", "> characterization and reduction of uncertainties in both computational\n", "> and real world applications. It tries to determine how likely certain\n", "> outcomes are if some aspects of the system are not exactly known.\n", "\n", "- Interaction between physical and virtual worlds of major interest.\n", "\n", "### Contrast\n", "\n", "- Simulation in *reinforcement learning*.\n", "- Known as *data augmentation*.\n", "- Newer, similar in spirit, but typically ignores uncertainty.\n", "\n", "### Example: Formula One Racing\n", "\n", "- Designing an F1 Car requires CFD, Wind Tunnel, Track Testing etc.\n", "\n", "- How to combine them?\n", "\n", "### Mountain Car Simulator\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "### Car Dynamics\n", "\n", "$$\\inputVector_{t+1} = \\mappingFunction(\\inputVector_{t},\\textbf{u}_{t})$$\n", "\n", "where $\\textbf{u}_t$ is the action force, $\\inputVector_t = (p_t, v_t)$\n", "is the vehicle state\n", "\n", "### Policy\n", "\n", "- Assume policy is linear with parameters $\\boldsymbol{\\theta}$\n", "\n", "$$\\pi(\\inputVector,\\theta)= \\theta_0 + \\theta_p p + \\theta_vv.$$\n", "\n", "### Emulate the Mountain Car\n", "\n", "- Goal is find $\\theta$ such that\n", "\n", "$$\\theta^* = arg \\max_{\\theta} R_T(\\theta).$$\n", "\n", "- Reward is computed as 100 for target, minus squared sum of actions" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "HTML(anim.to_jshtml())" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mc.save_frames(frames, \n", " diagrams='../slides/diagrams/uq', \n", " filename='mountain_car_random.html')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Random Linear Controller\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "HTML(anim.to_jshtml())" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mc.save_frames(frames, \n", " diagrams='../slides/diagrams/uq', \n", " filename='mountain_car_simulated.html')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Best Controller after 50 Iterations of Bayesian Optimization\n", "\n", "\n", "### Data Efficient Emulation\n", "\n", "- For standard Bayesian Optimization ignored *dynamics* of the car.\n", "\n", "- For more data efficiency, first *emulate* the dynamics.\n", "\n", "- Then do Bayesian optimization of the *emulator*.\n", "\n", "- Use a Gaussian process to model $$\\Delta v_{t+1} = v_{t+1} - v_{t}$$\n", " and $$\\Delta x_{t+1} = p_{t+1} - p_{t}$$\n", "\n", "- Two processes, one with mean $v_{t}$ one with mean $p_{t}$\n", "\n", "### Emulator Training\n", "\n", "- Used 500 randomly selected points to train emulators.\n", "\n", "- Can make proces smore efficient through *experimental design*." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "control = mc.plot_control(velocity_model)\n", "interact(control.plot_slices, control=(-1, 1, 0.05))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mc.emu_sim_comparison(env, controller_gains, [position_model, velocity_model], \n", " max_steps=500, diagrams='../slides/diagrams/uq')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Comparison of Emulation and Simulation\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "HTML(anim.to_jshtml())" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mc.save_frames(frames, \n", " diagrams='../slides/diagrams/uq', \n", " filename='mountain_car_emulated.html')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Data Efficiency\n", "\n", "- Our emulator used only 500 calls to the simulator.\n", "\n", "- Optimizing the simulator directly required 37,500 calls to the\n", " simulator.\n", "\n", "### Best Controller using Emulator of Dynamics\n", "\n", "\n", "500 calls to the simulator vs 37,500 calls to the simulator\n", "\n", "$$\\mappingFunction_i\\left(\\inputVector\\right) = \\rho\\mappingFunction_{i-1}\\left(\\inputVector\\right) + \\delta_i\\left(\\inputVector \\right)$$\n", "\n", "### Multi-Fidelity Emulation\n", "\n", "$$\\mappingFunction_i\\left(\\inputVector\\right) = \\mappingFunctionTwo_{i}\\left(\\mappingFunction_{i-1}\\left(\\inputVector\\right)\\right) + \\delta_i\\left(\\inputVector \\right),$$" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "HTML(anim.to_jshtml())" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mc.save_frames(frames, \n", " diagrams='../slides/diagrams/uq', \n", " filename='mountain_car_multi_fidelity.html')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Best Controller with Multi-Fidelity Emulator\n", "\n", "\n", "250 observations of high fidelity simulator and 250 of the low fidelity\n", "simulator\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\n", "### Emukit\n", "\n", "- Work by Javier Gonzalez, Andrei Paleyes, Mark Pullin, Maren\n", " Mahsereci, Alex Gessner, Aaron Klein.\n", "- Available on [Github](https://github.com/amzn/emukit)\n", "- Example [sensitivity\n", " notebook](https://github.com/amzn/emukit/blob/develop/notebooks/Emukit-sensitivity-montecarlo.ipynb).\n", "\n", "### Emukit Software\n", "\n", "- *Multi-fidelity emulation*: build surrogate models for multiple\n", " sources of information;\n", "- *Bayesian optimisation*: optimise physical experiments and tune\n", " parameters ML algorithms;\n", "- *Experimental design/Active learning*: design experiments and\n", " perform active learning with ML models;\n", "- *Sensitivity analysis*: analyse the influence of inputs on the\n", " outputs\n", "- *Bayesian quadrature*: compute integrals of functions that are\n", " expensive to evaluate.\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\n", "[\\small{[edit]}]{style=\"text-align:right\"}\n", "\n", "### MXFusion: Modular Probabilistic Programming on MXNet\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", "### MxFusion\n", "\n", "\n", "\n", "\n", " \n", "\n", "
\n", "- Work by Eric Meissner and Zhenwen Dai.\n", "- Probabilistic programming.\n", "- Available on [Github](https://github.com/amzn/mxfusion)\n", " \n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "\n", "
\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### MxFusion\n", "\n", "- Targeted at challenges we face in emulation.\n", "- Composition of Gaussian processes (Deep GPs)\n", "- Combining GPs with neural networks.\n", "- Example [PPCA\n", " Tutorial](https://github.com/amzn/MXFusion/blob/master/examples/notebooks/ppca_tutorial.ipynb).\n", "\n", "### Why another framework?\n", "\n", "- Existing libraries had either:\n", "- Probabilistic modelling with rich, flexible models and universal\n", " inference or\n", "- Specialized, efficient inference over a subset of models\n", "\n", "**We needed both**\n", "\n", "### Key Requirements\n", "\n", "- Integration with deep learning\n", "- Flexiblility\n", "- Scalability\n", "- Specialized inference and models support\n", " - Bayesian Deep Learning methods\n", " - Rapid prototyping and software re-use\n", " - GPUs, specialized inference methods\n", "\n", "### Modularity\n", "\n", "- Specialized Inference\n", "- Composability (tinkerability)\n", " - Better leveraging of expert expertise\n", "\n", "### What does it look like?\n", "\n", "**Modelling**\n", "\n", "**Inference**\n", "\n", "### Modelling\n", "\n", "### Directed Graphs\n", "\n", "- Variable\n", "- Function\n", "- Distribution\n", "\n", "### Example" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "m = Model()\n", "m.mu = Variable()\n", "m.s = Variable(transformation=PositiveTransformation())\n", "m.Y = Normal.define_variable(mean=m.mu, variance=m.s)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3 primary components in modeling\n", "\n", "- Variable\n", "- Distribution\n", "- Function\n", "\n", "### 2 primary methods for models\n", "\n", "- `log_pdf`\n", "- `draw_samples`\n", "\n", "### Inference: Two Classes\n", "\n", "- Variational Inference\n", "- MCMC Sampling (*soon*) Built on MXNet Gluon (imperative code, not\n", " static graph)\n", "\n", "### Example" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "infr = GradBasedInference(inference_algorithm=MAP(model=m, observed=[m.Y]))\n", "infr.run(Y=data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Modules\n", "\n", "- Model + Inference together form building blocks.\n", " - Just doing modular modeling with universal inference doesn't\n", " really scale, need specialized inference methods for specialized\n", " modelling objects like non-parametrics.\n", "\n", "### Long term Aim\n", "\n", "- Simulate/Emulate the components of the system.\n", " - Validate with real world using multifidelity.\n", " - Interpret system using e.g. sensitivity analysis.\n", "- Perform end to end learning to optimize.\n", " - Maintain interpretability.\n", "\n", "### Acknowledgments\n", "\n", "Stefanos Eleftheriadis, John Bronskill, Hugh Salimbeni, Rich Turner,\n", "Zhenwen Dai, Javier Gonzalez, Andreas Damianou, Mark Pullin, Eric\n", "Meissner.\n", "\n", "### Thanks!\n", "\n", "- twitter: @lawrennd\n", "- blog:\n", " [http://inverseprobability.com](http://inverseprobability.com/blog.html)\n", "\n", "### References {#references .unnumbered}" ] } ], "metadata": {}, "nbformat": 4, "nbformat_minor": 2 }