{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Machine Learning Systems Design\n", "### [Neil D. Lawrence](http://inverseprobability.com), Amazon Cambridge and University of Sheffield\n", "### 2019-06-06\n", "\n", "**Abstract**: Machine learning solutions, in particular those based on deep learning\n", "methods, form an underpinning of the current revolution in “artificial\n", "intelligence” that has dominated popular press headlines and is having a\n", "significant influence on the wider tech agenda. In this talk I will give\n", "an overview of where we are now with machine learning solutions, and\n", "what challenges we face both in the near and far future. These include\n", "practical application of existing algorithms in the face of the need to\n", "explain decision-making, mechanisms for improving the quality and\n", "availability of data, dealing with large unstructured datasets.\n", "\n", "$$\n", "\n", "\\newcommand{\\tk}[1]{\\textbf{TK}: #1}\n", "\\newcommand{\\Amatrix}{\\mathbf{A}}\n", "\\newcommand{\\KL}[2]{\\text{KL}\\left( #1\\,\\|\\,#2 \\right)}\n", "\\newcommand{\\Kaast}{\\kernelMatrix_{\\mathbf{ \\ast}\\mathbf{ \\ast}}}\n", "\\newcommand{\\Kastu}{\\kernelMatrix_{\\mathbf{ \\ast} \\inducingVector}}\n", "\\newcommand{\\Kff}{\\kernelMatrix_{\\mappingFunctionVector \\mappingFunctionVector}}\n", "\\newcommand{\\Kfu}{\\kernelMatrix_{\\mappingFunctionVector \\inducingVector}}\n", "\\newcommand{\\Kuast}{\\kernelMatrix_{\\inducingVector \\bf\\ast}}\n", "\\newcommand{\\Kuf}{\\kernelMatrix_{\\inducingVector \\mappingFunctionVector}}\n", "\\newcommand{\\Kuu}{\\kernelMatrix_{\\inducingVector \\inducingVector}}\n", "\\newcommand{\\Kuui}{\\Kuu^{-1}}\n", "\\newcommand{\\Qaast}{\\mathbf{Q}_{\\bf \\ast \\ast}}\n", "\\newcommand{\\Qastf}{\\mathbf{Q}_{\\ast \\mappingFunction}}\n", "\\newcommand{\\Qfast}{\\mathbf{Q}_{\\mappingFunctionVector \\bf \\ast}}\n", "\\newcommand{\\Qff}{\\mathbf{Q}_{\\mappingFunctionVector \\mappingFunctionVector}}\n", 
"\\newcommand{\\aMatrix}{\\mathbf{A}}\n", "\\newcommand{\\aScalar}{a}\n", "\\newcommand{\\aVector}{\\mathbf{a}}\n", "\\newcommand{\\acceleration}{a}\n", "\\newcommand{\\bMatrix}{\\mathbf{B}}\n", "\\newcommand{\\bScalar}{b}\n", "\\newcommand{\\bVector}{\\mathbf{b}}\n", "\\newcommand{\\basisFunc}{\\phi}\n", "\\newcommand{\\basisFuncVector}{\\boldsymbol{ \\basisFunc}}\n", "\\newcommand{\\basisFunction}{\\phi}\n", "\\newcommand{\\basisLocation}{\\mu}\n", "\\newcommand{\\basisMatrix}{\\boldsymbol{ \\Phi}}\n", "\\newcommand{\\basisScalar}{\\basisFunction}\n", "\\newcommand{\\basisVector}{\\boldsymbol{ \\basisFunction}}\n", "\\newcommand{\\activationFunction}{\\phi}\n", "\\newcommand{\\activationMatrix}{\\boldsymbol{ \\Phi}}\n", "\\newcommand{\\activationScalar}{\\basisFunction}\n", "\\newcommand{\\activationVector}{\\boldsymbol{ \\basisFunction}}\n", "\\newcommand{\\bigO}{\\mathcal{O}}\n", "\\newcommand{\\binomProb}{\\pi}\n", "\\newcommand{\\cMatrix}{\\mathbf{C}}\n", "\\newcommand{\\cbasisMatrix}{\\hat{\\boldsymbol{ \\Phi}}}\n", "\\newcommand{\\cdataMatrix}{\\hat{\\dataMatrix}}\n", "\\newcommand{\\cdataScalar}{\\hat{\\dataScalar}}\n", "\\newcommand{\\cdataVector}{\\hat{\\dataVector}}\n", "\\newcommand{\\centeredKernelMatrix}{\\mathbf{ \\MakeUppercase{\\centeredKernelScalar}}}\n", "\\newcommand{\\centeredKernelScalar}{b}\n", "\\newcommand{\\centeredKernelVector}{\\centeredKernelScalar}\n", "\\newcommand{\\centeringMatrix}{\\mathbf{H}}\n", "\\newcommand{\\chiSquaredDist}[2]{\\chi_{#1}^{2}\\left(#2\\right)}\n", "\\newcommand{\\chiSquaredSamp}[1]{\\chi_{#1}^{2}}\n", "\\newcommand{\\conditionalCovariance}{\\boldsymbol{ \\Sigma}}\n", "\\newcommand{\\coregionalizationMatrix}{\\mathbf{B}}\n", "\\newcommand{\\coregionalizationScalar}{b}\n", "\\newcommand{\\coregionalizationVector}{\\mathbf{ \\coregionalizationScalar}}\n", "\\newcommand{\\covDist}[2]{\\text{cov}_{#2}\\left(#1\\right)}\n", "\\newcommand{\\covSamp}[1]{\\text{cov}\\left(#1\\right)}\n", 
"\\newcommand{\\covarianceScalar}{c}\n", "\\newcommand{\\covarianceVector}{\\mathbf{ \\covarianceScalar}}\n", "\\newcommand{\\covarianceMatrix}{\\mathbf{C}}\n", "\\newcommand{\\covarianceMatrixTwo}{\\boldsymbol{ \\Sigma}}\n", "\\newcommand{\\croupierScalar}{s}\n", "\\newcommand{\\croupierVector}{\\mathbf{ \\croupierScalar}}\n", "\\newcommand{\\croupierMatrix}{\\mathbf{ \\MakeUppercase{\\croupierScalar}}}\n", "\\newcommand{\\dataDim}{p}\n", "\\newcommand{\\dataIndex}{i}\n", "\\newcommand{\\dataIndexTwo}{j}\n", "\\newcommand{\\dataMatrix}{\\mathbf{Y}}\n", "\\newcommand{\\dataScalar}{y}\n", "\\newcommand{\\dataSet}{\\mathcal{D}}\n", "\\newcommand{\\dataStd}{\\sigma}\n", "\\newcommand{\\dataVector}{\\mathbf{ \\dataScalar}}\n", "\\newcommand{\\decayRate}{d}\n", "\\newcommand{\\degreeMatrix}{\\mathbf{ \\MakeUppercase{\\degreeScalar}}}\n", "\\newcommand{\\degreeScalar}{d}\n", "\\newcommand{\\degreeVector}{\\mathbf{ \\degreeScalar}}\n", "% Already defined by latex\n", "%\\newcommand{\\det}[1]{\\left|#1\\right|}\n", "\\newcommand{\\diag}[1]{\\text{diag}\\left(#1\\right)}\n", "\\newcommand{\\diagonalMatrix}{\\mathbf{D}}\n", "\\newcommand{\\diff}[2]{\\frac{\\text{d}#1}{\\text{d}#2}}\n", "\\newcommand{\\diffTwo}[2]{\\frac{\\text{d}^2#1}{\\text{d}#2^2}}\n", "\\newcommand{\\displacement}{x}\n", "\\newcommand{\\displacementVector}{\\textbf{\\displacement}}\n", "\\newcommand{\\distanceMatrix}{\\mathbf{ \\MakeUppercase{\\distanceScalar}}}\n", "\\newcommand{\\distanceScalar}{d}\n", "\\newcommand{\\distanceVector}{\\mathbf{ \\distanceScalar}}\n", "\\newcommand{\\eigenvaltwo}{\\ell}\n", "\\newcommand{\\eigenvaltwoMatrix}{\\mathbf{L}}\n", "\\newcommand{\\eigenvaltwoVector}{\\mathbf{l}}\n", "\\newcommand{\\eigenvalue}{\\lambda}\n", "\\newcommand{\\eigenvalueMatrix}{\\boldsymbol{ \\Lambda}}\n", "\\newcommand{\\eigenvalueVector}{\\boldsymbol{ \\lambda}}\n", "\\newcommand{\\eigenvector}{\\mathbf{ \\eigenvectorScalar}}\n", "\\newcommand{\\eigenvectorMatrix}{\\mathbf{U}}\n", 
"\\newcommand{\\eigenvectorScalar}{u}\n", "\\newcommand{\\eigenvectwo}{\\mathbf{v}}\n", "\\newcommand{\\eigenvectwoMatrix}{\\mathbf{V}}\n", "\\newcommand{\\eigenvectwoScalar}{v}\n", "\\newcommand{\\entropy}[1]{\\mathcal{H}\\left(#1\\right)}\n", "\\newcommand{\\errorFunction}{E}\n", "\\newcommand{\\expDist}[2]{\\left<#1\\right>_{#2}}\n", "\\newcommand{\\expSamp}[1]{\\left<#1\\right>}\n", "\\newcommand{\\expectation}[1]{\\left\\langle #1 \\right\\rangle }\n", "\\newcommand{\\expectationDist}[2]{\\left\\langle #1 \\right\\rangle _{#2}}\n", "\\newcommand{\\expectedDistanceMatrix}{\\mathcal{D}}\n", "\\newcommand{\\eye}{\\mathbf{I}}\n", "\\newcommand{\\fantasyDim}{r}\n", "\\newcommand{\\fantasyMatrix}{\\mathbf{ \\MakeUppercase{\\fantasyScalar}}}\n", "\\newcommand{\\fantasyScalar}{z}\n", "\\newcommand{\\fantasyVector}{\\mathbf{ \\fantasyScalar}}\n", "\\newcommand{\\featureStd}{\\varsigma}\n", "\\newcommand{\\gammaCdf}[3]{\\mathcal{GAMMA CDF}\\left(#1|#2,#3\\right)}\n", "\\newcommand{\\gammaDist}[3]{\\mathcal{G}\\left(#1|#2,#3\\right)}\n", "\\newcommand{\\gammaSamp}[2]{\\mathcal{G}\\left(#1,#2\\right)}\n", "\\newcommand{\\gaussianDist}[3]{\\mathcal{N}\\left(#1|#2,#3\\right)}\n", "\\newcommand{\\gaussianSamp}[2]{\\mathcal{N}\\left(#1,#2\\right)}\n", "\\newcommand{\\given}{|}\n", "\\newcommand{\\half}{\\frac{1}{2}}\n", "\\newcommand{\\heaviside}{H}\n", "\\newcommand{\\hiddenMatrix}{\\mathbf{ \\MakeUppercase{\\hiddenScalar}}}\n", "\\newcommand{\\hiddenScalar}{h}\n", "\\newcommand{\\hiddenVector}{\\mathbf{ \\hiddenScalar}}\n", "\\newcommand{\\identityMatrix}{\\eye}\n", "\\newcommand{\\inducingInputScalar}{z}\n", "\\newcommand{\\inducingInputVector}{\\mathbf{ \\inducingInputScalar}}\n", "\\newcommand{\\inducingInputMatrix}{\\mathbf{Z}}\n", "\\newcommand{\\inducingScalar}{u}\n", "\\newcommand{\\inducingVector}{\\mathbf{ \\inducingScalar}}\n", "\\newcommand{\\inducingMatrix}{\\mathbf{U}}\n", "\\newcommand{\\inlineDiff}[2]{\\text{d}#1/\\text{d}#2}\n", 
"\\newcommand{\\inputDim}{q}\n", "\\newcommand{\\inputMatrix}{\\mathbf{X}}\n", "\\newcommand{\\inputScalar}{x}\n", "\\newcommand{\\inputSpace}{\\mathcal{X}}\n", "\\newcommand{\\inputVals}{\\inputVector}\n", "\\newcommand{\\inputVector}{\\mathbf{ \\inputScalar}}\n", "\\newcommand{\\iterNum}{k}\n", "\\newcommand{\\kernel}{\\kernelScalar}\n", "\\newcommand{\\kernelMatrix}{\\mathbf{K}}\n", "\\newcommand{\\kernelScalar}{k}\n", "\\newcommand{\\kernelVector}{\\mathbf{ \\kernelScalar}}\n", "\\newcommand{\\kff}{\\kernelScalar_{\\mappingFunction \\mappingFunction}}\n", "\\newcommand{\\kfu}{\\kernelVector_{\\mappingFunction \\inducingScalar}}\n", "\\newcommand{\\kuf}{\\kernelVector_{\\inducingScalar \\mappingFunction}}\n", "\\newcommand{\\kuu}{\\kernelVector_{\\inducingScalar \\inducingScalar}}\n", "\\newcommand{\\lagrangeMultiplier}{\\lambda}\n", "\\newcommand{\\lagrangeMultiplierMatrix}{\\boldsymbol{ \\Lambda}}\n", "\\newcommand{\\lagrangian}{L}\n", "\\newcommand{\\laplacianFactor}{\\mathbf{ \\MakeUppercase{\\laplacianFactorScalar}}}\n", "\\newcommand{\\laplacianFactorScalar}{m}\n", "\\newcommand{\\laplacianFactorVector}{\\mathbf{ \\laplacianFactorScalar}}\n", "\\newcommand{\\laplacianMatrix}{\\mathbf{L}}\n", "\\newcommand{\\laplacianScalar}{\\ell}\n", "\\newcommand{\\laplacianVector}{\\mathbf{ \\ell}}\n", "\\newcommand{\\latentDim}{q}\n", "\\newcommand{\\latentDistanceMatrix}{\\boldsymbol{ \\Delta}}\n", "\\newcommand{\\latentDistanceScalar}{\\delta}\n", "\\newcommand{\\latentDistanceVector}{\\boldsymbol{ \\delta}}\n", "\\newcommand{\\latentForce}{f}\n", "\\newcommand{\\latentFunction}{u}\n", "\\newcommand{\\latentFunctionVector}{\\mathbf{ \\latentFunction}}\n", "\\newcommand{\\latentFunctionMatrix}{\\mathbf{ \\MakeUppercase{\\latentFunction}}}\n", "\\newcommand{\\latentIndex}{j}\n", "\\newcommand{\\latentScalar}{z}\n", "\\newcommand{\\latentVector}{\\mathbf{ \\latentScalar}}\n", "\\newcommand{\\latentMatrix}{\\mathbf{Z}}\n", "\\newcommand{\\learnRate}{\\eta}\n", 
"\\newcommand{\\lengthScale}{\\ell}\n", "\\newcommand{\\rbfWidth}{\\ell}\n", "\\newcommand{\\likelihoodBound}{\\mathcal{L}}\n", "\\newcommand{\\likelihoodFunction}{L}\n", "\\newcommand{\\locationScalar}{\\mu}\n", "\\newcommand{\\locationVector}{\\boldsymbol{ \\locationScalar}}\n", "\\newcommand{\\locationMatrix}{\\mathbf{M}}\n", "\\newcommand{\\variance}[1]{\\text{var}\\left( #1 \\right)}\n", "\\newcommand{\\mappingFunction}{f}\n", "\\newcommand{\\mappingFunctionMatrix}{\\mathbf{F}}\n", "\\newcommand{\\mappingFunctionTwo}{g}\n", "\\newcommand{\\mappingFunctionTwoMatrix}{\\mathbf{G}}\n", "\\newcommand{\\mappingFunctionTwoVector}{\\mathbf{ \\mappingFunctionTwo}}\n", "\\newcommand{\\mappingFunctionVector}{\\mathbf{ \\mappingFunction}}\n", "\\newcommand{\\scaleScalar}{s}\n", "\\newcommand{\\mappingScalar}{w}\n", "\\newcommand{\\mappingVector}{\\mathbf{ \\mappingScalar}}\n", "\\newcommand{\\mappingMatrix}{\\mathbf{W}}\n", "\\newcommand{\\mappingScalarTwo}{v}\n", "\\newcommand{\\mappingVectorTwo}{\\mathbf{ \\mappingScalarTwo}}\n", "\\newcommand{\\mappingMatrixTwo}{\\mathbf{V}}\n", "\\newcommand{\\maxIters}{K}\n", "\\newcommand{\\meanMatrix}{\\mathbf{M}}\n", "\\newcommand{\\meanScalar}{\\mu}\n", "\\newcommand{\\meanTwoMatrix}{\\mathbf{M}}\n", "\\newcommand{\\meanTwoScalar}{m}\n", "\\newcommand{\\meanTwoVector}{\\mathbf{ \\meanTwoScalar}}\n", "\\newcommand{\\meanVector}{\\boldsymbol{ \\meanScalar}}\n", "\\newcommand{\\mrnaConcentration}{m}\n", "\\newcommand{\\naturalFrequency}{\\omega}\n", "\\newcommand{\\neighborhood}[1]{\\mathcal{N}\\left( #1 \\right)}\n", "\\newcommand{\\neilurl}{http://inverseprobability.com/}\n", "\\newcommand{\\noiseMatrix}{\\boldsymbol{ E}}\n", "\\newcommand{\\noiseScalar}{\\epsilon}\n", "\\newcommand{\\noiseVector}{\\boldsymbol{ \\epsilon}}\n", "\\newcommand{\\norm}[1]{\\left\\Vert #1 \\right\\Vert}\n", "\\newcommand{\\normalizedLaplacianMatrix}{\\hat{\\mathbf{L}}}\n", "\\newcommand{\\normalizedLaplacianScalar}{\\hat{\\ell}}\n", 
"\\newcommand{\\normalizedLaplacianVector}{\\hat{\\mathbf{ \\ell}}}\n", "\\newcommand{\\numActive}{m}\n", "\\newcommand{\\numBasisFunc}{m}\n", "\\newcommand{\\numComponents}{m}\n", "\\newcommand{\\numComps}{K}\n", "\\newcommand{\\numData}{n}\n", "\\newcommand{\\numFeatures}{K}\n", "\\newcommand{\\numHidden}{h}\n", "\\newcommand{\\numInducing}{m}\n", "\\newcommand{\\numLayers}{\\ell}\n", "\\newcommand{\\numNeighbors}{K}\n", "\\newcommand{\\numSequences}{s}\n", "\\newcommand{\\numSuccess}{s}\n", "\\newcommand{\\numTasks}{m}\n", "\\newcommand{\\numTime}{T}\n", "\\newcommand{\\numTrials}{S}\n", "\\newcommand{\\outputIndex}{j}\n", "\\newcommand{\\paramVector}{\\boldsymbol{ \\theta}}\n", "\\newcommand{\\parameterMatrix}{\\boldsymbol{ \\Theta}}\n", "\\newcommand{\\parameterScalar}{\\theta}\n", "\\newcommand{\\parameterVector}{\\boldsymbol{ \\parameterScalar}}\n", "\\newcommand{\\partDiff}[2]{\\frac{\\partial#1}{\\partial#2}}\n", "\\newcommand{\\precisionScalar}{j}\n", "\\newcommand{\\precisionVector}{\\mathbf{ \\precisionScalar}}\n", "\\newcommand{\\precisionMatrix}{\\mathbf{J}}\n", "\\newcommand{\\pseudotargetScalar}{\\widetilde{y}}\n", "\\newcommand{\\pseudotargetVector}{\\mathbf{ \\pseudotargetScalar}}\n", "\\newcommand{\\pseudotargetMatrix}{\\mathbf{ \\widetilde{Y}}}\n", "\\newcommand{\\rank}[1]{\\text{rank}\\left(#1\\right)}\n", "\\newcommand{\\rayleighDist}[2]{\\mathcal{R}\\left(#1|#2\\right)}\n", "\\newcommand{\\rayleighSamp}[1]{\\mathcal{R}\\left(#1\\right)}\n", "\\newcommand{\\responsibility}{r}\n", "\\newcommand{\\rotationScalar}{r}\n", "\\newcommand{\\rotationVector}{\\mathbf{ \\rotationScalar}}\n", "\\newcommand{\\rotationMatrix}{\\mathbf{R}}\n", "\\newcommand{\\sampleCovScalar}{s}\n", "\\newcommand{\\sampleCovVector}{\\mathbf{ \\sampleCovScalar}}\n", "\\newcommand{\\sampleCovMatrix}{\\mathbf{s}}\n", "\\newcommand{\\scalarProduct}[2]{\\left\\langle{#1},{#2}\\right\\rangle}\n", "\\newcommand{\\sign}[1]{\\text{sign}\\left(#1\\right)}\n", 
"\\newcommand{\\sigmoid}[1]{\\sigma\\left(#1\\right)}\n", "\\newcommand{\\singularvalue}{\\ell}\n", "\\newcommand{\\singularvalueMatrix}{\\mathbf{L}}\n", "\\newcommand{\\singularvalueVector}{\\mathbf{l}}\n", "\\newcommand{\\sorth}{\\mathbf{u}}\n", "\\newcommand{\\spar}{\\lambda}\n", "\\newcommand{\\trace}[1]{\\text{tr}\\left(#1\\right)}\n", "\\newcommand{\\BasalRate}{B}\n", "\\newcommand{\\DampingCoefficient}{C}\n", "\\newcommand{\\DecayRate}{D}\n", "\\newcommand{\\Displacement}{X}\n", "\\newcommand{\\LatentForce}{F}\n", "\\newcommand{\\Mass}{M}\n", "\\newcommand{\\Sensitivity}{S}\n", "\\newcommand{\\basalRate}{b}\n", "\\newcommand{\\dampingCoefficient}{c}\n", "\\newcommand{\\mass}{m}\n", "\\newcommand{\\sensitivity}{s}\n", "\\newcommand{\\springScalar}{\\kappa}\n", "\\newcommand{\\springVector}{\\boldsymbol{ \\kappa}}\n", "\\newcommand{\\springMatrix}{\\boldsymbol{ \\mathcal{K}}}\n", "\\newcommand{\\tfConcentration}{p}\n", "\\newcommand{\\tfDecayRate}{\\delta}\n", "\\newcommand{\\tfMrnaConcentration}{f}\n", "\\newcommand{\\tfVector}{\\mathbf{ \\tfConcentration}}\n", "\\newcommand{\\velocity}{v}\n", "\\newcommand{\\sufficientStatsScalar}{g}\n", "\\newcommand{\\sufficientStatsVector}{\\mathbf{ \\sufficientStatsScalar}}\n", "\\newcommand{\\sufficientStatsMatrix}{\\mathbf{G}}\n", "\\newcommand{\\switchScalar}{s}\n", "\\newcommand{\\switchVector}{\\mathbf{ \\switchScalar}}\n", "\\newcommand{\\switchMatrix}{\\mathbf{S}}\n", "\\newcommand{\\tr}[1]{\\text{tr}\\left(#1\\right)}\n", "\\newcommand{\\loneNorm}[1]{\\left\\Vert #1 \\right\\Vert_1}\n", "\\newcommand{\\ltwoNorm}[1]{\\left\\Vert #1 \\right\\Vert_2}\n", "\\newcommand{\\onenorm}[1]{\\left\\vert#1\\right\\vert_1}\n", "\\newcommand{\\twonorm}[1]{\\left\\Vert #1 \\right\\Vert}\n", "\\newcommand{\\vScalar}{v}\n", "\\newcommand{\\vVector}{\\mathbf{v}}\n", "\\newcommand{\\vMatrix}{\\mathbf{V}}\n", "\\newcommand{\\varianceDist}[2]{\\text{var}_{#2}\\left( #1 \\right)}\n", "% Already defined by latex\n", 
"%\\newcommand{\\vec}{#1:}\n", "\\newcommand{\\vecb}[1]{\\left(#1\\right):}\n", "\\newcommand{\\weightScalar}{w}\n", "\\newcommand{\\weightVector}{\\mathbf{ \\weightScalar}}\n", "\\newcommand{\\weightMatrix}{\\mathbf{W}}\n", "\\newcommand{\\weightedAdjacencyMatrix}{\\mathbf{A}}\n", "\\newcommand{\\weightedAdjacencyScalar}{a}\n", "\\newcommand{\\weightedAdjacencyVector}{\\mathbf{ \\weightedAdjacencyScalar}}\n", "\\newcommand{\\onesVector}{\\mathbf{1}}\n", "\\newcommand{\\zerosVector}{\\mathbf{0}}\n", "$$\n", "\n", "# Introduction\n", "\n", "## The Centrifugal Governor\n", "\n", "Figure: Centrifugal governor as held by \"Science\" on Holborn\n", "Viaduct.\n", "\n", "## Boulton and Watt's Steam Engine\n", "\n", "Figure: Watt's Steam Engine which made Steam Power Efficient and\n", "Practical.\n", "\n", "James Watt's steam engine contained an early machine learning device. In\n", "the same way that modern systems are component based, his engine was\n", "composed of components. One of these is a speed regulator sometimes\n", "known as *Watt's governor*. The two balls in the center of the image,\n", "when spun fast, rise and, through a linkage mechanism, close the valve\n", "that feeds the engine.\n", "\n", "The centrifugal governor was made famous by Boulton and Watt when it was\n", "deployed in the steam engine. Studying stability in the governor is the\n", "main subject of James Clerk Maxwell's paper on the theoretical analysis\n", "of governors [@Maxwell:governors1867]. This paper is a founding paper of\n", "control theory. In an acknowledgment of its influence, Wiener used the\n", "name [*cybernetics*](https://en.wikipedia.org/wiki/Cybernetics) to\n", "describe the field of control and communication in animals and the\n", "machine [@Wiener:cybernetics48]. 
Cybernetics derives from the Greek word for\n", "helmsman, which via Latin also gives us the word governor.\n", "\n", "A governor is one of the simplest artificial intelligence systems. It\n", "senses the speed of an engine, and acts to change the position of the\n", "valve on the engine to slow it down.\n", "\n", "Although it's a mechanical system, a governor can be seen as automating a\n", "role that a human would have traditionally played. It is an early\n", "example of artificial intelligence.\n", "\n", "The centrifugal governor has several parameters: the weight of the balls\n", "used, the length of the linkages and the limits on the balls' movement.\n", "\n", "Two principal differences exist between the centrifugal governor and\n", "artificial intelligence systems of today.\n", "\n", "1. The centrifugal governor is a physical system and it is an integral\n", " part of a wider physical system that it regulates (the engine).\n", "2. The parameters of the governor were set by hand, whereas our modern\n", " artificial intelligence systems have their parameters set by *data*.\n", "\n", "Figure: The centrifugal governor, an early example of a decision\n", "making system. The parameters of the governor include the lengths of the\n", "linkages (which affect how far the throttle opens in response to\n", "movement in the balls), the weight of the balls (which affects inertia)\n", "and the limits to which the balls can rise.\n", "\n", "This has the basic components of sense and act that we expect in an\n", "intelligent system, and this system saved the need for a human operator\n", "to manually adjust the system in the case of overspeed. Overspeed has\n", "the potential to destroy an engine, so the governor operates as a safety\n", "device.\n", "\n", "The first wave of automation did bring about sabotage as a worker's\n", "response. 
But if machinery was sabotaged, for example if the linkage\n", "between sensor (the spinning balls) and action (the valve closure) was\n", "broken, this would be obvious to the engine operator at start up time.\n", "The machine could be repaired before operation.\n", "\n", "## What is Machine Learning?\n", "\n", "Machine learning allows us to extract knowledge from data to form a\n", "prediction.\n", "\n", "$$\\text{data} + \\text{model} \\xrightarrow{\\text{compute}} \\text{prediction}$$\n", "\n", "A machine learning prediction is made by combining a model with data to\n", "form the prediction. The manner in which this is done gives us the\n", "machine learning *algorithm*.\n", "\n", "Machine learning models are *mathematical models* which make weak\n", "assumptions about data, e.g. smoothness assumptions. By combining these\n", "assumptions with the data we observe, we can interpolate between data\n", "points or, occasionally, extrapolate into the future.\n", "\n", "Machine learning is a technology which strongly overlaps with the\n", "methodology of statistics. From a historical/philosophical viewpoint,\n", "machine learning differs from statistics in that the focus in the\n", "machine learning community has been primarily on accuracy of prediction,\n", "whereas the focus in statistics is typically on the interpretability of\n", "a model and/or validating a hypothesis through data collection.\n", "\n", "The rapid increase in the availability of compute and data has led to\n", "the increased prominence of machine learning. This prominence is\n", "surfacing in two different but overlapping domains: data science and\n", "artificial intelligence.\n", "\n", "## From Model to Decision\n", "\n", "The real challenge, however, is end-to-end decision making. 
That means taking\n", "information from the environment and using it to drive decision making\n", "to achieve goals.\n", "\n", "## Artificial Intelligence and Data Science\n", "\n", "Artificial intelligence has the objective of endowing computers with\n", "human-like intelligent capabilities. Examples include understanding an image\n", "(computer vision), the contents of some speech (speech recognition),\n", "the meaning of a sentence (natural language processing) or the\n", "translation of a sentence (machine translation).\n", "\n", "### Supervised Learning for AI\n", "\n", "The machine learning approach to artificial intelligence is to collect\n", "and annotate a large data set from humans. The problem is characterized\n", "by input data (e.g. a particular image) and a label (e.g. is there a car\n", "in the image yes/no). The machine learning algorithm fits a mathematical\n", "function (I call this the *prediction function*) to map from the input\n", "image to the label. The parameters of the prediction function are set by\n", "minimizing an error between the function’s predictions and the true\n", "data. The mathematical function that encapsulates this error is known\n", "as the *objective function*.\n", "\n", "This approach to machine learning is known as *supervised learning*.\n", "Various approaches to supervised learning use different prediction\n", "functions, objective functions or different optimization algorithms to\n", "fit them.\n", "\n", "For example, *deep learning* makes use of *neural networks* to form the\n", "predictions. A neural network is a particular type of mathematical\n", "function that allows the algorithm designer to introduce invariances\n", "into the function.\n", "\n", "An invariance is an important way of including prior understanding in a\n", "machine learning model. For example, in an image, a car is still a car\n", "regardless of whether it’s in the upper left or lower right corner of\n", "the image. 
This is known as translation invariance. A neural network\n", "encodes translation invariance in *convolutional layers*. Convolutional\n", "neural networks are widely used in image recognition tasks.\n", "\n", "An alternative structure is known as a recurrent neural network (RNN).\n", "RNNs encode temporal structure. They use auto-regressive\n", "connections in their hidden layers, so they can be seen as time series\n", "models which have non-linear auto-regressive basis functions. They are\n", "widely used in speech recognition and machine translation.\n", "\n", "Machine learning has been deployed in speech recognition (e.g. Alexa,\n", "deep neural networks, convolutional neural networks for speech\n", "recognition) and in computer vision (e.g. Amazon Go, convolutional neural\n", "networks for person recognition and pose detection).\n", "\n", "The field of data science is related to AI, but philosophically\n", "different. It arises because we are increasingly creating large amounts\n", "of data through *happenstance* rather than active collection. In the\n", "modern era data is laid down by almost all our activities. The objective\n", "of data science is to extract insights from this data.\n", "\n", "Classically, in the field of statistics, data analysis proceeds by\n", "assuming that the question (or scientific hypothesis) comes before the\n", "data is created. E.g., if I want to determine the effectiveness of a\n", "particular drug, I perform a *design* for my data collection. I use\n", "foundational approaches such as randomization to account for\n", "confounders. This made a lot of sense in an era where data had to be\n", "actively collected. The reduction in cost of data collection and storage\n", "now means that many data sets are available which weren’t collected with\n", "a particular question in mind. This is a challenge because bias in the\n", "way data was acquired can corrupt the insights we derive. 
We can perform\n", "randomized controlled trials (referred to as A/B tests in modern internet\n", "parlance) to verify our conclusions, but\n", "the opportunity is to use data science techniques to better guide our\n", "question selection or even answer a question without the expense of a\n", "full randomized controlled trial.\n", "\n", "## Amazon: Bits and Atoms\n", "\n", "## Machine Learning in Supply Chain\n", "\n", "Figure: Packhorse Bridge under Burbage Edge. This packhorse route\n", "climbs steeply out of Hathersage and heads towards Sheffield. Packhorses\n", "were the main means of transporting goods across the Peak District. The\n", "high cost of transport is one driver of the 'smith' model, where there\n", "is a local skilled person responsible for assembling or creating goods\n", "(e.g. a blacksmith).\n", "\n", "On Sunday mornings in Sheffield, I often used to run across Packhorse\n", "Bridge in Burbage valley. The bridge is part of an ancient network of\n", "trails crossing the Pennines that, before Turnpike roads arrived in the\n", "18th century, was the main way in which goods were moved. Given that the\n", "moors around Sheffield were home to sand quarries, tin mines, lead mines\n", "and the villages in the Derwent valley were known for nail and pin\n", "manufacture, this wasn't simply movement of agricultural goods, but it\n", "was the infrastructure for industrial transport.\n", "\n", "A person who led the horses was known as a Jagger, and leading\n", "out of the village of Hathersage is Jagger's Lane, a trail that headed\n", "underneath Stanage Edge and into Sheffield.\n", "\n", "The movement of goods from regions of supply to areas of demand is\n", "fundamental to our society. The physical infrastructure of supply chain\n", "has evolved a great deal over the last 300 years.\n", "\n", "Figure: Richard Arkwright is regarded as the founder of the modern\n", "factory system. 
Factories exploit distribution networks to centralize\n", "production of goods. Arkwright located his factory in Cromford due to\n", "proximity to Nottingham weavers (his market) and availability of water\n", "power from the tributaries of the Derwent river. When he first arrived\n", "there was almost no transportation network. Over the following 200 years\n", "the Cromford Canal (1790s), a Turnpike (now the A6, 1816-18) and the\n", "High Peak Railway (now closed, 1820s) were all constructed to improve\n", "transportation access as the factory blossomed.\n", "\n", "Richard Arkwright is known as the father of the modern factory system.\n", "In 1771 he set up a mill for spinning cotton yarn in the village of\n", "Cromford, in the Derwent Valley. The Derwent Valley is relatively\n", "inaccessible. Raw cotton arrived in Liverpool from the US and India. It\n", "needed to be transported on packhorse across the bridleways of the\n", "Pennines. But Cromford was a good location due to proximity to\n", "Nottingham, where weavers were consuming the finished thread, and the\n", "availability of water power from small tributaries of the Derwent river\n", "for Arkwright's [water\n", "frames](https://en.wikipedia.org/wiki/Spinning_jenny) which automated\n", "the production of yarn from raw cotton.\n", "\n", "By 1794 the Cromford Canal was opened to bring coal in to Cromford and\n", "give better transport to Nottingham. The construction of the canals was\n", "driven by the need to improve the transport infrastructure, facilitating\n", "the movement of goods across the UK. Canals, roads and railways were\n", "initially constructed because of the economic need for moving goods. 
Their purpose was to improve\n", "supply chain.\n", "\n", "## Containerization\n", "\n", "Figure: The container is one of the major drivers of globalization,\n", "and arguably the largest agent of social change in the last 100 years.\n", "It reduces the cost of transportation, significantly changing the\n", "appropriate topology of distribution networks. The container makes it\n", "possible to ship goods halfway around the world more cheaply than it\n", "costs to process those goods, leading to an extended distribution\n", "topology.\n", "\n", "Containerization has had a dramatic effect on global economics, placing\n", "many people in the developing world at the end of the supply chain.\n", "\n", "\n", "\n", "\n", "\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "Figure: Wild Alaskan Cod, being sold in the Pacific Northwest, that\n", "is a product of China. It is cheaper to ship the deep frozen fish\n", "thousands of kilometers for processing than to process locally.\n", "\n", "For example, you can buy Wild Alaskan Cod fished from Alaska, processed\n", "in China, sold in North America. This is driven by the low cost of\n", "transport for frozen cod and the higher cost of cod processing\n", "in the US relative to China. Similarly, Scottish\n", "prawns are also processed in China for sale in the UK.\n", "\n", "The balance between cost of transport and cost of processing is the main\n", "driver of the topology of the modern supply chain and the associated\n", "effect of globalization. If transport is much cheaper than processing,\n", "then processing will tend to agglomerate in places where processing\n", "costs can be minimized.\n", "\n", "Large scale global economic change has principally been driven by\n", "changes in the technology that drives supply chain.\n", "\n", "Supply chain is a large-scale automated decision making network. Our aim\n", "is to make decisions not only based on our models of customer behavior\n", "(as observed through data), but also by accounting for the structure of\n", "our fulfilment center and delivery network.\n", "\n", "Many of the most important questions in supply chain take the form of\n", "counterfactuals. E.g. “What would happen if we opened a manufacturing\n", "facility in Cambridge?” A counterfactual is a question that implies a\n", "mechanistic understanding of a system. It goes beyond simple smoothness\n", "assumptions or translation invariances. It requires a physical, or\n", "*mechanistic*, understanding of the supply chain network. 
For this\n", "reason, the types of model we deploy in supply chain often involve\n", "simulations or more mechanistic understanding of the network.\n", "\n", "In supply chain, machine learning alone is not enough: we need to bridge\n", "between models that contain real mechanisms and models that are entirely\n", "data driven.\n", "\n", "This is challenging, because as we introduce more mechanism to the\n", "models we use, it becomes harder to develop efficient algorithms to\n", "match those models to data." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from IPython.lib.display import YouTubeVideo\n", "YouTubeVideo('ncwsr1Of6Cw')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Figure: The Supply Chain Optimization Team (SCOT) at Amazon is\n", "responsible for the automated decision making in (probably) the world's\n", "largest AI.\n", "\n", "> Solve Supply Chain, then solve everything else.\n", "\n", "# The Three Ds of Machine Learning Systems Design\n", "\n", "We can characterize the challenges for integrating machine learning\n", "within our systems as the three Ds: Decomposition, Data and Deployment.\n", "\n", "You can also check my blog post on [\"The 3Ds of Machine Learning Systems\n", "Design\"](http://inverseprobability.com/2018/11/05/the-3ds-of-machine-learning-systems-design).\n", "\n", "The first two components, *decomposition* and *data*, are interlinked, but\n", "we will first outline the decomposition challenge. Below we will mainly\n", "focus on *supervised learning* because this is arguably the technology\n", "that is best understood within machine learning.\n", "\n", "## Decomposition\n", "\n", "Machine learning is not magical pixie dust; we cannot simply automate\n", "all decisions through data. 
We are constrained by our data (see below)\n", "and the models we use.[^1] Machine learning models are relatively simple\n", "function mappings that include characteristics such as smoothness. With\n", "some famous exceptions, e.g. speech and image data, inputs are\n", "constrained in the form of vectors and the model consists of a\n", "mathematically well-behaved function. This means that some careful\n", "thought has to be put into the right sub-process to automate with\n", "machine learning. This is the challenge of *decomposition* of the\n", "machine learning system.\n", "\n", "Any repetitive task is a candidate for automation, but many of the\n", "repetitive tasks we perform as humans are more complex than any\n", "individual algorithm can replace. The selection of which task to\n", "automate becomes critical and has downstream effects on our overall\n", "system design.\n", "\n", "### Pigeonholing\n", "\n", "\n", "\n", "Figure: The machine learning systems decomposition process calls for\n", "separating a complex task into decomposable separate entities. A process\n", "we can think of as\n", "pigeonholing.\n", "\n", "Some aspects to take into account are\n", "\n", "1. Can we refine the decision we need to a set of repetitive tasks\n", " where input information and output decision/value are well defined?\n", "2. Can we represent each sub-task we’ve defined with a mathematical\n", " mapping?\n", "\n", "The representation necessary for the second aspect may involve massaging\n", "of the problem: feature selection or adaptation. It may also involve\n", "filtering out exception cases (perhaps through a pre-classification).\n", "\n", "All else being equal, we’d like to keep our models simple and\n", "interpretable. 
If we can convert a complex mapping to a linear mapping\n", "through clever selection of sub-tasks and features this is a big win.\n", "\n", "For example, Facebook have *feature engineers*, individuals whose main\n", "role is to design features they think might be useful for one of their\n", "tasks (e.g. newsfeed ranking, or ad matching). Facebook have a\n", "training/testing pipeline called\n", "[FBLearner](https://www.facebook.com/Engineering/posts/fblearner-flow-is-a-machine-learning-platform-capable-of-easily-reusing-algorith/10154077833317200/).\n", "Facebook have predefined the sub-tasks they are interested in, and they\n", "are tightly connected to their business model.\n", "\n", "It is easier for Facebook to do this because their business model is\n", "heavily focused on user interaction. A challenge for companies that have\n", "a more diversified portfolio of activities driving their business is the\n", "identification of the most appropriate sub-task. A potential solution to\n", "feature and model selection is known as *AutoML* [@Feurer:automl15]. We\n", "can think of it as using Machine Learning to assist Machine Learning.\n", "It’s also called meta-learning: learning about learning. The input to\n", "the ML algorithm is a machine learning task, the output is a proposed\n", "model to solve the task.\n", "\n", "One trap that is easy to fall into is placing too much emphasis on the\n", "type of model we have deployed rather than the appropriateness of the task\n", "decomposition we have chosen.\n", "\n", "**Recommendation**: Conditioned on task decomposition, we should\n", "automate the process of model improvement. Model updates should not be\n", "discussed in management meetings, they should be deployed and updated as\n", "a matter of course. Further details below on model deployment, but model\n", "updating needs to be considered at design time. 
This is the domain of\n", "AutoML.\n", "\n", "\n", "\n", "Figure: The answer to the question which comes first, the chicken or\n", "the egg, is simple: they co-evolve [@Popper:conjectures63]. Similarly,\n", "when we place components together in a complex machine learning system,\n", "they will tend to co-evolve and compensate for one another.\n", "\n", "To form modern decision-making systems, many components are interlinked.\n", "We decompose our complex decision making into individual tasks, but the\n", "performance of each component is dependent on those upstream of it.\n", "\n", "This naturally leads to co-evolution of systems; upstream errors can be\n", "compensated for by downstream corrections.\n", "\n", "To embrace this characteristic, end-to-end training could be considered.\n", "Why produce the best forecast by metrics when we can just produce the\n", "best forecast for our systems? End-to-end training can lead to\n", "improvements in performance, but it would also damage our system's\n", "decomposability and its interpretability, and perhaps its adaptability.\n", "\n", "The less human interpretable our systems are, the harder they are to\n", "adapt to different circumstances or diagnose when there's a challenge.\n", "The trade-off between interpretability and performance is a constant\n", "tension which we should always retain in our minds when performing our\n", "system design.\n", "\n", "## Data\n", "\n", "It is difficult to overstate the importance of data. It is half of the\n", "equation for machine learning but is often utterly neglected. We can\n", "speculate that there are two reasons for this. Firstly, data cleaning is\n", "perceived as tedious. It doesn’t seem to consist of the same\n", "intellectual challenges that are inherent in constructing complex\n", "mathematical models and implementing them in code. 
Secondly, data\n", "cleaning is highly complex, it requires a deep understanding of how\n", "machine learning systems operate and good intuitions about the data\n", "itself, the domain from which data is drawn (e.g. Supply Chain) and what\n", "downstream problems might be caused by poor data quality.\n", "\n", "As a consequence of these two reasons, data cleaning seems difficult to\n", "formulate into a readily teachable set of principles. As a result, it is\n", "heavily neglected in courses on machine learning and data science.\n", "Despite data being half the equation, most University courses spend\n", "little to no time on its challenges.\n", "\n", "## The Data Crisis\n", "\n", "Anecdotally, talking to data modelling scientists, most say they spend\n", "80% of their time acquiring and cleaning data. This is precipitating\n", "what I refer to as the “data crisis”. This is an analogy with software.\n", "The “software crisis” was the phenomenon of inability to deliver\n", "software solutions due to increasing complexity of implementation. 
There\n", "was no single shot solution for the software crisis, it involved better\n", "practice (scrum, test orientated development, sprints, code review),\n", "improved programming paradigms (object orientated, functional) and\n", "better tools (CVS, then SVN, then git).\n", "\n", "However, these challenges aren't new, they are merely taking a different\n", "form. From the computer's perspective software *is* data. The first wave\n", "of the data crisis was known as the *software crisis*.\n", "\n", "### The Software Crisis\n", "\n", "In the late sixties early software programmers made note of the\n", "increasing costs of software development and termed the challenges\n", "associated with it as the \"[Software\n", "Crisis](https://en.wikipedia.org/wiki/Software_crisis)\". Edsger Dijkstra\n", "referred to the crisis in his 1972 Turing Award winner's address.\n", "\n", "> The major cause of the software crisis is that the machines have\n", "> become several orders of magnitude more powerful! To put it quite\n", "> bluntly: as long as there were no machines, programming was no problem\n", "> at all; when we had a few weak computers, programming became a mild\n", "> problem, and now we have gigantic computers, programming has become an\n", "> equally gigantic problem.\n", ">\n", "> Edsger Dijkstra (1930-2002), The Humble Programmer\n", "\n", "> The major cause of the data crisis is that machines have become more\n", "> interconnected than ever before. Data access is therefore cheap, but\n", "> data quality is often poor. What we need is cheap high-quality data.\n", "> That implies that we develop processes for improving and verifying\n", "> data quality that are efficient.\n", ">\n", "> There would seem to be two ways for improving efficiency. Firstly, we\n", "> should not duplicate work. 
Secondly, where possible we should automate\n", "> work.\n", "\n", "What I term \"The Data Crisis\" is the modern equivalent of this problem.\n", "The quantity of modern data, the lack of attention paid to data as\n", "it is initially \"laid down\", and the costs of data cleaning are bringing\n", "about a crisis in data-driven decision making. This crisis is at the\n", "core of the challenge of *technical debt* in machine learning\n", "[@Sculley:debt15].\n", "\n", "Just as with software, the crisis is most correctly addressed by\n", "'scaling' the manner in which we process our data. Duplication of work\n", "occurs because the value of data cleaning is not correctly recognised in\n", "management decision making processes. Automation of work is increasingly\n", "possible through techniques in \"artificial intelligence\", but this will\n", "also require better management of the data science pipeline so that data\n", "about data science (meta-data science) can be correctly assimilated and\n", "processed. The Alan Turing Institute has a program focussed on this\n", "area, [AI for Data\n", "Analytics](https://www.turing.ac.uk/research_projects/artificial-intelligence-data-analytics/).\n", "\n", "Data is the new software, and the data crisis is already upon us. It is\n", "driven by the cost of cleaning data, the paucity of tools for monitoring\n", "and maintaining our deployments, and the provenance of our models (e.g.\n", "with respect to the data they’re trained on).\n", "\n", "Three principal changes need to occur in response. They are cultural and\n", "infrastructural.\n", "\n", "### The Data First Paradigm\n", "\n", "First of all, to excel in data driven decision making we need to move\n", "from a *software first* paradigm to a *data first* paradigm. That means\n", "refocusing on data as the product. Software is the intermediary to\n", "producing the data, and its quality standards must be maintained, but\n", "not at the expense of the data we are producing. 
Data cleaning and\n", "maintenance need to be prized as highly as software debugging and\n", "maintenance. Instead of *software* as a service, we should refocus\n", "around *data* as a service. This first change is a cultural change in\n", "which our teams think about their outputs in terms of data. Instead of\n", "decomposing our systems around the software components, we need to\n", "decompose them around the data generating and consuming components.[^2]\n", "Software first is only an intermediate step on the way to becoming *data\n", "first*. It is a necessary, but not a sufficient condition for efficient\n", "machine learning systems design and deployment. We must move from\n", "*software orientated architecture* to a *data orientated architecture*.\n", "\n", "### Data Quality\n", "\n", "Secondly, we need to improve our language around data quality. We cannot\n", "assess the costs of improving data quality unless we generate a language\n", "around what data quality means. Data Readiness Levels[^3] are an\n", "assessment of data quality that is based on the usage to which data is\n", "put.\n", "\n", "### Data Readiness Levels\n", "\n", "[Data Readiness\n", "Levels](http://inverseprobability.com/2017/01/12/data-readiness-levels)\n", "[@Lawrence:drl17] are an attempt to develop a language around data\n", "quality that can bridge the gap between technical solutions and decision\n", "makers such as managers and project planners. 
They are inspired by\n", "Technology Readiness Levels which attempt to quantify the readiness of\n", "technologies for deployment.\n", "\n", "### Three Grades of Data Readiness\n", "\n", "Data-readiness describes, at its coarsest level, three separate stages\n", "of data graduation.\n", "\n", "- Grade C - accessibility\n", "- Transition: data becomes electronically available\n", "- Grade B - validity\n", "- Transition: pose a question to the data\n", "- Grade A - usability\n", "\n", "The important definitions are at the transitions. The move from Grade C\n", "data to Grade B data is delimited by the *electronic availability* of\n", "the data. The move from Grade B to Grade A data is delimited by posing a\n", "question or task to the data [@Lawrence:drl17].\n", "\n", "**Recommendation**: Build a shared understanding of the language of data\n", "readiness levels for use in planning documents, in costing of data\n", "cleaning, and in assessing the benefits of reusing cleaned data.\n", "\n", "### Move Beyond Software Engineering to Data Engineering\n", "\n", "Thirdly, we need to improve our mental model of the separation of data\n", "science from applied science. A common trap in our thinking around data\n", "is to see data science (and data engineering, data preparation) as a\n", "sub-set of the software engineer’s or applied scientist’s skill set. As\n", "a result, we recruit and deploy the wrong type of resource. Data\n", "preparation and question formulation are superficially similar to both\n", "because of the need for programming skills, but the day to day problems\n", "faced are very different.\n", "\n", "## Combining Data and Systems Design\n", "\n", "## Data Science as Debugging\n", "\n", "One challenge for existing information technology professionals is\n", "realizing the extent to which a software ecosystem based on data differs\n", "from a classical ecosystem. 
In particular, by ingesting data we bring\n", "unknowns/uncontrollables into our decision-making system. This presents\n", "opportunity for adversarial exploitation and unforeseen operation.\n", "\n", "You can also check my blog post on [\"Data Science as\n", "Debugging\"](http://inverseprobability.com/2017/03/14/data-science-as-debugging).\n", "\n", "Starting with the analysis of a data set, the nature of data science is\n", "somewhat different from classical software engineering.\n", "\n", "One analogy I find helpful for understanding the depth of change we need\n", "is the following. Imagine that, as a software engineer, you find a USB\n", "stick on the ground. And for some reason you *know* that on that USB stick\n", "is a particular API call that will enable you to make a significant\n", "positive difference to a business problem. You don't know which of the\n", "many library functions on the USB stick are the ones that will help. And\n", "it could be that some of those library functions will hinder, perhaps\n", "because they are just inappropriate or perhaps because they have been\n", "placed there maliciously. The most secure thing to do would be to *not*\n", "introduce this code into your production system at all. But what if your\n", "manager told you to do so? How would you go about incorporating this\n", "code base?\n", "\n", "The answer is *very* carefully. You would have to engage in a process\n", "more akin to debugging than regular software engineering. As you\n", "understood the code base, for your work to be reproducible, you should\n", "be documenting it, not just what you discovered, but how you discovered\n", "it. In the end, you typically find a single API call that is the one\n", "that most benefits your system. But more thought has been placed into\n", "this line of code than any line of code you have written before.\n", "\n", "An enormous amount of debugging would be required. 
As the nature of the\n", "code base is understood, software tests to verify it also need to be\n", "constructed. At the end of all your work, the lines of software you\n", "write to actually interact with the software on the USB stick are likely\n", "to be minimal. But more thought would be put into those lines than\n", "perhaps any other lines of code in the system.\n", "\n", "Even then, when your API code is introduced into your production system,\n", "it needs to be deployed in an environment that monitors it. We cannot\n", "rely on an individual’s decision making to ensure the quality of all our\n", "systems. We need to create an environment that includes quality\n", "controls, checks and bounds, and tests, all designed to ensure that\n", "assumptions made about this foreign code base remain valid.\n", "\n", "This situation is akin to what we are doing when we incorporate data in\n", "our production systems. When we are consuming data from others, we\n", "cannot assume that it has been produced in alignment with our goals for\n", "our own systems. Worst case, it may have been adversarially produced. A\n", "further challenge is that data is dynamic. So, in effect, the code on\n", "the USB stick is evolving over time.\n", "\n", "It might seem that this process is easy to formalize now: we simply need\n", "to check what the formal software engineering process is for debugging,\n", "because that is the current software engineering activity that data\n", "science is closest to. But when we look for a formalization of\n", "debugging, we find that there is none. Indeed, modern software\n", "engineering mainly focusses on ensuring that code is written without\n", "bugs in the first place.\n", "\n", "**Recommendation**: Anecdotally, resolving a machine learning challenge\n", "requires 80% of the resource to be focused on the data and perhaps 20%\n", "to be focused on the model. 
But many companies are too keen to employ\n", "machine learning engineers who focus on the models, not the data. We\n", "should change our hiring priorities and training. Universities cannot\n", "provide the understanding of how to data-wrangle. Companies must fill\n", "this gap.\n", "\n", "\n", "\n", "Figure: A reservoir of data has more value if the data is consumable.\n", "The data crisis can only be addressed if we focus on outputs rather than\n", "inputs.\n", "\n", "\n", "\n", "Figure: For a data first architecture we need to clean our data at\n", "source, rather than individually cleaning data for each task. This\n", "involves a shift of focus from our inputs to our outputs. We should\n", "provide data streams that are consumable by many teams without\n", "purification.\n", "\n", "**Recommendation**: We need to share best practice around data\n", "deployment across our teams. We should make best use of our processes\n", "where applicable, but we need to develop them to become *data first*\n", "organizations. Data needs to be cleaned at *output* not at *input*.\n", "\n", "## Deployment\n", "\n", "Much of the academic machine learning systems point of view is based on\n", "a software systems point of view that is around 20 years out of date. In\n", "particular we build machine learning models on fixed training data sets,\n", "and we test them on stationary test data sets.\n", "\n", "In practice modern software systems involve continuous deployment of\n", "models into an ever-evolving world of data. These changes are indicated\n", "in the software world by greater availability of *streaming*\n", "technologies.\n", "\n", "### Continuous Deployment\n", "\n", "Once the decomposition is understood, the data is sourced and the models\n", "are created, the model code needs to be deployed.\n", "\n", "To extend the USB stick analogy further, how would we deploy that code\n", "if we thought it was likely to evolve in production? 
This is what data\n", "does. We cannot assume that the conditions under which we trained our\n", "model will be retained as we move forward, indeed the only constant we\n", "have is change.\n", "\n", "This means that when any data dependent model is deployed into\n", "production, it requires *continuous monitoring* to ensure the\n", "assumptions of design have not been invalidated. Software changes are\n", "qualified through testing, in particular a regression test ensures that\n", "existing functionality is not broken by change. Since data is\n", "continually evolving, machine learning systems require 'continual\n", "regression testing': oversight by systems that ensure their existing\n", "functionality has not been broken as the world evolves around them, an\n", "approach we refer to as *progression testing*. Unfortunately, standards\n", "around ML model deployment have not yet been developed. The modern world\n", "of continuous deployment does rely on testing, but it does not recognize\n", "the continuous evolution of the world around us.\n", "\n", "Progression tests are likely to be *statistical* tests in contrast to\n", "classical software tests. The tests should be monitoring model\n", "performance and quality measures. They could also monitor conformance to\n", "standardized *fairness* measures.\n", "\n", "If the world has changed around our decision-making ecosystem, how are\n", "we alerted to those changes?\n", "\n", "**Recommendation**: We should establish best practice around model\n", "deployment. We need to shift our culture from standing up a software\n", "service, to standing up a *data as a service*. Data as a Service would\n", "involve continual monitoring of our deployed models in production. 
This would be\n", "regulated by 'hypervisor' systems[^4] that understand the context in\n", "which models are deployed and recognize when circumstances have changed,\n", "and models need retraining or restructuring.\n", "\n", "**Recommendation**: We should consider a major re-architecting of\n", "systems around our services. In particular we should scope the use of a\n", "*streaming architecture* (such as Apache Kafka) that ensures data\n", "persistence and enables asynchronous operation of our systems.[^5] This\n", "would enable the provision of QC streams and real-time dashboards as\n", "well as hypervisors.\n", "\n", "Importantly a streaming architecture implies the services we build are\n", "*stateless*: internal state is deployed on streams alongside external\n", "state. This allows for rapid assessment of other services' data.\n", "\n", "## Data Oriented Architectures\n", "\n", "In a streaming architecture we shift from management of services, to\n", "management of data streams. Instead of worrying about availability of\n", "the services we shift to worrying about the quality of the data those\n", "services are producing.\n", "\n", "## Streaming System\n", "\n", "Characteristics of a streaming system include a move from *pull* updates\n", "to *push* updates, i.e. the computation is driven by a change in the\n", "input data rather than the service calling for input data when it\n", "decides to run a computation. Streaming systems operate on 'rows' of the\n", "data rather than 'columns'. This is because the full column isn't\n", "normally available as it changes over time. As an important design\n", "principle, the services themselves are stateless, they take their state\n", "from the streaming ecosystem. This ensures the inputs and outputs of\n", "given computations are easy to declare. 
As a result, persistence of the\n", "data is also handled by the streaming ecosystem and decisions around\n", "data retention or recomputation can be taken at the systems level rather\n", "than the component level.\n", "\n", "## Apache Flink\n", "\n", "[Apache Flink](https://en.wikipedia.org/wiki/Apache_Flink) is a stream\n", "processing framework. Flink is a foundation for event driven processing.\n", "This gives a high throughput and low latency framework that operates on\n", "dataflows.\n", "\n", "Data storage is handled by other systems such as Apache Kafka or AWS\n", "Kinesis." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "stream.join(otherStream)\n", " .where()\n", " .equalTo()\n", " .window()\n", " .apply()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Apache Flink allows operations on streams, for example the join\n", "operation above. In a traditional database management system, this join\n", "operation may be written in SQL and called on demand. In a streaming\n", "ecosystem, computations occur as and when the streams update.\n", "\n", "The join is handled by the ecosystem surrounding the business logic.\n", "\n", "## Trading System\n", "\n", "As a simple example we'll consider a high frequency trading system. Anne\n", "wishes to build an automated share trading system. She has access to a\n", "high frequency trading system which provides prices and allows trades at\n", "millisecond intervals.\n", "\n", "Let's assume that price trading data is available as a data stream. But\n", "the price now is not the only information that Anne needs, she needs an\n", "estimate of the price in the future." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import os" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Generate an artificial trading stream: a random walk starting near 400\n", "days = pd.date_range(start='2017-05-21', end='2020-05-21')\n", "z = np.random.randn(len(days))  # one standard normal increment per day\n", "x = z.cumsum() + 400" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "prices = pd.Series(x, index=days)\n", "# Prices from this date onward lie in the future from Anne's perspective\n", "hypothetical = prices.loc['2019-05-21':]\n", "real = prices.copy()\n", "# Mask the prices Anne cannot observe (np.NaN was removed in NumPy 2.0)\n", "real.loc['2019-05-21':] = np.nan" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "Figure: Anne has access to the share prices in the black stream but\n", "not in the blue stream. A hypothetical stream is the stream of future\n", "prices. Anne can define this hypothetical under constraints (latency,\n", "input etc). The need for a model is now exposed in the software\n", "infrastructure.\n", "\n", "## Hypothetical Streams\n", "\n", "We'll call the future price a hypothetical stream.\n", "\n", "A hypothetical stream is a desired stream of information which cannot be\n", "directly accessed. The lack of direct access may be because the events\n", "happen in the future, or there may be some latency between the event and\n", "the availability of the data.\n", "\n", "Any hypothetical stream will only be provided as a prediction, ideally\n", "with an error bar.\n", "\n", "The nature of the hypothetical Anne needs is dependent on her\n", "decision-making process. In Anne's case it will depend on the period over\n", "which she is expecting her returns. 
In MDOP Anne specifies a hypothetical that\n", "is derived from the pricing stream.\n", "\n", "It is not the price stream directly, but Anne looks for *future*\n", "predictions from the price stream, perhaps for price in $T$ days' time.\n", "\n", "At this stage, this stream is merely typed as a hypothetical.\n", "\n", "There are constraints on the hypothetical. They include the *input*\n", "information, the upper limit of latency between input and prediction,\n", "and the decision Anne needs to make (how far ahead, what her upside and\n", "downside risks are). These three constraints mean that we can only\n", "recover an approximation to the hypothetical.\n", "\n", "## Hypothetical Advantage\n", "\n", "What is the advantage of defining things in this way? By defining,\n", "clearly, the two streams as real and hypothetical variants of each\n", "other, we now enable automation of the deployment and any redeployment\n", "process. The hypothetical can be *instantiated* against the real, and\n", "design criteria can be constantly evaluated, triggering retraining when\n", "necessary.\n", "\n", "## Ride Sharing System\n", "\n", "As a second example, we'll consider a ride sharing app.\n", "\n", "Anne is on her way home now; she wishes to hail a car using a ride\n", "sharing app.\n", "\n", "The app is designed in the following way. On opening her app Anne is\n", "notified about drivers in the nearby neighborhood. She is given an\n", "estimate of the time a ride may take to come.\n", "\n", "Given this information about driver availability, Anne may feel\n", "encouraged to enter a destination. Given this destination, a price\n", "estimate can be given. 
This price is conditioned on other riders that\n", "may wish to go in the same direction, but the price estimate needs to be\n", "made before the user agrees to the ride.\n", "\n", "Business customer service constraints dictate that this price may not\n", "change after Anne's order is confirmed.\n", "\n", "In this simple system, several decisions are being made, each of them on\n", "the basis of a hypothetical.\n", "\n", "When Anne calls for a ride, she is provided with an estimate based on\n", "the expected time a ride can be with her. But this estimate is made\n", "without knowing where Anne wants to go. Constraints on drivers, imposed\n", "by regional boundaries, reaching the end of their shift, or\n", "their current passengers, mean that this estimate can only be a best\n", "guess.\n", "\n", "This best guess may well be driven by previous data.\n", "\n", "\n", "\n", "Figure: Service oriented architecture. The data access is buried in\n", "the cost allocation service. Data dependencies of the service cannot be\n", "found without trawling through the underlying code base.\n", "\n", "\n", "\n", "Figure: Data oriented architecture. Now the joins and the updates are\n", "exposed within the streaming ecosystem. We can programmatically determine\n", "the factor graph which gives the thread through the model.\n", "\n", "\n", "\n", "Figure: Data-oriented programming. There is a requirement for an\n", "estimate of the driver allocation to give a rough cost estimate before\n", "the user has confirmed the ride. In data-oriented programming, this is\n", "achieved through declaring a hypothetical stream which approximates the\n", "true driver allocation, but with restricted input information and\n", "constraints on the computational latency.\n", "\n", "For the ride sharing system, we start to see a common issue with a more\n", "complex algorithmic decision-making system. Several decisions are being\n", "made multiple times. 
Let's look at the decisions we need along with\n", "some design criteria.\n", "\n", "1. Car Availability: Estimate time to arrival for Anne's ride using\n", " Anne's location and local available car locations. Latency 50\n", " milliseconds.\n", "2. Cost Estimate: Estimate cost for journey using Anne's destination,\n", " location and local available car current destinations and\n", " availability. Latency 50 milliseconds.\n", "3. Driver Allocation: Allocate car to minimize transport cost to\n", " destination. Latency 2 seconds.\n", "\n", "So we need:\n", "\n", "1. a hypothetical to estimate availability. It is constrained by\n", " lacking destination information and a low latency requirement.\n", "2. a hypothetical to estimate cost. It is constrained by low latency\n", " requirement and\n", "\n", "Simultaneously, drivers in this data ecosystem have an app which\n", "notifies them about new jobs and recommends where they should go.\n", "\n", "A further advantage is that strategies for data retention (when to\n", "snapshot) can be set globally.\n", "\n", "A few decisions need to be made in this system. 
First of all, when the\n", "user opens the app, the estimate of the time to the nearest ride may\n", "need to be computed quickly, to avoid latency in the service.\n", "\n", "This may require a quick estimate of the ride availability.\n", "\n", "## Information Dynamics\n", "\n", "With all the second guessing within a complex automated decision-making\n", "system, there are potential problems with information dynamics, the\n", "'closed loop' problem, where the sub-systems are being approximated\n", "(second guessing) and predictions downstream are being affected.\n", "\n", "This leads to the need for a closed loop analysis; for example, see the\n", "[\"Closed Loop Data\n", "Science\"](https://www.gla.ac.uk/schools/computing/research/researchsections/ida-section/closedloop/)\n", "project led by Rod Murray-Smith at Glasgow.\n", "\n", "Our aim is to release our first version of a data-oriented programming\n", "environment by end of June 2019 (pending internal approval).\n", "\n", "## Conclusion\n", "\n", "We operate in a technologically evolving environment. Machine learning\n", "is becoming a key component in our decision-making capabilities, our\n", "intelligence and strategic command. However, technology drove changes in\n", "battlefield strategy, from the stalemate of the first world war to the\n", "tank-dominated Blitzkrieg of the second, to the asymmetric warfare of\n", "the present. Our technology, tactics and strategies are also constantly\n", "evolving. 
Machine learning is part of that evolution, but the\n", "main challenge is not to become so fixated on the tactics of today that\n", "we miss the evolution of strategy that the technology is suggesting.\n", "\n", "Data oriented programming offers a set of development methodologies\n", "which ensure that the system designer considers what decisions are\n", "required, how they will be made, and, critically, declares this within\n", "the system architecture.\n", "\n", "This allows for monitoring of *data quality*, *fairness*, *model\n", "accuracy* and opens the door to a more sophisticated form of auto ML\n", "where full redeployments of models are considered while analyzing the\n", "information dynamics of a complex automated decision-making system.\n", "\n", "# References {#references .unnumbered}\n", "\n", "[^1]: We can also become constrained by our tribal thinking, just as\n", " each of the other groups can.\n", "\n", "[^2]: This is related to challenges of machine learning and technical\n", " debt [@Sculley:debt15], although we are trying to frame the solution\n", " here rather than the problem.\n", "\n", "[^3]: [Data Readiness\n", " Levels](http://inverseprobability.com/2017/01/12/data-readiness-levels)\n", " [@Lawrence:drl17] are an attempt to develop a language around data\n", " quality that can bridge the gap between technical solutions and\n", " decision makers such as managers and project planners. They are\n", " inspired by Technology Readiness Levels which attempt to quantify\n", " the readiness of technologies for deployment.\n", "\n", "[^4]: Emulation, or surrogate modelling, is one very promising approach\n", " to forming such a hypervisor. Emulators are models we fit to other\n", " models, often simulations, but they could also be other machine\n", " learning models. These models operate at the meta-level, not on the\n", " systems directly. This means they can be used to model how the\n", " sub-systems interact. 
As well as emulators we should consider\n", " real-time dashboards, anomaly detection, multivariate analysis, data\n", " visualization and classical statistical approaches for hypervision\n", " of our deployed systems.\n", "\n", "[^5]: These approaches are one area of focus for my own team's research.\n", " A data first architecture is a prerequisite for efficient deployment\n", " of machine learning systems." ] } ], "metadata": {}, "nbformat": 4, "nbformat_minor": 2 }