{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Machine Learning Systems Design\n", "### [Neil D. Lawrence](http://inverseprobability.com), Amazon Cambridge and University of Sheffield\n", "### 2019-06-06\n", "\n", "**Abstract**: Machine learning solutions, in particular those based on deep learning\n", "methods, form an underpinning of the current revolution in “artificial\n", "intelligence” that has dominated popular press headlines and is having a\n", "significant influence on the wider tech agenda. In this talk I will give\n", "an overview of where we are now with machine learning solutions, and\n", "what challenges we face both in the near and far future. These include\n", "practical application of existing algorithms in the face of the need to\n", "explain decision-making, mechanisms for improving the quality and\n", "availability of data, and dealing with large unstructured datasets.\n", "\n", "$$\n", "\n", "\\newcommand{\\tk}[1]{\\textbf{TK}: #1}\n", "\\newcommand{\\Amatrix}{\\mathbf{A}}\n", "\\newcommand{\\KL}[2]{\\text{KL}\\left( #1\\,\\|\\,#2 \\right)}\n", "\\newcommand{\\Kaast}{\\kernelMatrix_{\\mathbf{ \\ast}\\mathbf{ \\ast}}}\n", "\\newcommand{\\Kastu}{\\kernelMatrix_{\\mathbf{ \\ast} \\inducingVector}}\n", "\\newcommand{\\Kff}{\\kernelMatrix_{\\mappingFunctionVector \\mappingFunctionVector}}\n", "\\newcommand{\\Kfu}{\\kernelMatrix_{\\mappingFunctionVector \\inducingVector}}\n", "\\newcommand{\\Kuast}{\\kernelMatrix_{\\inducingVector \\bf\\ast}}\n", "\\newcommand{\\Kuf}{\\kernelMatrix_{\\inducingVector \\mappingFunctionVector}}\n", "\\newcommand{\\Kuu}{\\kernelMatrix_{\\inducingVector \\inducingVector}}\n", "\\newcommand{\\Kuui}{\\Kuu^{-1}}\n", "\\newcommand{\\Qaast}{\\mathbf{Q}_{\\bf \\ast \\ast}}\n", "\\newcommand{\\Qastf}{\\mathbf{Q}_{\\ast \\mappingFunction}}\n", "\\newcommand{\\Qfast}{\\mathbf{Q}_{\\mappingFunctionVector \\bf \\ast}}\n", "\\newcommand{\\Qff}{\\mathbf{Q}_{\\mappingFunctionVector \\mappingFunctionVector}}\n", 
"\\newcommand{\\aMatrix}{\\mathbf{A}}\n", "\\newcommand{\\aScalar}{a}\n", "\\newcommand{\\aVector}{\\mathbf{a}}\n", "\\newcommand{\\acceleration}{a}\n", "\\newcommand{\\bMatrix}{\\mathbf{B}}\n", "\\newcommand{\\bScalar}{b}\n", "\\newcommand{\\bVector}{\\mathbf{b}}\n", "\\newcommand{\\basisFunc}{\\phi}\n", "\\newcommand{\\basisFuncVector}{\\boldsymbol{ \\basisFunc}}\n", "\\newcommand{\\basisFunction}{\\phi}\n", "\\newcommand{\\basisLocation}{\\mu}\n", "\\newcommand{\\basisMatrix}{\\boldsymbol{ \\Phi}}\n", "\\newcommand{\\basisScalar}{\\basisFunction}\n", "\\newcommand{\\basisVector}{\\boldsymbol{ \\basisFunction}}\n", "\\newcommand{\\activationFunction}{\\phi}\n", "\\newcommand{\\activationMatrix}{\\boldsymbol{ \\Phi}}\n", "\\newcommand{\\activationScalar}{\\basisFunction}\n", "\\newcommand{\\activationVector}{\\boldsymbol{ \\basisFunction}}\n", "\\newcommand{\\bigO}{\\mathcal{O}}\n", "\\newcommand{\\binomProb}{\\pi}\n", "\\newcommand{\\cMatrix}{\\mathbf{C}}\n", "\\newcommand{\\cbasisMatrix}{\\hat{\\boldsymbol{ \\Phi}}}\n", "\\newcommand{\\cdataMatrix}{\\hat{\\dataMatrix}}\n", "\\newcommand{\\cdataScalar}{\\hat{\\dataScalar}}\n", "\\newcommand{\\cdataVector}{\\hat{\\dataVector}}\n", "\\newcommand{\\centeredKernelMatrix}{\\mathbf{ \\MakeUppercase{\\centeredKernelScalar}}}\n", "\\newcommand{\\centeredKernelScalar}{b}\n", "\\newcommand{\\centeredKernelVector}{\\centeredKernelScalar}\n", "\\newcommand{\\centeringMatrix}{\\mathbf{H}}\n", "\\newcommand{\\chiSquaredDist}[2]{\\chi_{#1}^{2}\\left(#2\\right)}\n", "\\newcommand{\\chiSquaredSamp}[1]{\\chi_{#1}^{2}}\n", "\\newcommand{\\conditionalCovariance}{\\boldsymbol{ \\Sigma}}\n", "\\newcommand{\\coregionalizationMatrix}{\\mathbf{B}}\n", "\\newcommand{\\coregionalizationScalar}{b}\n", "\\newcommand{\\coregionalizationVector}{\\mathbf{ \\coregionalizationScalar}}\n", "\\newcommand{\\covDist}[2]{\\text{cov}_{#2}\\left(#1\\right)}\n", "\\newcommand{\\covSamp}[1]{\\text{cov}\\left(#1\\right)}\n", 
"\\newcommand{\\covarianceScalar}{c}\n", "\\newcommand{\\covarianceVector}{\\mathbf{ \\covarianceScalar}}\n", "\\newcommand{\\covarianceMatrix}{\\mathbf{C}}\n", "\\newcommand{\\covarianceMatrixTwo}{\\boldsymbol{ \\Sigma}}\n", "\\newcommand{\\croupierScalar}{s}\n", "\\newcommand{\\croupierVector}{\\mathbf{ \\croupierScalar}}\n", "\\newcommand{\\croupierMatrix}{\\mathbf{ \\MakeUppercase{\\croupierScalar}}}\n", "\\newcommand{\\dataDim}{p}\n", "\\newcommand{\\dataIndex}{i}\n", "\\newcommand{\\dataIndexTwo}{j}\n", "\\newcommand{\\dataMatrix}{\\mathbf{Y}}\n", "\\newcommand{\\dataScalar}{y}\n", "\\newcommand{\\dataSet}{\\mathcal{D}}\n", "\\newcommand{\\dataStd}{\\sigma}\n", "\\newcommand{\\dataVector}{\\mathbf{ \\dataScalar}}\n", "\\newcommand{\\decayRate}{d}\n", "\\newcommand{\\degreeMatrix}{\\mathbf{ \\MakeUppercase{\\degreeScalar}}}\n", "\\newcommand{\\degreeScalar}{d}\n", "\\newcommand{\\degreeVector}{\\mathbf{ \\degreeScalar}}\n", "% Already defined by latex\n", "%\\newcommand{\\det}[1]{\\left|#1\\right|}\n", "\\newcommand{\\diag}[1]{\\text{diag}\\left(#1\\right)}\n", "\\newcommand{\\diagonalMatrix}{\\mathbf{D}}\n", "\\newcommand{\\diff}[2]{\\frac{\\text{d}#1}{\\text{d}#2}}\n", "\\newcommand{\\diffTwo}[2]{\\frac{\\text{d}^2#1}{\\text{d}#2^2}}\n", "\\newcommand{\\displacement}{x}\n", "\\newcommand{\\displacementVector}{\\textbf{\\displacement}}\n", "\\newcommand{\\distanceMatrix}{\\mathbf{ \\MakeUppercase{\\distanceScalar}}}\n", "\\newcommand{\\distanceScalar}{d}\n", "\\newcommand{\\distanceVector}{\\mathbf{ \\distanceScalar}}\n", "\\newcommand{\\eigenvaltwo}{\\ell}\n", "\\newcommand{\\eigenvaltwoMatrix}{\\mathbf{L}}\n", "\\newcommand{\\eigenvaltwoVector}{\\mathbf{l}}\n", "\\newcommand{\\eigenvalue}{\\lambda}\n", "\\newcommand{\\eigenvalueMatrix}{\\boldsymbol{ \\Lambda}}\n", "\\newcommand{\\eigenvalueVector}{\\boldsymbol{ \\lambda}}\n", "\\newcommand{\\eigenvector}{\\mathbf{ \\eigenvectorScalar}}\n", "\\newcommand{\\eigenvectorMatrix}{\\mathbf{U}}\n", 
"\\newcommand{\\eigenvectorScalar}{u}\n", "\\newcommand{\\eigenvectwo}{\\mathbf{v}}\n", "\\newcommand{\\eigenvectwoMatrix}{\\mathbf{V}}\n", "\\newcommand{\\eigenvectwoScalar}{v}\n", "\\newcommand{\\entropy}[1]{\\mathcal{H}\\left(#1\\right)}\n", "\\newcommand{\\errorFunction}{E}\n", "\\newcommand{\\expDist}[2]{\\left<#1\\right>_{#2}}\n", "\\newcommand{\\expSamp}[1]{\\left<#1\\right>}\n", "\\newcommand{\\expectation}[1]{\\left\\langle #1 \\right\\rangle }\n", "\\newcommand{\\expectationDist}[2]{\\left\\langle #1 \\right\\rangle _{#2}}\n", "\\newcommand{\\expectedDistanceMatrix}{\\mathcal{D}}\n", "\\newcommand{\\eye}{\\mathbf{I}}\n", "\\newcommand{\\fantasyDim}{r}\n", "\\newcommand{\\fantasyMatrix}{\\mathbf{ \\MakeUppercase{\\fantasyScalar}}}\n", "\\newcommand{\\fantasyScalar}{z}\n", "\\newcommand{\\fantasyVector}{\\mathbf{ \\fantasyScalar}}\n", "\\newcommand{\\featureStd}{\\varsigma}\n", "\\newcommand{\\gammaCdf}[3]{\\mathcal{GAMMA CDF}\\left(#1|#2,#3\\right)}\n", "\\newcommand{\\gammaDist}[3]{\\mathcal{G}\\left(#1|#2,#3\\right)}\n", "\\newcommand{\\gammaSamp}[2]{\\mathcal{G}\\left(#1,#2\\right)}\n", "\\newcommand{\\gaussianDist}[3]{\\mathcal{N}\\left(#1|#2,#3\\right)}\n", "\\newcommand{\\gaussianSamp}[2]{\\mathcal{N}\\left(#1,#2\\right)}\n", "\\newcommand{\\given}{|}\n", "\\newcommand{\\half}{\\frac{1}{2}}\n", "\\newcommand{\\heaviside}{H}\n", "\\newcommand{\\hiddenMatrix}{\\mathbf{ \\MakeUppercase{\\hiddenScalar}}}\n", "\\newcommand{\\hiddenScalar}{h}\n", "\\newcommand{\\hiddenVector}{\\mathbf{ \\hiddenScalar}}\n", "\\newcommand{\\identityMatrix}{\\eye}\n", "\\newcommand{\\inducingInputScalar}{z}\n", "\\newcommand{\\inducingInputVector}{\\mathbf{ \\inducingInputScalar}}\n", "\\newcommand{\\inducingInputMatrix}{\\mathbf{Z}}\n", "\\newcommand{\\inducingScalar}{u}\n", "\\newcommand{\\inducingVector}{\\mathbf{ \\inducingScalar}}\n", "\\newcommand{\\inducingMatrix}{\\mathbf{U}}\n", "\\newcommand{\\inlineDiff}[2]{\\text{d}#1/\\text{d}#2}\n", 
"\\newcommand{\\inputDim}{q}\n", "\\newcommand{\\inputMatrix}{\\mathbf{X}}\n", "\\newcommand{\\inputScalar}{x}\n", "\\newcommand{\\inputSpace}{\\mathcal{X}}\n", "\\newcommand{\\inputVals}{\\inputVector}\n", "\\newcommand{\\inputVector}{\\mathbf{ \\inputScalar}}\n", "\\newcommand{\\iterNum}{k}\n", "\\newcommand{\\kernel}{\\kernelScalar}\n", "\\newcommand{\\kernelMatrix}{\\mathbf{K}}\n", "\\newcommand{\\kernelScalar}{k}\n", "\\newcommand{\\kernelVector}{\\mathbf{ \\kernelScalar}}\n", "\\newcommand{\\kff}{\\kernelScalar_{\\mappingFunction \\mappingFunction}}\n", "\\newcommand{\\kfu}{\\kernelVector_{\\mappingFunction \\inducingScalar}}\n", "\\newcommand{\\kuf}{\\kernelVector_{\\inducingScalar \\mappingFunction}}\n", "\\newcommand{\\kuu}{\\kernelVector_{\\inducingScalar \\inducingScalar}}\n", "\\newcommand{\\lagrangeMultiplier}{\\lambda}\n", "\\newcommand{\\lagrangeMultiplierMatrix}{\\boldsymbol{ \\Lambda}}\n", "\\newcommand{\\lagrangian}{L}\n", "\\newcommand{\\laplacianFactor}{\\mathbf{ \\MakeUppercase{\\laplacianFactorScalar}}}\n", "\\newcommand{\\laplacianFactorScalar}{m}\n", "\\newcommand{\\laplacianFactorVector}{\\mathbf{ \\laplacianFactorScalar}}\n", "\\newcommand{\\laplacianMatrix}{\\mathbf{L}}\n", "\\newcommand{\\laplacianScalar}{\\ell}\n", "\\newcommand{\\laplacianVector}{\\mathbf{ \\ell}}\n", "\\newcommand{\\latentDim}{q}\n", "\\newcommand{\\latentDistanceMatrix}{\\boldsymbol{ \\Delta}}\n", "\\newcommand{\\latentDistanceScalar}{\\delta}\n", "\\newcommand{\\latentDistanceVector}{\\boldsymbol{ \\delta}}\n", "\\newcommand{\\latentForce}{f}\n", "\\newcommand{\\latentFunction}{u}\n", "\\newcommand{\\latentFunctionVector}{\\mathbf{ \\latentFunction}}\n", "\\newcommand{\\latentFunctionMatrix}{\\mathbf{ \\MakeUppercase{\\latentFunction}}}\n", "\\newcommand{\\latentIndex}{j}\n", "\\newcommand{\\latentScalar}{z}\n", "\\newcommand{\\latentVector}{\\mathbf{ \\latentScalar}}\n", "\\newcommand{\\latentMatrix}{\\mathbf{Z}}\n", "\\newcommand{\\learnRate}{\\eta}\n", 
"\\newcommand{\\lengthScale}{\\ell}\n", "\\newcommand{\\rbfWidth}{\\ell}\n", "\\newcommand{\\likelihoodBound}{\\mathcal{L}}\n", "\\newcommand{\\likelihoodFunction}{L}\n", "\\newcommand{\\locationScalar}{\\mu}\n", "\\newcommand{\\locationVector}{\\boldsymbol{ \\locationScalar}}\n", "\\newcommand{\\locationMatrix}{\\mathbf{M}}\n", "\\newcommand{\\variance}[1]{\\text{var}\\left( #1 \\right)}\n", "\\newcommand{\\mappingFunction}{f}\n", "\\newcommand{\\mappingFunctionMatrix}{\\mathbf{F}}\n", "\\newcommand{\\mappingFunctionTwo}{g}\n", "\\newcommand{\\mappingFunctionTwoMatrix}{\\mathbf{G}}\n", "\\newcommand{\\mappingFunctionTwoVector}{\\mathbf{ \\mappingFunctionTwo}}\n", "\\newcommand{\\mappingFunctionVector}{\\mathbf{ \\mappingFunction}}\n", "\\newcommand{\\scaleScalar}{s}\n", "\\newcommand{\\mappingScalar}{w}\n", "\\newcommand{\\mappingVector}{\\mathbf{ \\mappingScalar}}\n", "\\newcommand{\\mappingMatrix}{\\mathbf{W}}\n", "\\newcommand{\\mappingScalarTwo}{v}\n", "\\newcommand{\\mappingVectorTwo}{\\mathbf{ \\mappingScalarTwo}}\n", "\\newcommand{\\mappingMatrixTwo}{\\mathbf{V}}\n", "\\newcommand{\\maxIters}{K}\n", "\\newcommand{\\meanMatrix}{\\mathbf{M}}\n", "\\newcommand{\\meanScalar}{\\mu}\n", "\\newcommand{\\meanTwoMatrix}{\\mathbf{M}}\n", "\\newcommand{\\meanTwoScalar}{m}\n", "\\newcommand{\\meanTwoVector}{\\mathbf{ \\meanTwoScalar}}\n", "\\newcommand{\\meanVector}{\\boldsymbol{ \\meanScalar}}\n", "\\newcommand{\\mrnaConcentration}{m}\n", "\\newcommand{\\naturalFrequency}{\\omega}\n", "\\newcommand{\\neighborhood}[1]{\\mathcal{N}\\left( #1 \\right)}\n", "\\newcommand{\\neilurl}{http://inverseprobability.com/}\n", "\\newcommand{\\noiseMatrix}{\\boldsymbol{ E}}\n", "\\newcommand{\\noiseScalar}{\\epsilon}\n", "\\newcommand{\\noiseVector}{\\boldsymbol{ \\epsilon}}\n", "\\newcommand{\\norm}[1]{\\left\\Vert #1 \\right\\Vert}\n", "\\newcommand{\\normalizedLaplacianMatrix}{\\hat{\\mathbf{L}}}\n", "\\newcommand{\\normalizedLaplacianScalar}{\\hat{\\ell}}\n", 
"\\newcommand{\\normalizedLaplacianVector}{\\hat{\\mathbf{ \\ell}}}\n", "\\newcommand{\\numActive}{m}\n", "\\newcommand{\\numBasisFunc}{m}\n", "\\newcommand{\\numComponents}{m}\n", "\\newcommand{\\numComps}{K}\n", "\\newcommand{\\numData}{n}\n", "\\newcommand{\\numFeatures}{K}\n", "\\newcommand{\\numHidden}{h}\n", "\\newcommand{\\numInducing}{m}\n", "\\newcommand{\\numLayers}{\\ell}\n", "\\newcommand{\\numNeighbors}{K}\n", "\\newcommand{\\numSequences}{s}\n", "\\newcommand{\\numSuccess}{s}\n", "\\newcommand{\\numTasks}{m}\n", "\\newcommand{\\numTime}{T}\n", "\\newcommand{\\numTrials}{S}\n", "\\newcommand{\\outputIndex}{j}\n", "\\newcommand{\\paramVector}{\\boldsymbol{ \\theta}}\n", "\\newcommand{\\parameterMatrix}{\\boldsymbol{ \\Theta}}\n", "\\newcommand{\\parameterScalar}{\\theta}\n", "\\newcommand{\\parameterVector}{\\boldsymbol{ \\parameterScalar}}\n", "\\newcommand{\\partDiff}[2]{\\frac{\\partial#1}{\\partial#2}}\n", "\\newcommand{\\precisionScalar}{j}\n", "\\newcommand{\\precisionVector}{\\mathbf{ \\precisionScalar}}\n", "\\newcommand{\\precisionMatrix}{\\mathbf{J}}\n", "\\newcommand{\\pseudotargetScalar}{\\widetilde{y}}\n", "\\newcommand{\\pseudotargetVector}{\\mathbf{ \\pseudotargetScalar}}\n", "\\newcommand{\\pseudotargetMatrix}{\\mathbf{ \\widetilde{Y}}}\n", "\\newcommand{\\rank}[1]{\\text{rank}\\left(#1\\right)}\n", "\\newcommand{\\rayleighDist}[2]{\\mathcal{R}\\left(#1|#2\\right)}\n", "\\newcommand{\\rayleighSamp}[1]{\\mathcal{R}\\left(#1\\right)}\n", "\\newcommand{\\responsibility}{r}\n", "\\newcommand{\\rotationScalar}{r}\n", "\\newcommand{\\rotationVector}{\\mathbf{ \\rotationScalar}}\n", "\\newcommand{\\rotationMatrix}{\\mathbf{R}}\n", "\\newcommand{\\sampleCovScalar}{s}\n", "\\newcommand{\\sampleCovVector}{\\mathbf{ \\sampleCovScalar}}\n", "\\newcommand{\\sampleCovMatrix}{\\mathbf{s}}\n", "\\newcommand{\\scalarProduct}[2]{\\left\\langle{#1},{#2}\\right\\rangle}\n", "\\newcommand{\\sign}[1]{\\text{sign}\\left(#1\\right)}\n", 
"\\newcommand{\\sigmoid}[1]{\\sigma\\left(#1\\right)}\n", "\\newcommand{\\singularvalue}{\\ell}\n", "\\newcommand{\\singularvalueMatrix}{\\mathbf{L}}\n", "\\newcommand{\\singularvalueVector}{\\mathbf{l}}\n", "\\newcommand{\\sorth}{\\mathbf{u}}\n", "\\newcommand{\\spar}{\\lambda}\n", "\\newcommand{\\trace}[1]{\\text{tr}\\left(#1\\right)}\n", "\\newcommand{\\BasalRate}{B}\n", "\\newcommand{\\DampingCoefficient}{C}\n", "\\newcommand{\\DecayRate}{D}\n", "\\newcommand{\\Displacement}{X}\n", "\\newcommand{\\LatentForce}{F}\n", "\\newcommand{\\Mass}{M}\n", "\\newcommand{\\Sensitivity}{S}\n", "\\newcommand{\\basalRate}{b}\n", "\\newcommand{\\dampingCoefficient}{c}\n", "\\newcommand{\\mass}{m}\n", "\\newcommand{\\sensitivity}{s}\n", "\\newcommand{\\springScalar}{\\kappa}\n", "\\newcommand{\\springVector}{\\boldsymbol{ \\kappa}}\n", "\\newcommand{\\springMatrix}{\\boldsymbol{ \\mathcal{K}}}\n", "\\newcommand{\\tfConcentration}{p}\n", "\\newcommand{\\tfDecayRate}{\\delta}\n", "\\newcommand{\\tfMrnaConcentration}{f}\n", "\\newcommand{\\tfVector}{\\mathbf{ \\tfConcentration}}\n", "\\newcommand{\\velocity}{v}\n", "\\newcommand{\\sufficientStatsScalar}{g}\n", "\\newcommand{\\sufficientStatsVector}{\\mathbf{ \\sufficientStatsScalar}}\n", "\\newcommand{\\sufficientStatsMatrix}{\\mathbf{G}}\n", "\\newcommand{\\switchScalar}{s}\n", "\\newcommand{\\switchVector}{\\mathbf{ \\switchScalar}}\n", "\\newcommand{\\switchMatrix}{\\mathbf{S}}\n", "\\newcommand{\\tr}[1]{\\text{tr}\\left(#1\\right)}\n", "\\newcommand{\\loneNorm}[1]{\\left\\Vert #1 \\right\\Vert_1}\n", "\\newcommand{\\ltwoNorm}[1]{\\left\\Vert #1 \\right\\Vert_2}\n", "\\newcommand{\\onenorm}[1]{\\left\\vert#1\\right\\vert_1}\n", "\\newcommand{\\twonorm}[1]{\\left\\Vert #1 \\right\\Vert}\n", "\\newcommand{\\vScalar}{v}\n", "\\newcommand{\\vVector}{\\mathbf{v}}\n", "\\newcommand{\\vMatrix}{\\mathbf{V}}\n", "\\newcommand{\\varianceDist}[2]{\\text{var}_{#2}\\left( #1 \\right)}\n", "% Already defined by latex\n", 
"%\\newcommand{\\vec}{#1:}\n", "\\newcommand{\\vecb}[1]{\\left(#1\\right):}\n", "\\newcommand{\\weightScalar}{w}\n", "\\newcommand{\\weightVector}{\\mathbf{ \\weightScalar}}\n", "\\newcommand{\\weightMatrix}{\\mathbf{W}}\n", "\\newcommand{\\weightedAdjacencyMatrix}{\\mathbf{A}}\n", "\\newcommand{\\weightedAdjacencyScalar}{a}\n", "\\newcommand{\\weightedAdjacencyVector}{\\mathbf{ \\weightedAdjacencyScalar}}\n", "\\newcommand{\\onesVector}{\\mathbf{1}}\n", "\\newcommand{\\zerosVector}{\\mathbf{0}}\n", "$$\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "# Introduction\n", "\n", "## The Centrifugal Governor \\[edit\\]\n", "\n", "\n", "\n", "Figure: Centrifugal governor as held by \"Science\" on Holborn\n", "Viaduct\n", "\n", "## Boulton and Watt's Steam Engine \\[edit\\]\n", "\n", "\n", "\n", "Figure: Watt's Steam Engine which made Steam Power Efficient and\n", "Practical.\n", "\n", "James Watt's steam engine contained an early machine learning device. In\n", "the same way that modern systems are component based, his engine was\n", "composed of components, one of which was a speed regulator sometimes\n", "known as *Watt's governor*. The two balls in the center of the image,\n", "when spun fast, rise, and through a linkage mechanism they adjust the\n", "valve that controls the flow of steam to the engine.\n", "\n", "The centrifugal governor was made famous by Boulton and Watt when it was\n", "deployed in the steam engine. Studying stability in the governor is the\n", "main subject of James Clerk Maxwell's paper on the theoretical analysis\n", "of governors [@Maxwell:governors1867]. This paper is a founding paper of\n", "control theory. In an acknowledgment of its influence, Wiener used the\n", "name [*cybernetics*](https://en.wikipedia.org/wiki/Cybernetics) to\n", "describe the field of control and communication in the animal and the\n", "machine [@Wiener:cybernetics48].\n", "
Cybernetics comes from the Greek word for\n", "helmsman, which is also the root, via Latin, of the word governor.\n", "\n", "A governor is one of the simplest artificial intelligence systems. It\n", "senses the speed of an engine, and acts to change the position of the\n", "valve on the engine to slow it down.\n", "\n", "Although it's a mechanical system, a governor can be seen as automating\n", "a role that a human would have traditionally played. It is an early\n", "example of artificial intelligence.\n", "\n", "The centrifugal governor has several parameters: the weight of the balls\n", "used, the lengths of the linkages and the limits on the balls' movement.\n", "\n", "Two principal differences exist between the centrifugal governor and\n", "artificial intelligence systems of today.\n", "\n", "1. The centrifugal governor is a physical system and it is an integral\n", " part of a wider physical system that it regulates (the engine).\n", "2. The parameters of the governor were set by hand; our modern\n", " artificial intelligence systems have their parameters set by *data*.\n", "\n", "\n", "\n", "Figure: The centrifugal governor, an early example of a\n", "decision-making system. The parameters of the governor include the\n", "lengths of the linkages (which affect how far the throttle opens in\n", "response to movement in the balls), the weight of the balls (which\n", "affects inertia) and the limits to which the balls can rise.\n", "\n", "The governor has the basic components of sense and act that we expect in\n", "an intelligent system, and it saved the need for a human operator to\n", "manually adjust the system in the case of overspeed. Overspeed has the\n", "potential to destroy an engine, so the governor operates as a safety\n", "device.\n", "\n", "The first wave of automation did bring about sabotage as a worker's\n", "response.\n", "
But if machinery was sabotaged, for example, if the linkage\n", "between sensor (the spinning balls) and action (the valve closure) was\n", "broken, this would be obvious to the engine operator at start-up time.\n", "The machine could be repaired before operation.\n", "\n", "## What is Machine Learning? \\[edit\\]\n", "\n", "Machine learning allows us to extract knowledge from data to form a\n", "prediction.\n", "\n", "$$\\text{data} + \\text{model} \\xrightarrow{\\text{compute}} \\text{prediction}$$\n", "\n", "A machine learning prediction is made by combining a model with data.\n", "The manner in which this is done gives us the machine learning\n", "*algorithm*.\n", "\n", "Machine learning models are *mathematical models* which make weak\n", "assumptions about data, e.g. smoothness assumptions. By combining these\n", "assumptions with the data we observe, we can interpolate between data\n", "points or, occasionally, extrapolate into the future.\n", "\n", "Machine learning is a technology which strongly overlaps with the\n", "methodology of statistics. From a historical/philosophical viewpoint,\n", "machine learning differs from statistics in that the focus in the\n", "machine learning community has been primarily on accuracy of prediction,\n", "whereas the focus in statistics is typically on the interpretability of\n", "a model and/or validating a hypothesis through data collection.\n", "\n", "The rapid increase in the availability of compute and data has led to\n", "the increased prominence of machine learning. This prominence is\n", "surfacing in two different but overlapping domains: data science and\n", "artificial intelligence.\n", "\n", "## From Model to Decision \\[edit\\]\n", "\n", "The real challenge, however, is end-to-end decision making.\n", "
Taking\n", "information from the environment and using it to drive decision making\n", "to achieve goals.\n", "\n", "## Artificial Intelligence and Data Science \\[edit\\]\n", "\n", "Artificial intelligence has the objective of endowing computers with\n", "human-like intelligent capabilities. For example, understanding an image\n", "(computer vision), the contents of some speech (speech recognition), the\n", "meaning of a sentence (natural language processing) or the translation\n", "of a sentence (machine translation).\n", "\n", "### Supervised Learning for AI\n", "\n", "The machine learning approach to artificial intelligence is to collect\n", "and annotate a large data set from humans. The problem is characterized\n", "by input data (e.g. a particular image) and a label (e.g. is there a car\n", "in the image yes/no). The machine learning algorithm fits a mathematical\n", "function (I call this the *prediction function*) to map from the input\n", "image to the label. The parameters of the prediction function are set by\n", "minimizing an error between the function’s predictions and the true\n", "data. The mathematical function that encapsulates this error is known\n", "as the *objective function*. For example, a common choice of objective\n", "function is the sum of squared errors,\n", "$$\\errorFunction(\\mappingVector) = \\sum_{\\dataIndex=1}^{\\numData} \\left(\\dataScalar_\\dataIndex - \\mappingFunction(\\inputVector_\\dataIndex; \\mappingVector)\\right)^2,$$\n", "where $\\mappingVector$ are the parameters of the prediction function\n", "$\\mappingFunction(\\cdot)$.\n", "\n", "This approach to machine learning is known as *supervised learning*.\n", "Various approaches to supervised learning use different prediction\n", "functions, objective functions or optimization algorithms to fit them.\n", "\n", "For example, *deep learning* makes use of *neural networks* to form the\n", "predictions. A neural network is a particular type of mathematical\n", "function that allows the algorithm designer to introduce invariances\n", "into the function.\n", "\n", "An invariance is an important way of including prior understanding in a\n", "machine learning model. For example, in an image, a car is still a car\n", "regardless of whether it’s in the upper left or lower right corner of\n", "the image.\n", "
This is known as translation invariance. A neural network\n", "encodes translation invariance in *convolutional layers*. Convolutional\n", "neural networks are widely used in image recognition tasks.\n", "\n", "An alternative structure is known as a recurrent neural network (RNN).\n", "RNNs encode temporal structure. They use autoregressive connections in\n", "their hidden layers and can be seen as time series models with\n", "non-linear autoregressive basis functions. They are widely used in\n", "speech recognition and machine translation.\n", "\n", "Machine learning has been deployed in speech recognition (e.g. Alexa,\n", "which uses deep and convolutional neural networks) and in computer\n", "vision (e.g. Amazon Go, which uses convolutional neural networks for\n", "person recognition and pose detection).\n", "\n", "The field of data science is related to AI, but philosophically\n", "different. It arises because we are increasingly creating large amounts\n", "of data through *happenstance* rather than active collection. In the\n", "modern era data is laid down by almost all our activities. The objective\n", "of data science is to extract insights from this data.\n", "\n", "Classically, in the field of statistics, data analysis proceeds by\n", "assuming that the question (or scientific hypothesis) comes before the\n", "data is created. E.g., if I want to determine the effectiveness of a\n", "particular drug, I perform a *design* for my data collection. I use\n", "foundational approaches such as randomization to account for\n", "confounders. This made a lot of sense in an era where data had to be\n", "actively collected. The reduction in cost of data collection and storage\n", "now means that many data sets are available which weren’t collected with\n", "a particular question in mind. This is a challenge because bias in the\n", "way data was acquired can corrupt the insights we derive.\n", "
We can perform\n", "randomized control trials (or A/B tests) to verify our conclusions, but\n", "the opportunity is to use data science techniques to better guide our\n", "question selection, or even to answer a question without the expense of\n", "a full randomized control trial.\n", "\n", "## Amazon: Bits and Atoms\n", "\n", "## Machine Learning in Supply Chain \\[edit\\]\n", "\n", "\n", "\n", "Figure: Packhorse Bridge under Burbage Edge. This packhorse route\n", "climbs steeply out of Hathersage and heads towards Sheffield. Packhorses\n", "were the main means of transporting goods across the Peak District. The\n", "high cost of transport is one driver of the 'smith' model, where there\n", "is a local skilled person responsible for assembling or creating goods\n", "(e.g. a blacksmith).\n", "\n", "On Sunday mornings in Sheffield, I often used to run across Packhorse\n", "Bridge in Burbage valley. The bridge is part of an ancient network of\n", "trails crossing the Pennines that, before Turnpike roads arrived in the\n", "18th century, was the main way in which goods were moved. Given that the\n", "moors around Sheffield were home to sand quarries, tin mines and lead\n", "mines, and the villages in the Derwent valley were known for nail and\n", "pin manufacture, this wasn't simply the movement of agricultural goods;\n", "it was the infrastructure for industrial transport.\n", "\n", "Those who led the horses were known as Jaggers, and leading out of the\n", "village of Hathersage is Jagger's Lane, a trail that headed underneath\n", "Stanage Edge and into Sheffield.\n", "\n", "The movement of goods from regions of supply to areas of demand is\n", "fundamental to our society. The physical infrastructure of the supply\n", "chain has evolved a great deal over the last 300 years.\n", "\n", "\n", "\n", "Figure: Richard Arkwright is regarded as the founder of the modern\n", "factory system.\n", "
Factories exploit distribution networks to centralize\n", "production of goods. Arkwright located his factory in Cromford due to\n", "proximity to Nottingham Weavers (his market) and availability of water\n", "power from the tributaries of the Derwent river. When he first arrived\n", "there was almost no transportation network. Over the following 200 years\n", "the Cromford Canal (1790s), a Turnpike (now the A6, 1816-18) and the\n", "High Peak Railway (now closed, 1820s) were all constructed to improve\n", "transportation access as the factory blossomed.\n", "\n", "Richard Arkwright is known as the father of the modern factory system.\n", "In 1771 he set up a mill for spinning cotton yarn in the village of\n", "Cromford, in the Derwent Valley. The Derwent valley is relatively\n", "inaccessible. Raw cotton arrived in Liverpool from the US and India. It\n", "needed to be transported on packhorse across the bridleways of the\n", "Pennines. But Cromford was a good location due to proximity to\n", "Nottingham, where weavers were consuming the finished thread, and the\n", "availability of water power from small tributaries of the Derwent river\n", "for Arkwright's [water\n", "frames](https://en.wikipedia.org/wiki/Spinning_jenny) which automated\n", "the production of yarn from raw cotton.\n", "\n", "By 1794 the Cromford canal was opened to bring coal into Cromford and\n", "give better transport to Nottingham. The construction of the canals was\n", "driven by the need to improve the transport infrastructure, facilitating\n", "the movement of goods across the UK. Canals, roads and railways were\n", "initially constructed to meet the economic need for moving goods.\n", "
Their common purpose was to\n", "improve the supply chain.\n", "\n", "## Containerization \\[edit\\]\n", "\n", "\n", "\n", "Figure: The container is one of the major drivers of globalization,\n", "and arguably the largest agent of social change in the last 100 years.\n", "It reduces the cost of transportation, significantly changing the\n", "appropriate topology of distribution networks. The container makes it\n", "possible to ship goods halfway around the world for less than it costs\n", "to process those goods, leading to an extended distribution topology.\n", "\n", "Containerization has had a dramatic effect on global economics, placing\n", "many people in the developing world at the end of the supply chain.\n", "\n", "
\n", "