{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "abstract: |\n",
    "    Machine learning solutions, in particular those based on deep learning\n",
    "    methods, form an underpinning of the current revolution in “artificial\n",
    "    intelligence” that has dominated popular press headlines and is having a\n",
    "    significant influence on the wider tech agenda. In this talk I will give\n",
    "    an overview of where we are now with machine learning solutions, and\n",
    "    what challenges we face both in the near and far future. These include\n",
    "    practical application of existing algorithms in the face of the need to\n",
    "    explain decision making, mechanisms for improving the quality and\n",
    "    availability of data, dealing with large unstructured datasets.\n",
    "affiliation: University of Sheffield\n",
    "author: 'Neil D. Lawrence'\n",
    "bibliography:\n",
    "- '../other.bib'\n",
    "- '../lawrence.bib'\n",
    "- '../zbooks.bib'\n",
    "csl: '../elsevier-harvard.csl'\n",
    "title: Outlook for AI and Machine Learning\n",
    "transition: None\n",
    "venue: HM Treasury\n",
    "---\n",
    "\n",
    "#### Outlook for UK AI and Machine Learning\n",
    "\n",
    "#### 2018-05-11\n",
    "\n",
    "#### Neil D. Lawrence\n",
    "\n",
    "#### Amazon Cambridge and **University of Sheffield**\n",
    "\n",
    "`@lawrennd` [inverseprobability.com](http://inverseprobability.com)\n",
    "\n",
    "The aim of this presentation is give a sense of the current situation in\n",
    "machine learning and artificial intelligence as well as some perspective\n",
    "on the immediate outlook for the field.\n",
    "\n",
    "This presentation represents my personal opinion as an academic with 20\n",
    "years experience in machine learning, computational biology and data\n",
    "science. This is not in any sense *Amazon* policy, but since September\n",
    "2016 I have been on leave of absence at Amazon.\n",
    "\n",
    "The Gartner Hype Cycle\n",
    "----------------------\n",
    "\n",
    "<img class=\"negate\" src=\"../slides/diagrams/Gartner_Hype_Cycle.png\" width=\"70%\" align=\"center\" style=\"background:none; border:none; box-shadow:none;\">\n",
    "\n",
    "The [Gartner Hype Cycle](https://en.wikipedia.org/wiki/Hype_cycle) tries\n",
    "to assess where an idea is in terms of maturity and adoption. It splits\n",
    "the evolution of technology into a technological trigger, a peak of\n",
    "expectations followed by a trough of disillusionment and a final\n",
    "ascension into a useful technology. It looks rather like a classical\n",
    "control response to a final set point."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import pods\n",
    "import mlai\n",
    "import teaching_plots as plot"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# calling without arguments uses the default query terms\n",
    "data = pods.datasets.google_trends(['big data', 'data science', 'internet of things', 'machine learning']) \n",
    "data['data frame'].set_index('Date', inplace=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fig, ax = plt.subplots(figsize=plot.big_wide_figsize)\n",
    "data['data frame'].plot(ax=ax)\n",
    "_ = ax.set_xticklabels(ax.xaxis.get_majorticklabels(), rotation=45)\n",
    "plt.gcf().subplots_adjust(bottom=0.3)\n",
    "handles = ax.get_lines()\n",
    "for handle in handles:\n",
    "    handle.set_visible(False)\n",
    "for i, handle in enumerate(handles):\n",
    "    handle.set_visible(True)\n",
    "    handle.set_linewidth(3)\n",
    "    mlai.write_figure('../slides/diagrams/data-science/bd-ds-iot-ml-google-trends{sample:0>3}.svg'.format(sample=i))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pods.notebook.display_plots('bd-ds-iot-ml-google-trends{sample:0>3}.svg', \n",
    "                            '../slides/diagrams/data-science', sample=(0,3))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Google trends gives us insight into how far along various technological\n",
    "terms are on the hype cycle.\n",
    "\n",
    "What is Machine Learning?\n",
    "-------------------------\n",
    "\n",
    "What is machine learning? At its most basic level machine learning is a\n",
    "combination of\n",
    "\n",
    "$$ \\text{data} + \\text{model} \\xrightarrow{\\text{compute}} \\text{prediction}$$\n",
    "\n",
    "where *data* is our observations. They can be actively or passively\n",
    "acquired (meta-data). The *model* contains our assumptions, based on\n",
    "previous experience. THat experience can be other data, it can come from\n",
    "transfer learning, or it can merely be our beliefs about the\n",
    "regularities of the universe. In humans our models include our inductive\n",
    "biases. The *prediction* is an action to be taken or a categorization or\n",
    "a quality score. The reason that machine learning has become a mainstay\n",
    "of artificial intelligence is the importance of predictions in\n",
    "artificial intelligence. The data and the model are combined through\n",
    "computation.\n",
    "\n",
    "In practice we normally perform machine learning using two functions. To\n",
    "combine data with a model we typically make use of:\n",
    "\n",
    "**a prediction function** a function which is used to make the\n",
    "predictions. It includes our beliefs about the regularities of the\n",
    "universe, our assumptions about how the world works, e.g. smoothness,\n",
    "spatial similarities, temporal similarities.\n",
    "\n",
    "**an objective function** a function which defines the cost of\n",
    "misprediction. Typically it includes knowledge about the world’s\n",
    "generating processes (probabilistic objectives) or the costs we pay for\n",
    "mispredictions (empiricial risk minimization).\n",
    "\n",
    "The combination of data and model through the prediction function and\n",
    "the objectie function leads to a *learning algorithm*. The class of\n",
    "prediction functions and objective functions we can make use of is\n",
    "restricted by the algorithms they lead to. If the prediction function or\n",
    "the objective function are too complex, then it can be difficult to find\n",
    "an appropriate learning algorithm. Much of the acdemic field of machine\n",
    "learning is the quest for new learning algorithms that allow us to bring\n",
    "different types of models and data together.\n",
    "\n",
    "A useful reference for state of the art in machine learning is the UK\n",
    "Royal Society Report, [Machine Learning: Power and Promise of Computers\n",
    "that Learn by\n",
    "Example](https://royalsociety.org/~/media/policy/projects/machine-learning/publications/machine-learning-report.pdf).\n",
    "\n",
    "You can also check my blog post on [“What is Machine\n",
    "Learning?”](http://inverseprobability.com/2017/07/17/what-is-machine-learning)\n",
    "\n",
    "Natural and Artificial Intelligence: Embodiment Factors\n",
    "-------------------------------------------------------\n",
    "\n",
    "<table>\n",
    "<tr>\n",
    "<td>\n",
    "</td>\n",
    "<td align=\"center\">\n",
    "<img class=\"\" src=\"../slides/diagrams/IBM_Blue_Gene_P_supercomputer.jpg\" width=\"40%\" align=\"center\" style=\"background:none; border:none; box-shadow:none;\">\n",
    "</td>\n",
    "<td align=\"center\">\n",
    "<img class=\"\" src=\"../slides/diagrams/ClaudeShannon_MFO3807.jpg\" width=\"25%\" align=\"center\" style=\"background:none; border:none; box-shadow:none;\">\n",
    "</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td>\n",
    "compute\n",
    "</td>\n",
    "<td align=\"center\">\n",
    "$$\\approx 100 \\text{ gigaflops}$$\n",
    "</td>\n",
    "<td align=\"center\">\n",
    "$$\\approx 16 \\text{ petaflops}$$\n",
    "</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td>\n",
    "communicate\n",
    "</td>\n",
    "<td align=\"center\">\n",
    "$$1 \\text{ gigbit/s}$$\n",
    "</td>\n",
    "<td align=\"center\">\n",
    "$$100 \\text{ bit/s}$$\n",
    "</td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td>\n",
    "(compute/communicate)\n",
    "</td>\n",
    "<td align=\"center\">\n",
    "$$10^{4}$$\n",
    "</td>\n",
    "<td align=\"center\">\n",
    "$$10^{14}$$\n",
    "</td>\n",
    "</tr>\n",
    "</table>\n",
    "There is a fundamental limit placed on our intelligence based on our\n",
    "ability to communicate. Claude Shannon founded the field of information\n",
    "theory. The clever part of this theory is it allows us to separate our\n",
    "measurement of information from what the information pertains to[^1].\n",
    "\n",
    "Shannon measured information in bits. One bit of information is the\n",
    "amount of information I pass to you when I give you the result of a coin\n",
    "toss. Shannon was also interested in the amount of information in the\n",
    "English language. He estimated that on average a word in the English\n",
    "language contains 12 bits of information.\n",
    "\n",
    "Given typical speaking rates, that gives us an estimate of our ability\n",
    "to communicate of around 100 bits per second. Computers on the other\n",
    "hand can communicate much more rapidly. Current wired network speeds are\n",
    "around a billion bits per second, ten million times faster.\n",
    "\n",
    "When it comes to compute though, our best estimates indicate our\n",
    "computers are slower. A typical modern computer can process make around\n",
    "100 billion floating point operations per second, each floating point\n",
    "operation involves a 64 bit number. So the computer is processing around\n",
    "6,400 billion bits per second.\n",
    "\n",
    "It’s difficult to get similar estimates for humans, but by some\n",
    "estimates the amount of compute we would require to *simulate* a human\n",
    "brain is equivalent to that in the UK’s fastest computer, the MET office\n",
    "machine in Exeter, which in 2018 ranks as the 11th fastest computer in\n",
    "the world. That machine simulates the world’s weather each morning, and\n",
    "then simulates the world’s climate. It is a 16 petaflop machine,\n",
    "processing around 1,000 *trillion* bits per second.\n",
    "\n",
    "So when it comes to our ability to compute we are extraordinary, not\n",
    "compute in our conscious mind, but the underlying neuron firings that\n",
    "underpin both our consciousness, our sbuconsciousness as well as our\n",
    "motor control etc. By analogy I sometimes like to think of us as a\n",
    "Formula One engine. But in terms of our ability to deploy that\n",
    "computation in actual use, to share the results of what we have\n",
    "inferred, we are very limited. So when you imagine the F1 car that\n",
    "represents a psyche, think of an F1 car with bicycle wheels.\n",
    "\n",
    "In contrast, our computers have less computational power, but they can\n",
    "communicate far more fluidly. They are more like a go-kart, less well\n",
    "powered, but with tires that allow them to deploy that power.\n",
    "\n",
    "For humans, that means much of our computation should be dedicated to\n",
    "considering *what* we should compute. To do that efficiently we need to\n",
    "model the world around us. The most complex thing in the world around us\n",
    "is other humans. So it is no surprise that we model them. We second\n",
    "guess what their intentions are, and our communication is only necessary\n",
    "when they are departing from how we model them. Naturally, for this to\n",
    "work well, we need to understand those we work closely with. So it is no\n",
    "surprise that social communication, social bonding, forms so much of a\n",
    "part of our use of our limited bandwidth.\n",
    "\n",
    "There is a second effect here, our need to anthropomorphise objects\n",
    "around us. Our tendency to model our fellow humans extends to when we\n",
    "interact with other entities in our environment. To our pets as well as\n",
    "inanimate objects around us, such as computers or even our cars. This\n",
    "tendency to overinterpret could be a consequence of our limited ability\n",
    "to communicate.\n",
    "\n",
    "For more details see this paper [“Living Together: Mind and Machine\n",
    "Intelligence”](https://arxiv.org/abs/1705.07996), and this [TEDx\n",
    "talk](http://inverseprobability.com/talks/lawrence-tedx17/living-together.html).\n",
    "\n",
    "Evolved Relationship with Information\n",
    "-------------------------------------\n",
    "\n",
    "The high bandwidth of computers has resulted in a close relationship\n",
    "between the computer and data. Larege amounts of information can flow\n",
    "between the two. The degree to which the computer is mediating our\n",
    "relationship with data means that we should consider it an intermediary.\n",
    "\n",
    "Origininally our low bandwith relationship with data was affected by two\n",
    "characteristics. Firstly, our tendency to over-interpret driven by our\n",
    "need to extract as much knowledge from our low bandwidth information\n",
    "channel as possible. Secondly, by our improved understanding of the\n",
    "domain of *mathematical* statistics and how our cognitive biases can\n",
    "mislead us.\n",
    "\n",
    "With this new set up there is a potential for assimilating far more\n",
    "information via the computer, but the computer can present this to us in\n",
    "various ways. If it’s motives are not aligned with ours then it can\n",
    "misrepresent the information. This needn’t be nefarious it can be simply\n",
    "as a result of the computer pursuing a different objective from us. For\n",
    "example, if the computer is aiming to maximize our interaction time that\n",
    "may be a different objective from ours which may be to summarize\n",
    "information in a representative manner in the *shortest* possible lenght\n",
    "of time.\n",
    "\n",
    "For example, for me it was a common experience to pick up my telephone\n",
    "with the intention of checking when my next appointme was, but to soon\n",
    "find myself distracted by another application on the phone, and end up\n",
    "reading something on the internet. By the time I’d finished reading, I\n",
    "would often have forgotten the reason I picked up my phone in the first\n",
    "place.\n",
    "\n",
    "We can benefit enormously from the very large amount of information we\n",
    "can now obtain through this evolved relationship between us and data.\n",
    "Biology has already benefited from large scale data sharing and the\n",
    "improved inferences that can be drawn through summarizing data by\n",
    "computer. That has underpinned the revolution in computational biology.\n",
    "But in our daily lives this phenomenon is affecting us in ways which we\n",
    "haven’t envisaged.\n",
    "\n",
    "Better mediation of this flow actually requires a better understanding\n",
    "of human-computer interaction. This in turn involves understanding our\n",
    "own intelligence better, what its cognitive biases are and how these\n",
    "might mislead us.\n",
    "\n",
    "For further thoughts see [this Guardian\n",
    "article](https://www.theguardian.com/media-network/2015/jul/23/data-driven-economy-marketing)\n",
    "on marketing in the internet era and [this blog\n",
    "post](http://inverseprobability.com/2015/12/04/what-kind-of-ai) on\n",
    "System Zero.\n",
    "\n",
    "### Societal Effects\n",
    "\n",
    "We have already seen the effects of this changed dynamic in biology and\n",
    "computational biology. Improved sensorics have led to the new domains of\n",
    "transcriptomics, epigenomics, and ‘rich phenomics’ as well as\n",
    "considerably augmenting our capabilities in genomics.\n",
    "\n",
    "Biologists have had to become data-savvy, they require a rich\n",
    "understanding of the available data resources and need to assimilate\n",
    "existing data sets in their hypothesis generation as well as their\n",
    "experimental design. Modern biology has become a far more quantitative\n",
    "science, but the quantitativeness has required new methods developed in\n",
    "the domains of *computational biology* and *bioinformatics*.\n",
    "\n",
    "There is also great promise for personalized health, but in health the\n",
    "wide data-sharing that has underpinned success in the computational\n",
    "biology community is much harder to cary out.\n",
    "\n",
    "We can expect to see these phenomena reflected in wider society.\n",
    "Particularly as we make use of more automated decision making based only\n",
    "on data.\n",
    "\n",
    "The main phenomenon we see across the board is the shift in dynamic from\n",
    "the direct pathway between human and data, as traditionally mediated by\n",
    "classical statistcs, to a new flow of information via the computer. This\n",
    "change of dynamics gives us the modern and emerging domain of *data\n",
    "science*.\n",
    "\n",
    "Human Communication\n",
    "-------------------"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pods\n",
    "pods.notebook.display_plots('anne-bob-conversation{sample:0>3}.svg', \n",
    "                            '../slides/diagrams', sample=(0,7))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For human conversation to work, we require an internal model of who we\n",
    "are speaking to. We model each other, and combine our sense of who they\n",
    "are, who they think we are, and what has been said. This is our approach\n",
    "to dealing with the limited bandwidth connection we have. Empathy and\n",
    "understanding of intent. Mental dispositional concepts are used to\n",
    "augment our limited communication bandwidth.\n",
    "\n",
    "Fritz Heider referred to the important point of a conversation as being\n",
    "that they are happenings that are “*psychologically represented* in each\n",
    "of the participants” (his emphasis) [@Heider:interpersonal58]\n",
    "\n",
    "### Machine Learning and Narratives\n",
    "\n",
    "<img class=\"\" src=\"../slides/diagrams/Classic_baby_shoes.jpg\" width=\"60%\" align=\"\" style=\"background:none; border:none; box-shadow:none;\">\n",
    "\n",
    "<center>\n",
    "*For sale: baby shoes, never worn.*\n",
    "</center>\n",
    "Consider the six word novel, apocraphally credited to Ernest Hemingway,\n",
    "“For sale: baby shoes, never worn”. To understand what that means to a\n",
    "human, you need a great deal of additional context. Context that is not\n",
    "directly accessible to a machine that has not got both the evolved and\n",
    "contextual understanding of our own condition to realize both the\n",
    "implication of the advert and what that implication means emotionally to\n",
    "the previous owner."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from IPython.lib.display import YouTubeVideo\n",
    "YouTubeVideo('8FIEZXMUM2I')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[Fritz Heider](https://en.wikipedia.org/wiki/Fritz_Heider) and [Marianne\n",
    "Simmel](https://en.wikipedia.org/wiki/Marianne_Simmel)’s experiments\n",
    "with animated shapes from 1944 [@Heider:experimental44]. Our\n",
    "interpretation of these objects as showing motives and even emotion is a\n",
    "combination of our desire for narrative, a need for understanding of\n",
    "each other, and our ability to empathise. At one level, these are\n",
    "crudely drawn objects, but in another key way, the animator has\n",
    "communicated a story through simple facets such as their relative\n",
    "motions, their sizes and their actions. We apply our psychological\n",
    "representations to these faceless shapes in an effort to interpret their\n",
    "actions.\n",
    "\n",
    "### \n",
    "\n",
    "> There are three types of lies: lies, damned lies and statistics\n",
    ">\n",
    "> Benjamin Disraeli 1804-1881\n",
    "\n",
    "The quote lies, damned lies and statistics was credited to Benjamin\n",
    "Disraeli by Mark Twain in his autobiography. It characterizes the idea\n",
    "that statistic can be made to prove anything. But Disraeli died in 1881\n",
    "and Mark Twain died in 1910. The important breakthrough in overcoming\n",
    "our tendency to overinterpet data came with the formalization of the\n",
    "field through the development of *mathematical statistics*.\n",
    "\n",
    "### *Mathematical* Statistics\n",
    "\n",
    "<img class=\"\" src=\"../slides/diagrams/Portrait_of_Karl_Pearson.jpg\" width=\"30%\" align=\"\" style=\"background:none; border:none; box-shadow:none;\">\n",
    "\n",
    "[Karl Pearson](https://en.wikipedia.org/wiki/Karl_Pearson) (1857-1936),\n",
    "[Ronald Fisher](https://en.wikipedia.org/wiki/Ronald_Fisher) (1890-1962)\n",
    "and others considered the question of what conclusions can truly be\n",
    "drawn from data. Their mathematical studies act as a restraint on our\n",
    "tendency to over-interpret and see patterns where there are none. They\n",
    "introduced concepts such as randomized control trials that form a\n",
    "mainstay of the our decision making today, from government, to\n",
    "clinicians to large scale A/B testing that determines the nature of the\n",
    "web interfaces we interact with on social media and shopping.\n",
    "\n",
    "Today the statement “There are three types of lies: lies, damned lies\n",
    "and ‘big data’” may be more apt. We are revisiting many of the mistakes\n",
    "made in interpreting data from the 19th century. Big data is laid down\n",
    "by happenstance, rather than actively collected with a particular\n",
    "question in mind. That means it needs to be treated with care when\n",
    "conclusions are being drawn. For data science to succede it needs the\n",
    "same form of rigour that Pearson and Fisher brought to statistics, a\n",
    "“mathematical data science” is needed.\n",
    "\n",
    "### Artificial Intelligence and Data Science\n",
    "\n",
    "Machine learning technologies have been the driver of two related, but\n",
    "distinct disciplines. The first is *data science*. Data science is an\n",
    "emerging field that arises from the fact that we now collect so much\n",
    "data by happenstance, rather than by *experimental design*. Classical\n",
    "statistics is the science of drawing conclusions from data, and to do so\n",
    "statistical experiments are carefully designed. In the modern era we\n",
    "collect so much data that there’s a desire to draw inferences directly\n",
    "from the data.\n",
    "\n",
    "As well as machine learning, the field of data science draws from\n",
    "statistics, cloud computing, data storage (e.g. streaming data),\n",
    "visualization and data mining.\n",
    "\n",
    "In contrast, artificial intelligence technologies typically focus on\n",
    "emulating some form of human behaviour, such as understanding an image,\n",
    "or some speech, or translating text from one form to another. The recent\n",
    "advances in artifcial intelligence have come from machine learning\n",
    "providing the automation. But in contrast to data science, in artifcial\n",
    "intelligence the data is normally collected with the specific task in\n",
    "mind. In this sense it has strong relations to classical statistics.\n",
    "\n",
    "Classically artificial intelligence worried more about *logic* and\n",
    "*planning* and focussed less on data driven decision making. Modern\n",
    "machine learning owes more to the field of *Cybernetics*\n",
    "[@Wiener:cybernetics48] than artificial intelligence. Related fields\n",
    "include *robotics*, *speech recognition*, *language understanding* and\n",
    "*computer vision*.\n",
    "\n",
    "There are strong overlaps between the fields, the wide availability of\n",
    "data by happenstance makes it easier to collect data for designing AI\n",
    "systems. These relations are coming through wide availability of sensing\n",
    "technologies that are interconnected by celluar networks, WiFi and the\n",
    "internet. This phenomenon is sometimes known as the *Internet of\n",
    "Things*, but this feels like a dangerous misnomer. We must never forget\n",
    "that we are interconnecting people, not things.\n",
    "\n",
    "### What does Machine Learning do?\n",
    "\n",
    "Any process of automation allows us to scale what we do by codifying a\n",
    "process in some way that makes it efficient and repeatable. Machine\n",
    "learning automates by emulating human (or other actions) found in data.\n",
    "Machine learning codifies in the form of a mathematical function that is\n",
    "learnt by a computer. If we can create these mathematical functions in\n",
    "ways in which they can interconnect, then we can also build systems.\n",
    "\n",
    "Machine learning works through codifing a prediction of interest into a\n",
    "mathematical function. For example, we can try and predict the\n",
    "probability that a customer wants to by a jersey given knowledge of\n",
    "their age, and the latitude where they live. The technique known as\n",
    "logistic regression estimates the odds that someone will by a jumper as\n",
    "a linear weighted sum of the features of interest.\n",
    "\n",
    "$$ \\text{odds} = \\frac{p(\\text{bought})}{p(\\text{not bought})} $$\n",
    "$$ \\log \\text{odds}  = \\beta_0 + \\beta_1 \\text{age} + \\beta_2 \\text{latitude}$$\n",
    "\n",
    "Here $\\beta_0$, $\\beta_1$ and $\\beta_2$ are the parameters of the model.\n",
    "If $\\beta_1$ and $\\beta_2$ are both positive, then the log-odds that\n",
    "someone will buy a jumper increase with increasing latitude and age, so\n",
    "the further north you are and the older you are the more likely you are\n",
    "to buy a jumper. The parameter $\\beta_0$ is an offset parameter, and\n",
    "gives the log-odds of buying a jumper at zero age and on the equator. It\n",
    "is likely to be negative\\[\\^logarithms\\] indicating that the purchase is\n",
    "odds-against. This is actually a classical statistical model, and models\n",
    "like logistic regression are widely used to estimate probabilities from\n",
    "ad-click prediction to risk of disease.\n",
    "\n",
    "This is called a generalized linear model, we can also think of it as\n",
    "estimating the *probability* of a purchase as a nonlinear function of\n",
    "the features (age, lattitude) and the parameters (the $\\beta$ values).\n",
    "The function is known as the *sigmoid* or [logistic\n",
    "function](https://en.wikipedia.org/wiki/Logistic_regression), thus the\n",
    "name *logistic* regression.\n",
    "\n",
    "$$ p(\\text{bought}) =  {\\sigma\\left(\\beta_0 + \\beta_1 \\text{age} + \\beta_2 \\text{latitude}\\right)}$$\n",
    "\n",
    "In the case where we have *features* to help us predict, we sometimes\n",
    "denote such features as a vector, ${{\\bf {x}}}$, and we then use an\n",
    "inner product between the features and the parameters,\n",
    "$\\boldsymbol{\\beta}^\\top {{\\bf {x}}}= \\beta_1 {x}_1 + \\beta_2 {x}_2 + \\beta_3 {x}_3 ...$,\n",
    "to represent the argument of the sigmoid.\n",
    "\n",
    "$$ p(\\text{bought}) =  {\\sigma\\left(\\boldsymbol{\\beta}^\\top {{\\bf {x}}}\\right)}$$\n",
    "\n",
    "More generally, we aim to predict some aspect of our data, ${y}$, by\n",
    "relating it through a mathematical function, ${f}(\\cdot)$, to the\n",
    "parameters, $\\boldsymbol{\\beta}$ and the data, ${{\\bf {x}}}$.\n",
    "\n",
    "$$ {y}=  {f}\\left({{\\bf {x}}}, \\boldsymbol{\\beta}\\right)$$\n",
    "\n",
    "We call ${f}(\\cdot)$ the *prediction function*\n",
    "\n",
    "To obtain the fit to data, we use a separate function called the\n",
    "*objective function* that gives us a mathematical representation of the\n",
    "difference between our predictions and the real data.\n",
    "\n",
    "$${E}(\\boldsymbol{\\beta}, {\\mathbf{Y}}, {{\\bf X}})$$ A commonly used\n",
    "examples (for example in a regression problem) is least squares,\n",
    "$${E}(\\boldsymbol{\\beta}, {\\mathbf{Y}}, {{\\bf X}}) = \\sum_{i=1}^{n}\\left({y}_i - {f}({{\\bf {x}}}_i, \\boldsymbol{\\beta})\\right)^2.$$\n",
    "\n",
    "If a linear prediction funciton is combined with the least squares\n",
    "objective function then that gives us a classical *linear regression*,\n",
    "another classical statistical model. Statistics often focusses on linear\n",
    "models because it makes interpretation of the model easier.\n",
    "Interpretation is key in statistics because the aim is normally to\n",
    "validate questions by analysis of data. Machine learning has typically\n",
    "focussed more on the prediction function itself and worried less about\n",
    "the interpretation of parameters, which are normally denoted by\n",
    "$\\mathbf{w}$ instead of $\\boldsymbol{\\beta}$. As a result *non-linear*\n",
    "functions are explored more often as they tend to improve quality of\n",
    "predictions but at the expense of interpretability.\n",
    "\n",
    "### Deep Learning\n",
    "\n",
    "-   These are interpretable models: vital for disease etc.\n",
    "\n",
    "-   Modern machine learning methods are less interpretable\n",
    "\n",
    "-   Example: face recognition\n",
    "\n",
    "### \n",
    "\n",
    "<small>Outline of the DeepFace architecture. A front-end of a single\n",
    "convolution-pooling-convolution filtering on the rectified input,\n",
    "followed by three locally-connected layers and two fully-connected\n",
    "layers. Color illustrates feature maps produced at each layer. The net\n",
    "includes more than 120 million parameters, where more than 95% come from\n",
    "the local and fully connected.</small>\n",
    "\n",
    "<img class=\"\" src=\"../slides/diagrams/deepface_neg.png\" width=\"100%\" align=\"\" style=\"background:none; border:none; box-shadow:none;\">\n",
    "\n",
    "<p align=\"right\">\n",
    "<small>Source: DeepFace</small>\n",
    "</p>\n",
    "### \n",
    "\n",
    "<img class=\"\" src=\"../slides/diagrams/576px-Early_Pinball.jpg\" width=\"50%\" align=\"\" style=\"background:none; border:none; box-shadow:none;\">\n",
    "\n",
    "We can think of what these models are doing as being similar to early\n",
    "pin ball machines. In a neural network, we input a number (or numbers),\n",
    "whereas in pinball, we input a ball. The location of the ball on the\n",
    "left-right axis can be thought of as the number. As the ball falls\n",
    "through the machine, each layer of pins can be thought of as a different\n",
    "layer of neurons. Each layer acts to move the ball from left to right.\n",
    "\n",
    "In a pinball machine, when the ball gets to the bottom it might fall\n",
    "into a hole defining a score, in a neural network, that is equivalent to\n",
    "the decision: a classification of the input object.\n",
    "\n",
    "An image has more than one number associated with it, so it’s like\n",
    "playing pinball in a *hyper-space*."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pods\n",
    "pods.notebook.display_plots('pinball{sample:0>3}.svg', \n",
    "                            '../slides/diagrams', sample=(1,2))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "At initialization, the pins aren’t in the right place to bring the ball\n",
    "to the correct decision.\n",
    "\n",
    "Learning involves moving all the pins to be in the right position, so\n",
    "that the ball falls in the right place. But moving all these pins in\n",
    "hyperspace can be difficult. In a hyper space you have to put a lot of\n",
    "data through the machine for to explore the positions of all the pins.\n",
    "Adversarial learning reflects the fact that a ball can be moved a small\n",
    "distance and lead to a very different result.\n",
    "\n",
    "Probabilistic methods explore more of the space by considering a range\n",
    "of possible paths for the ball through the machine.\n",
    "\n",
    "### Uncertainty and Learning\n",
    "\n",
    "-   In this “vanilla” form these machines “don’t know when they don’t\n",
    "    know”.\n",
    "\n",
    "-   Doubt is vital in real world decision making.\n",
    "\n",
    "-   Incorporating this in systems is a long time focus of my technical\n",
    "    research.\n",
    "\n",
    "### Comparison with Human Learning & Embodiment\n",
    "\n",
    "-   The emulation of intelligence does not exhibit all the\n",
    "    meta-modelling humans perform.\n",
    "\n",
    "### Data Science\n",
    "\n",
    "-   Industrial Revolution 4.0?\n",
    "-   *Industrial Revolution* (1760-1840) term coined by Arnold Toynbee,\n",
    "    late 19th century.\n",
    "-   Maybe: But this one is dominated by *data* not *capital*\n",
    "-   That presents *challenges* and *opportunities*\n",
    "\n",
    "compare [digital\n",
    "oligarchy](https://www.theguardian.com/media-network/2015/mar/05/digital-oligarchy-algorithms-personal-data)\n",
    "vs [how Africa can benefit from the data\n",
    "revolution](https://www.theguardian.com/media-network/2015/aug/25/africa-benefit-data-science-information)\n",
    "\n",
    "-   Apple vs Nokia: How you handle disruption.\n",
    "\n",
    "Disruptive technologies take time to assimilate, and best practices, as\n",
    "well as the pitfalls of new technologies take time to share.\n",
    "Historically, new technologies led to new professions. [Isambard Kingdom\n",
    "Brunel](https://en.wikipedia.org/wiki/Isambard_Kingdom_Brunel) (born\n",
    "1806) was a leading innovator in civil, mechanical and naval\n",
    "engineering. Each of these has its own professional institutions founded\n",
    "in 1818, 1847, and 1860 respectively.\n",
    "\n",
    "[Nikola Tesla](https://en.wikipedia.org/wiki/Nikola_Tesla) developed the\n",
    "modern approach to electrical distribution, he was born in 1856 and the\n",
    "American Instiute for Electrical Engineers was founded in 1884, the UK\n",
    "equivalent was founded in 1871.\n",
    "\n",
    "[William Schockley Jr](https://en.wikipedia.org/wiki/William_Shockley),\n",
    "born 1910, led the group that developed the transistor, referred to as\n",
    "“the man who brought silicon to Silicon Valley”, in 1963 the American\n",
    "Institute for Electical Engineers merged with the Institute of Radio\n",
    "Engineers to form the Institute of Electrical and Electronic Engineers.\n",
    "\n",
    "[Watts S. Humphrey](https://en.wikipedia.org/wiki/Watts_Humphrey), born\n",
    "1927, was known as the “father of software quality”, in the 1980s he\n",
    "founded a program aimed at understanding and managing the software\n",
    "process. The British Computer Society was founded in 1956.\n",
    "\n",
    "Why the need for these professions? Much of it is about codification of\n",
    "best practice and developing trust between the public and practitioners.\n",
    "These fundamental characteristics of the professions are shared with the\n",
    "oldest professions (Medicine, Law) as well as the newest (Information\n",
    "Technology).\n",
    "\n",
    "So where are we today? My best guess is we are somewhere equivalent to\n",
    "the 1980s for Software Engineering. In terms of professional deployment\n",
    "we have a basic understanding of the equivalent of “programming” but\n",
    "much less understanding of *machine learning systems design* and *data\n",
    "infrastructure*. How the components we ahve developed interoperate\n",
    "together in a reliable and accountable manner. Best practice is still\n",
    "evolving, but perhaps isn’t being shared widely enough.\n",
    "\n",
    "One problem is that the art of data science is superficially similar to\n",
    "regular software engineering. Although in practice it is rather\n",
    "different. Modern software engineering practice operates to generate\n",
    "code which is well tested as it is written, agile programming techniques\n",
    "provide the appropriate degree of flexibility for the individual\n",
    "programmers alongside sufficient formalization and testing. These\n",
    "techniques have evolved from an overly restrictive formalization that\n",
    "was proposed in the early days of software engineering.\n",
    "\n",
    "While data science involves programming, it is different in the\n",
    "following way. Most of the work in data science involves understanding\n",
    "the data and the appropriate manipulations to apply to extract knowledge\n",
    "from the data. The eventual number of lines of code that are required to\n",
    "extract that knowledge are often very few, but the amount of thought and\n",
    "attention that needs to be applied to each line is much more than a\n",
    "traditional line of software code. Testing of those lines is also of a\n",
    "different nature, provisions have to be made for evolving data\n",
    "environments. Any development work is often done on a static snapshot of\n",
    "data, but deployment is made in a live environment where the nature of\n",
    "data changes. Quality control involves checking for degradation in\n",
    "performance arising form unanticipated changes in data quality. It may\n",
    "also need to check for regulatory conformity. For example, in the UK the\n",
    "General Data Protection Regulation stipulates standards of\n",
    "explainability and fairness that may need to be monitored. These\n",
    "concerns do not affect traditional software deployments.\n",
    "\n",
    "Others are also pointing out these challenges, [this\n",
    "post](https://medium.com/@karpathy/software-2-0-a64152b37c35) from\n",
    "Andrej Karpathy (now head of AI at Tesla) covers the notion of “Software\n",
    "2.0”. Google researchers have highlighted the challenges of “Technical\n",
    "Debt” in machine learning [@Sculley:debt15]. Researchers at Berkeley\n",
    "have characterized the systems challenges associated with machine\n",
    "learning [@Stoica:systemsml17].\n",
    "\n",
    "Data science is not only about technical expertise and analysis of data,\n",
    "we need to also generate a culture of decision making that acknowledges\n",
    "the true challenges in data-driven automated decision making. In\n",
    "particular, a focus on algorithms has neglected the importance of data\n",
    "in driving decisions. The quality of data is paramount in that poor\n",
    "quality data will inevitably lead to poor quality decisions. Anecdotally\n",
    "most data scientists will suggest that 80% of their time is spent on\n",
    "data clean up, and only 20% on actually modelling.\n",
    "\n",
    "### The Software Crisis\n",
    "\n",
    "> The major cause of the software crisis is that the machines have\n",
    "> become several orders of magnitude more powerful! To put it quite\n",
    "> bluntly: as long as there were no machines, programming was no problem\n",
    "> at all; when we had a few weak computers, programming became a mild\n",
    "> problem, and now we have gigantic computers, programming has become an\n",
    "> equally gigantic problem.\n",
    ">\n",
    "> Edsger Dijkstra (1930-2002), The Humble Programmer\n",
    "\n",
    "In the late sixties early software programmers made note of the\n",
    "increasing costs of software development and termed the challenges\n",
    "associated with it as the “[Software\n",
    "Crisis](https://en.wikipedia.org/wiki/Software_crisis)”. Edsger Dijkstra\n",
    "referred to the crisis in his 1972 Turing Award winner’s address.\n",
    "\n",
    "### The Data Crisis\n",
    "\n",
    "> The major cause of the data crisis is that machines have become more\n",
    "> interconnected than ever before. Data access is therefore cheap, but\n",
    "> data quality is often poor. What we need is cheap high quality data.\n",
    "> That implies that we develop processes for improving and verifying\n",
    "> data quality that are efficient.\n",
    ">\n",
    "> There would seem to be two ways for improving efficiency. Firstly, we\n",
    "> should not duplicate work. Secondly, where possible we should automate\n",
    "> work.\n",
    "\n",
    "What I term “The Data Crisis” is the modern equivalent of this problem.\n",
    "The quantity of modern data, and the lack of attention paid to data as\n",
    "it is initially “laid down” and the costs of data cleaning are bringing\n",
    "about a crisis in data-driven decision making. Just as with software,\n",
    "the crisis is most correctly addressed by ‘scaling’ the manner in which\n",
    "we process our data. Duplication of work occurs because the value of\n",
    "data cleaning is not correctly recognised in management decision making\n",
    "processes. Automation of work is increasingly possible through\n",
    "techniques in “artificial intelligence”, but this will also require\n",
    "better management of the data science pipeline so that data about data\n",
    "science (meta-data science) can be correctly assimilated and processed.\n",
    "The Alan Turing institute has a program focussed on this area, [AI for\n",
    "Data\n",
    "Analytics](https://www.turing.ac.uk/research_projects/artificial-intelligence-data-analytics/).\n",
    "\n",
    "<img class=\"\" src=\"../slides/diagrams/Medievalplowingwoodcut.jpg\" width=\"\" align=\"\" style=\"background:none; border:none; box-shadow:none;\">\n",
    "\n",
    "Our current information infrastructure bears a close relation with\n",
    "*feudal systems* of government. In the feudal system a lord had a duty\n",
    "of care over his serfs and vassals, a duty to protect subjects. But in\n",
    "practice there was a power-asymetry. In feudal days protection was\n",
    "against Viking raiders, today, it is against information raiders.\n",
    "However, when there is an information leak, when there is a failure it\n",
    "is too late. Alternatively, our data is publicly shared, in an\n",
    "information commons. Akin to common land of the medieval village. But\n",
    "just as commons were subject to overgrazing and poor management, so it\n",
    "is that much of our data cannot be managed in this way. In particularly\n",
    "personal, sensitive data.\n",
    "\n",
    "I explored this idea further in [this Guardian Op-Ed from\n",
    "2015](https://www.theguardian.com/media-network/2015/nov/16/information-barons-threaten-autonomy-privacy-online).\n",
    "\n",
    "### Rest of the Talk\n",
    "\n",
    "-   Importance of data infrastructure\n",
    "\n",
    "<!--include{_data-science/includes/data-infrastructure.md}-->\n",
    "<!--include{_data-science/includes/data-readiness-levels.md}-->\n",
    "<!--include{_data-science/includes/data-science-as-debugging.md}-->\n",
    "Public Use of Data for Public Good\n",
    "----------------------------------\n",
    "\n",
    "Since machine learning methods are so dependent on data, Understanding\n",
    "public attitudes to the use of their data is key to developing machine\n",
    "learning methods that maintain the trust of the public. Nowhere are the\n",
    "benefits of machine learning more profound, and the potential pitfalls\n",
    "more catastrophic than in the use of machine learning in health data.\n",
    "\n",
    "The promise is for methods that take a personalized perspective on our\n",
    "individual health, but health data is some of the most sensitive data\n",
    "available to us. This is recognised both by the public and by\n",
    "regulation.\n",
    "\n",
    "With this in mind The Wellcome Trust launched a report on\n",
    "[“Understanding Patient\n",
    "Data”](https://wellcome.ac.uk/news/understanding-patient-data-launches-today)\n",
    "authored by Nicola Perrin, driven by the National Data Guardian’s\n",
    "recommendations.\n",
    "\n",
    "From this report we know that patients trust Universities and hospitals\n",
    "more than the trust commercial entities and insurers. However, there are\n",
    "a number of different ways in which data can be mishandled, it is not\n",
    "only the intent of the data-controllers that effects our data security.\n",
    "\n",
    "For example, the recent WannaCry virus attack which demonstrated the\n",
    "unpreparedness of much of the NHS IT infrastructure for a virus\n",
    "exhibiting an exploit that was well known to the security community. The\n",
    "key point is that the public trust the *intent* of academics and medical\n",
    "professionals, but actual *capability* could be at variance with the\n",
    "intent.\n",
    "\n",
    "<img class=\"\" src=\"../slides/diagrams/health/bush-pilot-grant-mcconachie.jpg\" width=\"60%\" align=\"\" style=\"background:none; border:none; box-shadow:none;\">\n",
    "\n",
    "<center>\n",
    "*Bush Pilot Grant McConachie*\n",
    "</center>\n",
    "The situation is somewhat reminiscient of early aviation. This is where\n",
    "we are with our data science capabilities. By analogy, the engine of the\n",
    "plane is our data security infrastructure, the basic required technology\n",
    "to make us safe. The pilot is the health professional performing data\n",
    "analytics. The nature of the job of early pilots and indeed today’s\n",
    "*bush pilots* (who fly to remote places) included a need to understand\n",
    "the mechanics of the engine. Just as a health data scientist, today,\n",
    "needs to deal with security of the infrastructure as well as the nature\n",
    "of the analysis.\n",
    "\n",
    "<img class=\"\" src=\"../slides/diagrams/health/British_Airways_at_SFO.jpg\" width=\"50%\" align=\"\" style=\"background:none; border:none; box-shadow:none;\">\n",
    "<center>\n",
    "*British Airways 747 at SFO*\n",
    "</center>\n",
    "I suspect most passengers would find it disconcerting if the pilot of a\n",
    "747 was seen working on the engine shortly before a flight. As aviation\n",
    "has become more widespread, there is now a separation of\n",
    "responsibilities between pilots and mechanics. Indeed, Rolls Royce\n",
    "maintain ownership of their engines today, and merely lease them to the\n",
    "aircraft company. The responsibility for maintenance of the engine is\n",
    "entirely with Rolls Royce, yet the pilot is responsibility for the\n",
    "safety of the aircraft and its passengers.\n",
    "\n",
    "We need to develop a modern data-infrastructure for which separates the\n",
    "need for security of infrastructure from the decision making of the data\n",
    "analyst.\n",
    "\n",
    "This separation of responsibility according to expertise needs to be\n",
    "emulated when considering health data infrastructure. This resolves the\n",
    "*intent-capability* dilemma, by ensuring a separation of\n",
    "responsibilities to those that are best placed to address the issues.\n",
    "\n",
    "### Propagation of Best Practice\n",
    "\n",
    "We must also be careful to maintain openness in this new genaration of\n",
    "digital solutions for patient care. Matthew Syed’s book, “Black Box\n",
    "Thinking” [@Syed:blackbox15], emphasizes the importance of surfacing\n",
    "errors as a route to learning and improved process. Taking aviation as\n",
    "an example, and contrasting it with the culture in medicine, Matthew\n",
    "relates the story of [Martin\n",
    "Bromiley](https://chfg.org/trustees/martin-bromiley/), an airline pilot\n",
    "whose wife died during a routine hospital procedure and his efforts to\n",
    "improve the culture of safety in medicine. The motivation for the book\n",
    "is the difference in culture between aviation and medicine in how errors\n",
    "are acknowledged and dealt with. We must ensure that these high\n",
    "standards of oversight apply to the era of data-driven automated\n",
    "decision making.\n",
    "\n",
    "In particular, while there is much to be gained by involving comemrcial\n",
    "companies, if the process by which they are drawing inference about\n",
    "patient condition is hidden (for example, due to commercial\n",
    "confidentiality), this may prevent us from understanding errors in\n",
    "diagnosis or treatment. This would be a retrograde step. It may be that\n",
    "health device certification needs modification or reform for data-driven\n",
    "automated decision making, but we need a spirit of transparency around\n",
    "how these systems are deriving their inferences to ensure best practice.\n",
    "\n",
    "<!--include{_data-science/includes/gdpr.md}-->\n",
    "<!--include{_ai/includes/government-reports.md}-->\n",
    "Data Trusts\n",
    "-----------\n",
    "\n",
    "The machine learning solutions we are dependent on to drive automated\n",
    "decision making are dependent on data. But with regard to personal data\n",
    "there are important issues of privacy. Data sharing brings benefits, but\n",
    "also exposes our digital selves. From the use of social media data for\n",
    "targeted advertising to influence us, to the use of genetic data to\n",
    "identify criminals, or natural family members. Control of our virtual\n",
    "selves maps on to control of our actual selves.\n",
    "\n",
    "The fuedal system that is implied by current data protection legislation\n",
    "has signficant power asymmetries at its heart, in that the data\n",
    "controller has a duty of care over the data subject, but the data\n",
    "subject may only discover failings in that duty of care when it’s too\n",
    "late. Data controllers also may have conflicting motivations, and often\n",
    "their primary motivation is *not* towards the data-subject, but that is\n",
    "a consideration in their wider agenda.\n",
    "\n",
    "I proposed [Data\n",
    "Trusts](https://www.theguardian.com/media-network/2016/jun/03/data-trusts-privacy-fears-feudalism-democracy)\n",
    "as a solution to this problem. Inspired by *land societies* that formed\n",
    "in the 19th century to bring democratic representation to the growing\n",
    "middle classes. A land society was a mutual organisation where resources\n",
    "were pooled for the common good.\n",
    "\n",
    "A Data Trust would be a legal entity where the trustees responsibility\n",
    "was entirely to the members of the trust. So the motivation of the\n",
    "data-controllers is aligned only with the data-subjects. How data is\n",
    "handled would be subject to the terms under which the trust was\n",
    "convened. The success of an individual trust would be contingent on it\n",
    "satisfying its members with appropriate balancing of individual privacy\n",
    "with the benefits of data sharing.\n",
    "\n",
    "Formation of Data Trusts became the number one recommendation of the\n",
    "Hall-Presenti report on AI, but the manner in which this is done will\n",
    "have a significant impact on their utility. It feels important to have a\n",
    "diversity of approaches, and yet it feels important that any individual\n",
    "trust would be large enough to be taken seriously in representing the\n",
    "views of its members in wider negotiations.\n",
    "\n",
    "<!--include{_ml/includes/resolve-deploy-innovate.md}-->\n",
    "<!--include{_ai/includes/ml-systems-design-long.md}-->\n",
    "### Thanks!\n",
    "\n",
    "-   twitter: @lawrennd\n",
    "-   blog:\n",
    "    [http://inverseprobability.com](http://inverseprobability.com/blog.html)\n",
    "-   [Mike Jordan’s Medium\n",
    "    Post](https://medium.com/@mijordan3/artificial-intelligence-the-revolution-hasnt-happened-yet-5e1d5812e1e7)\n",
    "\n",
    "[^1]: the challenge of understanding what it pertains to is known as\n",
    "    knowledge representation)."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}