{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Data First Culture\n", "\n", "### [Neil D. Lawrence](http://inverseprobability.com), University of Cambridge\n", "\n", "### 2022-07-07" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Abstract**: Digital transformation has offered the promise of moving\n", "from a manual decision-making world to a world where decisions can be\n", "rational, data-driven and automated. The first step to digital\n", "transformation is mapping the world of atoms (material, customers,\n", "logistic networks) into the world of bits. But the real challenges may\n", "start once this is complete. In this talk we introduce the notion of\n", "‘post digital transformation’: the challenges of doing business in a\n", "digital world." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "::: {.cell .markdown}\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Henry Ford’s Faster Horse\n", "\n", "Figure: A 1925 Ford Model T built at Henry Ford’s Highland Park Plant\n", "in Dearborn, Michigan. This example now resides in Australia, owned by\n", "the founder of FordModelT.net. From\n", "\n", "\n", "It’s said that Henry Ford’s customers wanted “a faster horse.” If\n", "Henry Ford were selling us artificial intelligence today, what would the\n", "customer call for, “a smarter human?” That’s certainly the picture of\n", "machine intelligence we find in science fiction narratives, but the\n", "reality of what we’ve developed is much more mundane.\n", "\n", "Car engines produce prodigious power from petrol. Machine intelligences\n", "deliver decisions derived from data. In both cases the scale of\n", "consumption enables a speed of operation that is far beyond the\n", "capabilities of their natural counterparts. 
Unfettered energy\n", "consumption has consequences in the form of climate change. Does\n", "unbridled data consumption also have consequences for us?\n", "\n", "If we devolve decision making to machines, we depend on those machines\n", "to accommodate our needs. If we don’t understand how those machines\n", "operate, we lose control over our destiny. Our mistake has been to see\n", "machine intelligence as a reflection of our intelligence. We cannot\n", "understand the smarter human without understanding the human. To\n", "understand the machine, we need to better understand ourselves." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "plt.rcParams.update({'font.size': 22})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## notutils\n", "\n", "This small package is a helper package for various notebook utilities\n", "used\n", "\n", "The software can be installed using" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%pip install notutils" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "from the command prompt where you can access your python installation.\n", "\n", "The code is also available on GitHub:\n", "\n", "\n", "Once `notutils` is installed, it can be imported in the usual manner." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import notutils" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## pods\n", "\n", "In Sheffield we created a suite of software tools for ‘Open Data\n", "Science.’ Open data science is an approach to sharing code, models and\n", "data that should make it easier for companies, health professionals and\n", "scientists to gain access to data science techniques.\n", "\n", "You can also check this blog post on [Open Data\n", "Science](http://inverseprobability.com/2014/07/01/open-data-science).\n", "\n", "The software can be installed using" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%pip install pods" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "from the command prompt where you can access your python installation.\n", "\n", "The code is also available on GitHub: \n", "\n", "Once `pods` is installed, it can be imported in the usual manner." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pods" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## mlai\n", "\n", "The `mlai` software is a suite of helper functions for teaching and\n", "demonstrating machine learning algorithms. It was first used in the\n", "Machine Learning and Adaptive Intelligence course in Sheffield in 2013.\n", "\n", "The software can be installed using" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%pip install mlai" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "from the command prompt where you can access your python installation.\n", "\n", "The code is also available on GitHub: \n", "\n", "Once `mlai` is installed, it can be imported in the usual manner." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import mlai" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The Gartner Hype Cycle\n", "\n", "Figure: The Gartner Hype Cycle places technologies on a graph that\n", "plots the expectations we have of a technology against its actual\n", "influence. Early hope for a new technology is often displaced by\n", "disillusionment due to the time it takes for a technology to be usefully\n", "deployed.\n", "\n", "The [Gartner Hype Cycle](https://en.wikipedia.org/wiki/Hype_cycle) tries\n", "to assess where an idea is in terms of maturity and adoption. It splits\n", "the evolution of technology into a technological trigger, a peak of\n", "expectations followed by a trough of disillusionment and a final\n", "ascension into a useful technology. It looks rather like a classical\n", "control response to a final set point." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Cycle for ML Terms" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Google Trends" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%pip install pytrends" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import mlai.plot as plot" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot.google_trends(terms=['artificial intelligence', 'big data', 'data mining', 'deep learning', 'machine learning'], \n", " initials='ai-bd-dm-dl-ml', \n", " diagrams='./data-science')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import notutils as nu\n", "from ipywidgets import IntSlider" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ 
"nu.display_plots('ai-bd-dm-dl-ml-google-trends{sample:0>3}.svg', \n", " './data-science/', sample=IntSlider(0, 0, 4, 1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "Figure: Google trends for ‘artificial intelligence,’ ‘big data,’\n", "‘data mining,’ ‘deep learning,’ ‘machine learning’ as different\n", "technological terms gives us insight into their popularity over\n", "time.\n", "\n", "Google trends gives us insight into the interest for different terms\n", "over time.\n", "\n", "Examining Google treds for ‘artificial intelligence,’ ‘big data,’ ‘data\n", "mining,’ ‘deep learning’ and ‘machine learning’ we can see that\n", "‘artificial intelligence’ *may* be entering a plateau of productivity,\n", "‘big data’ is entering the trough of disillusionment, and ‘data mining’\n", "seems to be deeply within the trough. On the other hand, ‘deep learning’\n", "and ‘machine learning’ appear to be ascending to the peak of inflated\n", "expectations having experienced a technology trigger.\n", "\n", "For deep learning that technology trigger was the ImageNet result of\n", "2012 (Krizhevsky et al., n.d.). This step change in performance on\n", "object detection in images was achieved through convolutional neural\n", "networks, popularly known as ‘deep learning.’" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# What is Machine Learning?\n", "\n", "\\[edit\\]\n", "\n", "What is machine learning? At its most basic level machine learning is a\n", "combination of\n", "\n", "$$\\text{data} + \\text{model} \\stackrel{\\text{compute}}{\\rightarrow} \\text{prediction}$$\n", "\n", "where *data* is our observations. They can be actively or passively\n", "acquired (meta-data). The *model* contains our assumptions, based on\n", "previous experience. That experience can be other data, it can come from\n", "transfer learning, or it can merely be our beliefs about the\n", "regularities of the universe. 
In humans our models include our inductive\n", "biases. The *prediction* is an action to be taken or a categorization or\n", "a quality score. The reason that machine learning has become a mainstay\n", "of artificial intelligence is the importance of predictions in\n", "artificial intelligence. The data and the model are combined through\n", "computation.\n", "\n", "In practice we normally perform machine learning using two functions. To\n", "combine data with a model we typically make use of:\n", "\n", "**a prediction function**: a function which is used to make the\n", "predictions. It includes our beliefs about the regularities of the\n", "universe, our assumptions about how the world works, e.g., smoothness,\n", "spatial similarities, temporal similarities.\n", "\n", "**an objective function**: a function which defines the cost of\n", "misprediction. Typically, it includes knowledge about the world’s\n", "generating processes (probabilistic objectives) or the costs we pay for\n", "mispredictions (empirical risk minimization).\n", "\n", "The combination of data and model through the prediction function and\n", "the objective function leads to a *learning algorithm*. The class of\n", "prediction functions and objective functions we can make use of is\n", "restricted by the algorithms they lead to. If the prediction function or\n", "the objective function is too complex, then it can be difficult to find\n", "an appropriate learning algorithm. 
Much of the academic field of machine\n", "learning is the quest for new learning algorithms that allow us to bring\n", "different types of models and data together.\n", "\n", "A useful reference for the state of the art in machine learning is the UK\n", "Royal Society Report, [Machine Learning: Power and Promise of Computers\n", "that Learn by\n", "Example](https://royalsociety.org/~/media/policy/projects/machine-learning/publications/machine-learning-report.pdf).\n", "\n", "You can also check my blog post on [What is Machine\n", "Learning?](http://inverseprobability.com/2017/07/17/what-is-machine-learning)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Artificial Intelligence and Data Science\n", "\n", "Artificial intelligence has the objective of endowing computers with\n", "human-like intelligent capabilities. For example, understanding an image\n", "(computer vision) or the contents of some speech (speech recognition),\n", "the meaning of a sentence (natural language processing) or the\n", "translation of a sentence (machine translation)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Supervised Learning for AI\n", "\n", "The machine learning approach to artificial intelligence is to collect\n", "and annotate a large data set from humans. The problem is characterized\n", "by input data (e.g. a particular image) and a label (e.g. is there a car\n", "in the image yes/no). The machine learning algorithm fits a mathematical\n", "function (I call this the *prediction function*) to map from the input\n", "image to the label. The parameters of the prediction function are set by\n", "minimizing an error between the function’s predictions and the true\n", "data. 
The mathematical function that encapsulates this error is known\n", "as the *objective function*.\n", "\n", "This approach to machine learning is known as *supervised learning*.\n", "Various approaches to supervised learning use different prediction\n", "functions, objective functions or different optimization algorithms to\n", "fit them.\n", "\n", "For example, *deep learning* makes use of *neural networks* to form the\n", "predictions. A neural network is a particular type of mathematical\n", "function that allows the algorithm designer to introduce invariances\n", "into the function.\n", "\n", "An invariance is an important way of including prior understanding in a\n", "machine learning model. For example, in an image, a car is still a car\n", "regardless of whether it’s in the upper left or lower right corner of\n", "the image. This is known as translation invariance. A neural network\n", "encodes translation invariance in *convolutional layers*. Convolutional\n", "neural networks are widely used in image recognition tasks.\n", "\n", "An alternative structure is known as a recurrent neural network (RNN).\n", "RNNs encode temporal structure. They use auto-regressive\n", "connections in their hidden layers, so they can be seen as time series\n", "models which have non-linear auto-regressive basis functions. They are\n", "widely used in speech recognition and machine translation.\n", "\n", "Machine learning has been deployed in speech recognition (e.g. Alexa,\n", "deep neural networks, convolutional neural networks for speech\n", "recognition) and in computer vision (e.g. Amazon Go, convolutional neural\n", "networks for person recognition and pose detection).\n", "\n", "The field of data science is related to AI, but philosophically\n", "different. It arises because we are increasingly creating large amounts\n", "of data through *happenstance* rather than active collection. In the\n", "modern era data is laid down by almost all our activities. 
The objective\n", "of data science is to extract insights from this data.\n", "\n", "Classically, in the field of statistics, data analysis proceeds by\n", "assuming that the question (or scientific hypothesis) comes before the\n", "data is created. For example, if I want to determine the effectiveness of a\n", "particular drug, I perform a *design* for my data collection. I use\n", "foundational approaches such as randomization to account for\n", "confounders. This made a lot of sense in an era where data had to be\n", "actively collected. The reduction in cost of data collection and storage\n", "now means that many data sets are available which weren’t collected with\n", "a particular question in mind. This is a challenge because bias in the\n", "way data was acquired can corrupt the insights we derive. We can perform\n", "randomized control trials (referred to as A/B tests in modern internet\n", "parlance) to verify our conclusions, but the opportunity is to use data\n", "science techniques to better guide our question selection or even answer\n", "a question without the expense of a full randomized control trial." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Intellectual Debt\n", "\n", "\n", "\n", "Figure: Jonathan Zittrain’s term to describe the challenges of\n", "explanation that come with AI is Intellectual Debt.\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Information and Embodiment\n", "\n", "
\n", "*Claude Shannon*\n", "\n", "Figure: Claude Shannon (1916-2001)\n", "\n", "| | computer | human |\n", "|---|---|---|\n", "| bits/min | billions | 2,000 |\n", "| billion calculations/s | \\~100 | a billion |\n", "| embodiment | 20 minutes | 5 billion years |\n", "
\n", "\n", "Figure: Embodiment factors are the ratio between our ability to\n", "compute and our ability to communicate. Relative to the machine we are\n", "also locked in. In the table we represent embodiment as the length of\n", "time it would take to communicate one second’s worth of computation. For\n", "computers it is a matter of minutes, but for a human, it is a matter of\n", "thousands of millions of years." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Evolved Relationship with Information\n", "\n", "The high bandwidth of computers has resulted in a close relationship\n", "between the computer and data. Large amounts of information can flow\n", "between the two. The degree to which the computer is mediating our\n", "relationship with data means that we should consider it an intermediary.\n", "\n", "Originally our low bandwidth relationship with data was affected by two\n", "characteristics. Firstly, by our tendency to over-interpret, driven by our\n", "need to extract as much knowledge from our low bandwidth information\n", "channel as possible. Secondly, by our improved understanding of the\n", "domain of *mathematical* statistics and how our cognitive biases can\n", "mislead us.\n", "\n", "With this new set up there is a potential for assimilating far more\n", "information via the computer, but the computer can present this to us in\n", "various ways. If its motives are not aligned with ours then it can\n", "misrepresent the information. This needn’t be nefarious; it can simply\n", "be a result of the computer pursuing a different objective from us. 
For\n", "example, if the computer is aiming to maximize our interaction time that\n", "may be a different objective from ours which may be to summarize\n", "information in a representative manner in the *shortest* possible length\n", "of time.\n", "\n", "For me, it was a common experience to pick up my telephone\n", "with the intention of checking when my next appointment was, but to soon\n", "find myself distracted by another application on the phone, and end up\n", "reading something on the internet. By the time I’d finished reading, I\n", "would often have forgotten the reason I picked up my phone in the first\n", "place.\n", "\n", "There are great benefits to be had from the huge amount of information\n", "we can unlock from this evolved relationship between us and data. In\n", "biology, large scale data sharing has been driven by a revolution in\n", "genomic, transcriptomic and epigenomic measurement. The improved\n", "inferences that can be drawn through summarizing data by computer have\n", "fundamentally changed the nature of biological science. Now this\n", "phenomenon is also influencing us in our daily lives as data measured by\n", "*happenstance* is increasingly used to characterize us.\n", "\n", "Better mediation of this flow requires a better understanding\n", "of human-computer interaction. This in turn involves understanding our\n", "own intelligence better, what its cognitive biases are and how these\n", "might mislead us.\n", "\n", "For further thoughts see this Guardian article on [marketing in the internet\n", "era](https://www.theguardian.com/media-network/2015/jul/23/data-driven-economy-marketing)\n", "from 2015.\n", "\n", "You can also check my blog post on [System\n", "Zero](http://inverseprobability.com/2015/12/04/what-kind-of-ai), also\n", "from 2015." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## New Flow of Information\n", "\n", "Classically the field of statistics focussed on mediating the\n", "relationship between the machine and the human. Our limited bandwidth of\n", "communication means we tend to over-interpret the limited information\n", "that we are given. In the extreme we assign motives and desires to\n", "inanimate objects (a process known as anthropomorphizing). Much of\n", "mathematical statistics was developed to help temper this tendency and\n", "understand when we are justified in drawing conclusions from data.\n", "\n", "\n", "\n", "Figure: The trinity of human, data and computer highlights the\n", "modern phenomenon. The communication channel between computer and data\n", "now has an extremely high bandwidth. The channel between human and\n", "computer and the channel between data and human are narrow. In the new\n", "direction of information flow, information reaches us mediated by the\n", "computer. The focus on classical statistics reflected the importance of\n", "the direct communication between human and data. The modern challenges\n", "of data science emerge when that relationship is being mediated by the\n", "machine.\n", "\n", "Data science brings new challenges. In particular, there is a very large\n", "bandwidth connection between the machine and data. This means that our\n", "relationship with data is now commonly being mediated by the machine.\n", "This is true both of the acquisition of new data, which now happens by\n", "happenstance rather than with purpose, and of the interpretation of that\n", "data, where we are increasingly relying on machines to summarise what the\n", "data contains. 
This is leading to the emerging field of data science,\n", "which must not only deal with the same challenges that mathematical\n", "statistics faced in tempering our tendency to over-interpret data, but\n", "must also deal with the possibility that the machine has either\n", "inadvertently or maliciously misrepresented the underlying data." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Bandwidth Constrained Conversations" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import notutils as nu\n", "from ipywidgets import IntSlider" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nu.display_plots('anne-bob-conversation{sample:0>3}.svg', \n", " 'https://inverseprobability.com/talks/./slides/diagrams/', sample=IntSlider(0, 0, 7, 1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "Figure: Conversation relies on internal models of other\n", "individuals.\n", "\n", "\n", "\n", "Figure: Misunderstanding of context and who we are talking to leads\n", "to arguments.\n", "\n", "Embodiment factors imply that, in our communication between humans, what\n", "is *not* said is, perhaps, more important than what is said. To\n", "communicate with each other we need to have a model of who each of us\n", "is.\n", "\n", "To aid this, in society, we are required to perform roles, whether as a\n", "parent, a teacher, an employee or a boss. 
Each of these roles requires\n", "that we conform to certain standards of behaviour to facilitate\n", "communication between ourselves.\n", "\n", "Control of self is vitally important to these communications.\n", "\n", "The high availability of data to humans undermines\n", "human-to-human communication channels by providing new routes to\n", "undermining our control of self.\n", "\n", "The consequences of this mismatch between power and delivery are to be\n", "seen all around us. Just as driving an F1 car with bicycle\n", "wheels would be a fine art, so is the process of communication between\n", "humans.\n", "\n", "If I have a thought and I wish to communicate it, I first of all need to\n", "have a model of what you think. I should think before I speak. When I\n", "speak, you may react. You have a model of who I am and what I was trying\n", "to say, and why I chose to say what I said. Now we begin this dance,\n", "where we are each trying to better understand each other and what we are\n", "saying. When it works, it is beautiful, but when misdeployed, just like\n", "a badly driven F1 car, there is a horrible crash, an argument.\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Lies and Damned Lies\n", "\n", "> There are three types of lies: lies, damned lies and statistics\n", ">\n", "> Benjamin Disraeli 1804-1881\n", "\n", "Benjamin Disraeli said[1] that there are three types of lies: lies, damned\n", "lies and statistics. Disraeli died in 1881, 30 years before the first\n", "academic department of applied statistics was founded at UCL. If\n", "Disraeli were alive today, it is likely that he’d rephrase his quote:\n", "\n", "> There are three types of lies: lies, damned lies and *big data*.\n", "\n", "Why? 
Because the challenges of understanding and interpreting big data\n", "today are similar to those that Disraeli faced in governing an empire\n", "through statistics in the latter part of the 19th century.\n", "\n", "The quote ‘lies, damned lies and statistics’ was credited to Benjamin\n", "Disraeli by Mark Twain in his autobiography. It characterizes the idea\n", "that statistics can be made to prove anything. But Disraeli died in 1881\n", "and Mark Twain died in 1910. The important breakthrough in overcoming\n", "our tendency to over-interpret data came with the formalization of the\n", "field through the development of *mathematical statistics*.\n", "\n", "Data has an elusive quality: it promises so much but can deliver little,\n", "it can mislead and misrepresent. To harness it, it must be tamed. In\n", "Disraeli’s time during the second half of the 19th century, numbers and\n", "data were being accumulated and the social sciences were being developed.\n", "There was a large scale collection of data for the purposes of\n", "government.\n", "\n", "The modern ‘big data era’ is on the verge of delivering the same sense\n", "of frustration that Disraeli experienced: the early promise of big data\n", "as a panacea is evolving to demands for delivery. For me, personally,\n", "peak-hype coincided with an email I received inviting collaboration on a\n", "project to deploy “*Big Data* and *Internet of Things* in an *Industry\n", "4.0* environment.” Further questioning revealed that the actual project\n", "was optimization of the efficiency of a manufacturing production line, a\n", "far more tangible and *realizable* goal.\n", "\n", "The antidote to this verbiage is found in increasing awareness. When\n", "dealing with data the first trap to avoid is the games of buzzword bingo\n", "that we are wont to play. The first goal is to quantify what challenges\n", "can be addressed and what techniques are required. Behind the hype,\n", "fundamentals are changing. 
The phenomenon is about the increasing access\n", "we have to data, and the manner in which customers’ information is recorded\n", "and processes are codified and digitized with little overhead. The internet\n", "of things is about the increasing number of cheap sensors that can be\n", "easily interconnected through our modern network structures. But\n", "businesses are about making money, and these phenomena need to be recast\n", "in those terms before their value can be realized.\n", "\n", "[1] [Disraeli is attributed this quote by Mark\n", "Twain](https://en.wikipedia.org/wiki/Lies,_damned_lies,_and_statistics)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## *Mathematical* Statistics\n", "\n", "[Karl Pearson](https://en.wikipedia.org/wiki/Karl_Pearson) (1857-1936),\n", "[Ronald Fisher](https://en.wikipedia.org/wiki/Ronald_Fisher) (1890-1962)\n", "and others considered the question of what conclusions can truly be\n", "drawn from data. Their mathematical studies act as a restraint on our\n", "tendency to over-interpret and see patterns where there are none. They\n", "introduced concepts such as randomized control trials that form a\n", "mainstay of our decision making today, from government, to\n", "clinicians to large scale A/B testing that determines the nature of the\n", "web interfaces we interact with on social media and shopping.\n", "\n", "\n", "\n", "Figure: Karl Pearson (1857-1936), one of the founders of Mathematical\n", "Statistics.\n", "\n", "Their movement did the most to put statistics to rights, to eradicate\n", "the ‘damned lies.’ It was known as [‘mathematical\n", "statistics’](https://en.wikipedia.org/wiki/Mathematical_statistics).\n", "Today I believe we should look to the emerging field of *data science*\n", "to play the same role. Data science is an amalgam of statistics, data\n", "mining, computer systems, databases, computation, machine learning and\n", "artificial intelligence. 
Spread across these fields are the tools we\n", "need to realize data’s potential. For many businesses this might be\n", "thought of as the challenge of ‘converting bits into atoms.’ Bits: the\n", "data stored on computer; atoms: the physical manifestation of what we\n", "do, the transfer of goods, the delivery of service. From fungible to\n", "tangible. When solving a challenge through data there is a series of\n", "obstacles that need to be addressed.\n", "\n", "Firstly, data awareness: what data you have and where it’s stored.\n", "Sometimes this includes changing your conception of what data is and how\n", "it can be obtained. From automated production lines to apps on employee\n", "smart phones. Often data is locked away: manual log books, confidential\n", "data, personal data. For increasing awareness an internal audit can\n", "help. The website [data.gov.uk](https://data.gov.uk/) hosts data made\n", "available by the UK government. To create this website the government’s\n", "departments went through an audit of what data they each hold and what\n", "data they could make available. Similarly, within private businesses\n", "this type of audit could be useful for understanding their internal\n", "digital landscape: after all, the key to any successful campaign is a\n", "good map.\n", "\n", "Secondly, availability. How well are the data sources interconnected?\n", "How well curated are they? The curse of Disraeli was associated with\n", "unreliable data and *unreliable statistics*. The misrepresentations this\n", "leads to are worse than the absence of data as they give a false sense\n", "of confidence to decision making. Understanding how to avoid these\n", "pitfalls involves an improved sense of data and its value, one that\n", "needs to permeate the organization.\n", "\n", "The final challenge is analysis, the accumulation of the necessary\n", "expertise to digest what the data tells us. 
Data requires interpretation,\n", "and interpretation requires experience. Analysis is a\n", "bottleneck due to a skill shortage, a skill shortage made more acute by\n", "the fact that, ideally, analysis should be carried out by individuals\n", "not only skilled in data science but also equipped with the domain\n", "knowledge to understand the implications in a given application, and to\n", "see opportunities for improvements in efficiency." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## ‘Mathematical Data Science’\n", "\n", "As a term ‘big data’ promises much and delivers little. To get true\n", "value from data, it needs to be curated and evaluated. The three stages\n", "of awareness, availability and analysis provide a broad framework\n", "through which organizations should be assessing the potential in the\n", "data they hold. Hand waving about big data solutions will not do; it\n", "will only lead to self-deception. The castles we build on our data\n", "landscapes must be based on firm foundations, process and scientific\n", "analysis. If we do things right, those are the foundations that will be\n", "provided by the new field of data science.\n", "\n", "Today the statement “There are three types of lies: lies, damned lies\n", "and ‘big data’” may be more apt. We are revisiting many of the mistakes\n", "made in interpreting data from the 19th century. Big data is laid down\n", "by happenstance, rather than actively collected with a particular\n", "question in mind. That means it needs to be treated with care when\n", "conclusions are being drawn. For data science to succeed it needs the\n", "same form of rigour that Pearson and Fisher brought to statistics: a\n", "“mathematical data science” is needed.\n", "\n", "You can also check my blog post on [Lies, Damned Lies and Big\n", "Data](http://inverseprobability.com/2016/11/19/lies-damned-lies-big-data)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Heider and Simmel (1944)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from IPython.lib.display import YouTubeVideo\n", "YouTubeVideo('8FIEZXMUM2I')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Figure: Fritz Heider and Marianne Simmel’s video of shapes from\n", "Heider and Simmel (1944).\n", "\n", "[Fritz Heider](https://en.wikipedia.org/wiki/Fritz_Heider) and [Marianne\n", "Simmel](https://en.wikipedia.org/wiki/Marianne_Simmel)’s experiments\n", "with animated shapes from 1944 (Heider and Simmel, 1944). Our\n", "interpretation of these objects as showing motives and even emotion is a\n", "combination of our desire for narrative, a need for understanding of\n", "each other, and our ability to empathise. At one level, these are\n", "crudely drawn objects, but in another key way, the animator has\n", "communicated a story through simple facets such as their relative\n", "motions, their sizes and their actions. We apply our psychological\n", "representations to these faceless shapes in an effort to interpret their\n", "actions.\n", "\n", "See also a recent review paper on Human Cooperation by Henrich and\n", "Muthukrishna (2021)."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Computer Conversations" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import notutils as nu\n", "from ipywidgets import IntSlider" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nu.display_plots('anne-bob-conversation{sample:0>3}.svg',\n", "                 'https://inverseprobability.com/talks/./slides/diagrams/', sample=IntSlider(0, 0, 7, 1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Figure: Conversation relies on internal models of other\n", "individuals.\n", "\n", "Figure: Misunderstanding of context and who we are talking to leads\n", "to arguments.\n", "\n", "Similarly, we find it difficult to comprehend how computers are making\n", "decisions, because they do so with more data than we can possibly\n", "imagine.\n", "\n", "In many respects, this is not a problem; it’s a good thing. Computers\n", "and humans are good at different things. But when we interact with a\n", "computer, when it acts in a different way to us, we need to remember\n", "why.\n", "\n", "Just as the first step to getting along with other humans is\n", "understanding other humans, so it needs to be with getting along with\n", "our computers.\n", "\n", "Embodiment factors explain why, at the same time, computers are so\n", "impressive in simulating our weather, but so poor at predicting our\n", "moods. Our complexity is greater than that of our weather, and each of\n", "us is tuned to read and respond to one another.\n", "\n", "Their intelligence is different. It is based on very large quantities of\n", "data that we cannot absorb. Our computers don’t have a complex internal\n", "model of who we are. They don’t understand the human condition. 
They are\n", "not tuned to respond to us as we are to each other.\n", "\n", "Embodiment factors encapsulate a profound thing about the nature of\n", "humans. Our locked-in intelligence means that we are striving to\n", "communicate, so we put a lot of thought into what we’re communicating\n", "with. And if we’re communicating with something complex, we naturally\n", "anthropomorphize it.\n", "\n", "We give our dogs, our cats and our cars human motivations. We do the\n", "same with our computers. We anthropomorphize them. We assume that they\n", "have the same objectives as us and the same constraints. They don’t.\n", "\n", "This means that when we worry about artificial intelligence, we worry\n", "about the wrong things. We fear computers that behave like more powerful\n", "versions of ourselves that will struggle to outcompete us.\n", "\n", "In reality, the challenge is that our computers cannot be human enough.\n", "They cannot understand us with the depth we understand one another. They\n", "drop below our cognitive radar and operate outside our mental models.\n", "\n", "The real danger is that computers don’t anthropomorphize. They’ll make\n", "decisions in isolation from us without our supervision, because they\n", "can’t communicate truly and deeply with us." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The Big Data Paradox\n", "\n", "The big data paradox is the modern phenomenon of “as we collect more\n", "data, we understand less.” It is emerging in several domains: political\n", "polling, characterization of patients for trials data, monitoring\n", "Twitter for political sentiment.\n", "\n", "I like to think of the phenomenon as relating to the notion of “can’t\n", "see the wood for the trees.” Classical statistics, with randomized\n", "controlled trials, improved society’s understanding of data. It improved\n", "our ability to monitor the forest, to consider population health, voting\n", "patterns etc. 
It is critically dependent on active approaches to data\n", "collection that deal with confounders. This data collection can be very\n", "expensive.\n", "\n", "In business today, it is still the gold standard: A/B tests are used to\n", "understand the effect of an intervention on revenue or customer capture\n", "or supply chain costs.\n", "\n", "Figure: New beech leaves growing in the Gribskov Forest in the\n", "northern part of Sealand, Denmark. Photo from Wikimedia Commons by\n", "Malene Thyssen.\n", "\n", "The new phenomenon is *happenstance data*: data that is not actively\n", "collected with a question in mind. As a result, it can mislead us. For\n", "example, if we assume the politics of active users of Twitter is\n", "reflective of the wider population’s politics, then we may be misled.\n", "\n", "However, this happenstance data often allows us to characterise a\n", "particular individual to a high degree of accuracy. Classical statistics\n", "was all about the forest, but big data can often become about the\n", "individual tree. As a result we are misled about the situation.\n", "\n", "The phenomenon is more dangerous because our perception is that we are\n", "characterizing the wider scenario with ever increasing accuracy, whereas\n", "we are just becoming distracted by detail that may or may not be\n", "pertinent to the wider situation.\n", "\n", "This is related to our limited bandwidth as humans, and the ease with\n", "which we are distracted by detail: the data-inattention-cognitive-bias." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Big Model Paradox\n", "\n", "The big data paradox has a sister: the big model paradox. As we build\n", "more and more complex models, we start believing that we have a\n", "high-fidelity representation of reality. But the complexity of reality\n", "is way beyond our feeble imaginings. 
So we end up with a highly complex\n", "model, but one that falls well short in terms of reflecting reality. The\n", "complexity of the model means that it moves beyond our understanding." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Complexity in Action\n", "\n", "As an exercise in understanding complexity, watch the following video.\n", "You will see the basketball being bounced around, and the players\n", "moving. Your job is to count the passes of those dressed in white and\n", "ignore those of the individuals dressed in black." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from IPython.lib.display import YouTubeVideo\n", "YouTubeVideo('vJG698U2Mvo')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Figure: Daniel Simons’s famous illusion “monkey business.” Focus on\n", "the movement of the ball distracts the viewer from seeing other aspects\n", "of the image.\n", "\n", "In a classic study Simons and Chabris (1999) ask subjects to count the\n", "number of passes of the basketball between players on the team wearing\n", "white shirts. 
Fifty percent of the time, these subjects don’t notice the\n", "gorilla moving across the scene.\n", "\n", "The phenomenon of inattentional blindness is well known, e.g. in their\n", "paper Simons and Chabris quote the Hungarian neurologist, Rezsö Bálint,\n", "\n", "> It is a well-known phenomenon that we do not notice anything happening\n", "> in our surroundings while being absorbed in the inspection of\n", "> something; focusing our attention on a certain object may happen to\n", "> such an extent that we cannot perceive other objects placed in the\n", "> peripheral parts of our visual field, although the light rays they\n", "> emit arrive completely at the visual sphere of the cerebral cortex.\n", ">\n", "> Rezsö Bálint 1907 (translated in Husain and Stein 1988, page 91)\n", "\n", "When we combine the complexity of the world with our relatively low\n", "bandwidth for information, problems can arise. Our focus on what we\n", "perceive to be the most important problem can cause us to miss other\n", "(potentially vital) contextual information.\n", "\n", "This phenomenon is known as selective attention or ‘inattentional\n", "blindness.’" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from IPython.lib.display import YouTubeVideo\n", "YouTubeVideo('_oGAzq5wM_Q')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Figure: For a longer talk on inattentional bias from Daniel Simons\n", "see this video." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data Selective Attention Bias\n", "\n", "We are going to see how inattention biases can play out in data analysis\n", "by going through a simple example. The analysis involves body mass index\n", "and activity information." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## BMI Steps Data\n", "\n", "The BMI Steps example is taken from Yanai and Lercher (2020). 
We are\n", "given a data set of body-mass index measurements against step counts.\n", "For convenience we have packaged the data so that it can be easily\n", "downloaded." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pods" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data = pods.datasets.bmi_steps()\n", "X = data['X']\n", "y = data['Y']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is good practice to give our variables interpretable names so that\n", "the analysis may be clearly understood by others. Here the `steps` count\n", "is the first dimension of the covariate, the `bmi` is the second\n", "dimension and the `gender` is stored in `y` with `1` for female and `0`\n", "for male." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "steps = X[:, 0]\n", "bmi = X[:, 1]\n", "gender = y[:, 0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can check the mean steps and the mean of the BMI." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print('Steps mean is {mean}.'.format(mean=steps.mean()))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print('BMI mean is {mean}.'.format(mean=bmi.mean()))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## BMI Steps Data Analysis\n", "\n", "We can also separate out the means from the male and female populations.\n", "In Python this can be done by setting male and female indices as\n", "follows." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "male_ind = (gender==0)\n", "female_ind = (gender==1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And now we can extract the variables for the two populations."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "male_steps = steps[male_ind]\n", "male_bmi = bmi[male_ind]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And as before we compute the mean." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print('Male steps mean is {mean}.'.format(mean=male_steps.mean()))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print('Male BMI mean is {mean}.'.format(mean=male_bmi.mean()))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Similarly, we can get the same result for the female portion of the\n", "population." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "female_steps = steps[female_ind]\n", "female_bmi = bmi[female_ind]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print('Female steps mean is {mean}.'.format(mean=female_steps.mean()))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print('Female BMI mean is {mean}.'.format(mean=female_bmi.mean()))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Interestingly, the female BMI average is slightly higher than the male BMI\n", "average. The number of steps in the male group is higher than that in\n", "the female group. Perhaps the steps and the BMI are anti-correlated: the\n", "more steps, the lower the BMI.\n", "\n", "SciPy provides a statistics package, `scipy.stats`. We’ll import its\n", "`pearsonr` function so that we can try to understand the correlation\n", "between the `steps` and the `BMI`."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from scipy.stats import pearsonr" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "corr, _ = pearsonr(steps, bmi)\n", "print(\"Pearson's overall correlation: {corr}\".format(corr=corr))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "male_corr, _ = pearsonr(male_steps, male_bmi)\n", "print(\"Pearson's correlation for males: {corr}\".format(corr=male_corr))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "female_corr, _ = pearsonr(female_steps, female_bmi)\n", "print(\"Pearson's correlation for females: {corr}\".format(corr=female_corr))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import mlai.plot as plot\n", "import mlai\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(figsize=plot.big_wide_figsize)\n", "_ = ax.plot(X[male_ind, 0], X[male_ind, 1], 'g.', markersize=10)\n", "_ = ax.plot(X[female_ind, 0], X[female_ind, 1], 'r.', markersize=10)\n", "_ = ax.set_xlabel('steps', fontsize=20)\n", "_ = ax.set_ylabel('BMI', fontsize=20)\n", "xlim = (0, 15000)\n", "ylim = (15, 32.5)\n", "ax.set_xlim(xlim)\n", "ax.set_ylim(ylim)\n", "mlai.write_figure(filename='bmi-steps.svg',\n", "                  directory='./datasets',\n", "                  transparent=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## A Hypothesis is a Liability\n", "\n", "This analysis is from an article titled “A Hypothesis is a Liability”\n", "(Yanai and Lercher, 2020). They start their article with the following\n", "quote from Hermann Hesse.\n", "\n", "> \" ‘When someone seeks,’ said Siddhartha, ‘then it easily happens that\n", "> his eyes see only the thing that he seeks, and he is able to find\n", "> nothing, 
to take in nothing. \[…\] Seeking means: having a goal. But\n", "> finding means: being free, being open, having no goal.’ \"\n", ">\n", "> Hermann Hesse\n", "\n", "Their idea is that having a hypothesis can constrain our thinking.\n", "However, in answer to their paper Felin et al. (2021) argue that some\n", "form of hypothesis is always necessary, suggesting that a hypothesis\n", "*can* be a liability.\n", "\n", "My view is captured in the introductory chapter to an edited volume on\n", "computational systems biology that I worked on with Mark Girolami,\n", "Magnus Rattray and Guido Sanguinetti.\n", "\n", "Figure: Quote from Lawrence (2010) highlighting the importance of\n", "interaction between data and hypothesis.\n", "\n", "Popper nicely captures the interaction between hypothesis and data by\n", "relating it to the chicken and the egg. The important thing is that\n", "these two co-evolve." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Number Theatre\n", "\n", "Unfortunately, we don’t always have time to wait for this process to\n", "converge to an answer we can all rely on before a decision is required.\n", "\n", "Not only can we be misled by data before a decision is made, but\n", "sometimes we can be misled by data to justify the making of a decision.\n", "David Spiegelhalter refers to the phenomenon of “Number Theatre” in a\n", "conversation with Andrew Marr from May 2020 on the presentation of data." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from IPython.lib.display import YouTubeVideo\n", "YouTubeVideo('9388XmWIHXg')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Figure: Professor Sir David Spiegelhalter on Andrew Marr on 10th May\n", "2020 speaking about some of the challenges around data, data\n", "presentation, and decision making in a pandemic. 
David mentions number\n", "theatre at 9 minutes 10 seconds." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data Theatre\n", "\n", "Data Theatre exploits data inattention bias to present a particular view\n", "on events that may misrepresent them through selective presentation.\n", "Statisticians are one of the few groups that are trained with a\n", "sufficient degree of data skepticism. But it can also be combatted\n", "by ensuring there are domain experts present, and that they can\n", "speak freely.\n", "\n", "Figure: The phenomenon of number theatre or *data theatre* was\n", "described by David Spiegelhalter and is nicely summarized by Martin\n", "Robbins in this Substack article.\n", "\n", "The best book I have found for teaching the skeptical sense of data that\n", "underlies the statistician’s craft is David Spiegelhalter’s *Art of\n", "Statistics*." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# The Art of Statistics\n", "\n", "
\n", "David Spiegelhalter\n", "
\n", "\n", "Figure: [The Art of Statistics by David\n", "Spiegelhalter](https://www.amazon.co.uk/Art-Statistics-Learning-Pelican-Books-ebook/dp/B07HQDJD99)\n", "is an excellent read on the pitfalls of data interpretation.\n", "\n", "David Spiegelhalter’s book (Spiegelhalter, 2019) brings important examples from\n", "statistics to life in an intelligent and entertaining way. It is highly\n", "readable and gives an opportunity to fast-track towards the important\n", "skill of data-skepticism that is the mark of a professional\n", "statistician." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Conclusion\n", "\n", "See the gorilla, *don’t* be the gorilla.\n", "\n", "Figure: A famous quote from Mike Tyson before his fight with Evander\n", "Holyfield: “Everyone has a plan until they get punched in the mouth.”\n", "Don’t let the gorilla punch you in the mouth. See the gorilla, but don’t\n", "be the gorilla." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Thanks!\n", "\n", "For more information on these subjects and more you might want to check\n", "the following resources.\n", "\n", "- twitter: [@lawrennd](https://twitter.com/lawrennd)\n", "- podcast: [The Talking Machines](http://thetalkingmachines.com)\n", "- newspaper: [Guardian Profile\n", "  Page](http://www.theguardian.com/profile/neil-lawrence)\n", "- blog:\n", "  [http://inverseprobability.com](http://inverseprobability.com/blog.html)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Felin, T., Koenderink, J., Krueger, J.I., Noble, D., Ellis, G.F.R.,\n", "2021. The data-hypothesis relationship. Genome Biology 22.\n", "\n", "Heider, F., Simmel, M., 1944. An experimental study of apparent\n", "behavior. The American Journal of Psychology 57, 243–259.\n", "\n", "Henrich, J., Muthukrishna, M., 2021. The origins and psychology of human\n", "cooperation. 
Annual Review of Psychology 72, 207–240.\n", "\n", "\n", "Krizhevsky, A., Sutskever, I., Hinton, G.E., n.d. ImageNet\n", "classification with deep convolutional neural networks. pp. 1097–1105.\n", "\n", "Lawrence, N.D., 2010. Introduction to learning and inference in\n", "computational systems biology.\n", "\n", "Simons, D.J., Chabris, C.F., 1999. Gorillas in our midst: Sustained\n", "inattentional blindness for dynamic events. Perception 28, 1059–1074.\n", "\n", "\n", "Spiegelhalter, D.J., 2019. The art of statistics. Pelican.\n", "\n", "Yanai, I., Lercher, M., 2020. A hypothesis is a liability. Genome\n", "Biology 21." ] } ], "nbformat": 4, "nbformat_minor": 5, "metadata": {} }