{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "#### Towards Machine Learning Systems Design\n", "\n", "#### 2018-05-02\n", "\n", "#### Neil D. Lawrence\n", "\n", "#### Amazon Research Cambridge and University of Sheffield\n", "\n", "`@lawrennd` [inverseprobability.com](http://inverseprobability.com)\n", "\n",
"### What is Machine Learning?\n", "\n",
"First of all, we'll consider the question, what is machine learning? By\n", "my definition Machine Learning is a combination of\n", "\n",
"$$\\text{data} + \\text{model} \\rightarrow \\text{prediction}$$\n", "\n",
"where *data* is our observations. They can be actively or passively\n", "acquired (meta-data). The *model* contains our assumptions, based on\n", "previous experience. That experience can be other data, it can come from\n", "transfer learning, or it can merely be our beliefs about the\n", "regularities of the universe. In humans our models include our inductive\n", "biases. The *prediction* is an action to be taken or a categorization or\n", "a quality score. The reason that machine learning has become a mainstay\n", "of artificial intelligence is the importance of predictions in\n", "artificial intelligence.\n", "\n",
"In practice we normally perform machine learning using two functions. To\n", "combine data with a model we typically make use of:\n", "\n",
"**a prediction function** a function which is used to make the\n", "predictions. It includes our beliefs about the regularities of the\n", "universe, our assumptions about how the world works, e.g. smoothness,\n", "spatial similarities, temporal similarities.\n", "\n",
"**an objective function** a function which defines the cost of\n", "misprediction. Typically it includes knowledge about the world's\n", "generating processes (probabilistic objectives) or the costs we pay for\n", "mispredictions (empirical risk minimization).\n", "\n",
"The combination of data and model through the prediction function and\n", "the objective function leads to a *learning algorithm*. The class of\n", "prediction functions and objective functions we can make use of is\n", "restricted by the algorithms they lead to. If the prediction function or\n", "the objective function is too complex, then it can be difficult to find\n", "an appropriate learning algorithm.\n", "\n",
"A useful reference for the state of the art in machine learning is the UK\n", "Royal Society Report, [Machine Learning: Power and Promise of Computers\n", "that Learn by\n", "Example](https://royalsociety.org/~/media/policy/projects/machine-learning/publications/machine-learning-report.pdf).\n", "\n",
"You can also check my blog post on [\"What is Machine\n", "Learning?\"](http://inverseprobability.com/2017/07/17/what-is-machine-learning)\n", "\n",
"### Machine Learning as the Driver ...\n", "\n", "... of two different domains\n", "\n",
"1. *Data Science*: arises from the fact that we now capture data by\n", "   happenstance.\n", "\n",
"2. *Artificial Intelligence*: emulation of human behaviour.\n", "\n",
"### What does Machine Learning do?\n", "\n",
"- ML Automates through Data\n", "  - *Strongly* related to statistics.\n", "  - Field underpins revolution in *data science* and *AI*\n", "- With AI: logic, *robotics*, *computer vision*, *speech*\n", "- With Data Science: *databases*, *data mining*, *statistics*,\n", "  *visualization*\n", "\n",
"### \"Embodiment Factors\"\n", "\n", "
\n", "\n",
"|                       | computer        | human              |\n", "|-----------------------|-----------------|--------------------|\n", "| compute               | \\~10 gigaflops  | \\~1000 teraflops?  |\n", "| communicate           | \\~1 gigabit/s   | \\~100 bit/s        |\n", "| (compute/communicate) | 10              | \\~$10^{13}$        |\n", "\n",
"*(Figure omitted. Source: DeepFace)*\n", "\n", "
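\n", "\n",
"The embodiment factor in the final row is just the ratio of compute to\n", "communication. As a minimal sketch using the approximate (and, for the\n", "human, speculative) numbers above:\n", "\n",
"```python\n",
"# Embodiment factor: ratio of compute power to communication bandwidth.\n",
"computer_compute = 10e9      # ~10 gigaflops\n",
"computer_communicate = 1e9   # ~1 gigabit/s\n",
"human_compute = 1000e12      # ~1000 teraflops (speculative)\n",
"human_communicate = 100      # ~100 bit/s\n",
"\n",
"print(computer_compute / computer_communicate)  # 10.0\n",
"print(human_compute / human_communicate)        # 10000000000000.0, i.e. ~10^13\n",
"```\n", "\n",
"The striking asymmetry of the second ratio is the point of the table.\n", "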
\n", "### \n", "\n",
"We can think of what these models are doing as being similar to early\n", "pinball machines. In a neural network, we input a number (or numbers),\n", "whereas in pinball, we input a ball. The location of the ball on the\n", "left-right axis can be thought of as the number. As the ball falls\n", "through the machine, each layer of pins can be thought of as a different\n", "layer of neurons. Each layer acts to move the ball from left to right.\n", "\n",
"In a pinball machine, when the ball gets to the bottom it might fall\n", "into a hole defining a score; in a neural network, that is equivalent to\n", "the decision: a classification of the input object.\n", "\n",
"An image has more than one number associated with it, so it's like\n", "playing pinball in a *hyper-space*." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pods\n", "pods.notebook.display_plots('pinball{sample:0>3}.svg', \n", "                            '../slides/diagrams', sample=(1,2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "At initialization, the pins aren't in the right place to bring the ball\n", "to the correct decision.\n", "\n",
"Learning involves moving all the pins to be in the right position, so\n", "that the ball falls in the right place. But moving all these pins in\n", "hyper-space can be difficult. In a hyper-space you have to put a lot of\n", "data through the machine to explore the positions of all the pins.\n", "Adversarial learning reflects the fact that a ball can be moved a small\n", "distance and lead to a very different result.\n", "\n",
"Probabilistic methods explore more of the space by considering a range\n", "of possible paths for the ball through the machine."
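, "\n",
"The analogy can be made concrete with a toy sketch. This is purely\n", "illustrative; the names here are invented for the example and are not\n", "part of any course code:\n", "\n",
"```python\n",
"import numpy as np\n",
"\n",
"rng = np.random.RandomState(0)\n",
"\n",
"# Each 'layer of pins' gives the ball a small left-right nudge.\n",
"pin_offsets = rng.randn(5) * 0.1\n",
"\n",
"def drop_ball(position, offsets):\n",
"    # Each layer moves the ball; the final position picks the 'hole'.\n",
"    for offset in offsets:\n",
"        position = np.tanh(position + offset)\n",
"    return position\n",
"\n",
"# Two holes: a binary classification of the input position.\n",
"decision = drop_ball(0.3, pin_offsets) > 0.0\n",
"```\n",
"\n",
"Learning would correspond to adjusting `pin_offsets` so that balls\n", "dropped at different positions land in the correct holes."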
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import teaching_plots as plot" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%load -s compute_kernel mlai.py" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%load -s eq_cov mlai.py" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.random.seed(10)\n", "plot.rejection_samples(compute_kernel, kernel=eq_cov, \n", " lengthscale=0.25, diagrams='../slides/diagrams/gp')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pods\n", "pods.notebook.display_plots('gp_rejection_samples{sample:0>3}.svg', \n", " '../slides/diagrams/gp', sample=(1,5))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pods\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Olympic Marathon Data\n", "\n", "The first thing we will do is load a standard data set for regression\n", "modelling. The data consists of the pace of Olympic Gold Medal Marathon\n", "winners for the Olympics from 1896 to present. First we load in the data\n", "and plot." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# mlai provides the figure-writing helper used below\n", "import mlai\n", "\n", "data = pods.datasets.olympic_marathon_men()\n", "x = data['X']\n", "y = data['Y']\n", "\n", "offset = y.mean()\n", "scale = np.sqrt(y.var())\n", "\n", "xlim = (1875, 2030)\n", "ylim = (2.5, 6.5)\n", "yhat = (y-offset)/scale\n", "\n", "fig, ax = plt.subplots(figsize=plot.big_wide_figsize)\n", "_ = ax.plot(x, y, 'r.', markersize=10)\n", "ax.set_xlabel('year', fontsize=20)\n", "ax.set_ylabel('pace min/km', fontsize=20)\n", "ax.set_xlim(xlim)\n", "ax.set_ylim(ylim)\n", "\n", "mlai.write_figure(figure=fig, filename='../slides/diagrams/datasets/olympic-marathon.svg', transparent=True, frameon=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Olympic Marathon Data\n", "\n",
"- Gold medal times for Olympic Marathon since 1896.\n", "\n",
"- Marathons before 1924 didn't have a standardised distance.\n", "\n",
"- Present results using pace per km.\n", "\n",
"- In 1904 the Marathon was badly organised, leading to very slow times.\n", "\n",
"![image](../slides/diagrams/Stephen_Kiprotich.jpg)\n", "\n", "*Image from Wikimedia Commons*\n",
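"\n", "The pace figures are simply the winning time divided by the marathon\n", "distance of 42.195 km. As a quick illustrative check (the time here is\n", "a made-up example, not taken from the dataset):\n", "\n",
"```python\n",
"# Convert a winning time of 2:08:01 to pace in minutes per km.\n",
"time_minutes = 2 * 60 + 8 + 1 / 60\n",
"distance_km = 42.195\n",
"pace = time_minutes / distance_km\n",
"print(round(pace, 2))  # 3.03\n",
"```\n",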
"