{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Generalization: Model Validation\n", "\n", "### 27th October 2015 Neil D. Lawrence" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we had to summarise the objectives of machine learning in one word, a very good candidate for that word would be *generalization*. What is generalization? From a human perspective it might be summarised as the ability to take lessons learned in one domain and apply them to another domain. If we accept the definition given in the first session for machine learning, \n", "$$\n", "\\text{data} + \\text{model} \\xrightarrow{\\text{compute}} \\text{prediction}\n", "$$\n", "then we see that without a model we can't generalise: we only have data. Data is fine for answering very specific questions, like \"Who won the Olympic Marathon in 2012?\", because we have that answer stored, however, we are not given the answer to many other questions. For example, Alan Turing was a formidable marathon runner, in 1946 he ran a time 2 hours 46 minutes (just under four minutes per kilometer, faster than I and most of the other [Endcliffe Park Run](http://www.parkrun.org.uk/sheffieldhallam/) runners can do 5 km). What is the probability he would have won an Olympics if one had been held in 1946? \n", "![Alan Turing, Times in the Times](http://www.turing.org.uk/turing/pi2/times2.gif)![Alan Turing running in 1946](http://www.turing.org.uk/turing/pi2/run.jpg)\n", "