{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# An Introduction to Experimental Design with Emukit" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Overview" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# General imports\n", "%matplotlib inline\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from matplotlib import colors as mcolors\n", "\n", "# Figure config\n", "LEGEND_SIZE = 15" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Navigation\n", "\n", "1. [What is experimental design?](#1.-What-is-experimental-design?)\n", "\n", "2. [The ingredients of experimental design](#2.-The-ingredients-of-experimental-design)\n", "\n", "3. [Emukit's experimental design interface](#3.-Emukit's-experimental-design-interface)\n", "\n", "4. [References](#4.-References)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. What is experimental design?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Consider a function $f: \\mathbb{X} \\rightarrow \\mathbb{R}$, $x\\mapsto f(x)$, defined on some constrained input space $\\mathbb{X}$. The function might be unknown, and we may only be able to learn about it by querying it at some locations $x$ to obtain (possibly noisy) measurements $y(x) = f(x) + \\epsilon$, $\\epsilon \\sim \\mathcal{N}(0, \\sigma^2_{noise})$.\n", "*Experimental design* (ExpDesign) aims to predict the function $f(x)$ as accurately as possible, including at locations where it has not been observed. This is especially useful if one needs to know the value of $f$ at a particular point $x_{new}$ but evaluating $f$ there would take too long in practice. This happens, for example, when $f(x)$ is the output of a time-consuming computer simulation, and a decision that needs to be made depends on the value $f(x_{new})$. 
\n", "An example of such a scenario is a tsunami simulation [[Saito, 2019]](#4.-References)\n", "that would need to be run whenever an earthquake happens in order to decide whether inhabited regions need to be evacuated, but there is simply not enough time to query the precise but expensive simulation. The function $f(x)$ in this case might describe the severity of the tsunami (wave height), and the inputs $x$ might describe physical measurements on the ocean floor.\n", "An *emulator* for the function $f$ can be queried instead of the simulation and gives an approximate answer with a calibrated error bar, which can then be used to make the decision. For this, the emulator first needs to be trained on \"datapoints\", which are the results of previous simulation runs of $f$.\n", "\n", "To make an emulator as reliable and functional as possible, the aim is to learn the function $f$ as well as possible given a limited number of function evaluations.\n", "\n", "There are two crucial ingredients in experimental design:\n", "\n", " - A prior probability measure $p(f)$, called the model, which captures our prior beliefs about $f$. Every time we observe new data $D$, the prior is updated to a 'posterior' $p(f|D)$. Obtaining the data $D$ requires running the costly simulation.\n", "\n", " - An acquisition function $a: \\mathbb{X} \\rightarrow \\mathbb{R}$, which quantifies the utility of evaluating each point in the input space. The central idea of the acquisition function is that the next point to be acquired should be maximally informative about $f$.\n", "\n", "Given these ingredients, ExpDesign essentially iterates the following three steps:\n", "1. fit the model $p(f|D_{n})$ on the currently available data $D_{n}$.\n", "2. find the most informative point to evaluate next, $x_{n+1} \\in \\operatorname*{arg\\:max}_{x \\in \\mathbb{X}} a_n(x)$.\n", "3. 
evaluate the objective function at $x_{n+1}$, obtain $y_{n+1}$, and add the new observation to the data: $D_{n+1} \\leftarrow D_{n} \\cup \\{(x_{n+1}, y_{n+1})\\}$.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. The ingredients of experimental design" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "