{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# An Introduction to Bayesian Quadrature with Emukit" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Overview" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# General imports\n", "%matplotlib inline\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from matplotlib import colors as mcolors\n", "\n", "# Figure config\n", "LEGEND_SIZE = 15\n", "FIGURE_SIZE = (12, 8)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Navigation\n", "\n", "1. [What is Bayesian quadrature?](#1.-What-is-Bayesian-quadrature?)\n", "\n", "2. [The ingredients of Bayesian quadrature](#2.-The-quadrature-of-Bayesian-optimization)\n", "\n", "3. [Emukit's Bayesian quadrature interface](#3.-Emukit's-Bayesian-quadrature-interface)\n", "\n", "4. [References](#4.-References)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. What is Bayesian quadrature?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We wish to integrate a function $f: \\mathbb{X} \\rightarrow \\mathbb{R}$, $x\\mapsto f(x)$ which is defined on some input space $\\mathbb{X}\\subseteq\\mathbb{R}^D$\n", "\n", "$$\n", "F : = \\int_{\\mathbb{X}}f(x)\\mathrm{d}x.\n", "$$\n", "\n", "The domain $\\mathbb{X}$ can equally be defined by the integral bounds, i.e., $bounds = \\{(lower\\_bound_d, upper\\_bound_d)\\}_{d=1}^D$. The function $f$ is called the *integrand*. Integration problems occur for example when computing estimators of some quantity or when marginalizing a probability distribution. Numerical solvers are needed whenever $f(x)$ is not integrable analytically.\n", "\n", "Bayesian quadrature (BQ) [[1, 2]](#4.-References) is a numerical method that infers the value of the integral $F$ from a finite number of usually noise-free integrand evaluations $f(x_n)$, at locations $x_n$, $n=1, ...,N$. 
It does so by first emulating the integrand function $f$ with a surrogate model.\n", "Then, an estimate for $F$ is obtained by integrating the surrogate model of $f$.\n", "The key point is that integrating the surrogate model is much easier, and often analytic, compared to the original integration problem. Hence BQ shifts the problem of solving a hard integral to a regression problem on the integrand plus an easier, often analytic, integration problem for the regression model.\n", "\n", "\n", "#### Surrogate model of the integrand\n", "The surrogate for the integrand $f$ is often a Gaussian process (GP), which can be integrated analytically whenever the kernel of the GP can be integrated analytically; an example of such a kernel is the RBF/Gaussian kernel. Sometimes a process related to a Gaussian process is used as the surrogate instead (e.g., [[3]](#4.-References)); in that case the surrogate is approximated with yet another GP, which in turn can be integrated analytically.\n", "\n", "#### Distribution over the integral value\n", "Bayesian quadrature returns a *distribution* over the integral value $F$, not only a point estimate of it. An estimate can then be deduced, e.g., by taking the maximum a posteriori (MAP) value. If a GP is used as the surrogate for $f$, then the integral distribution is of Gaussian form $F\\sim\\mathcal{N}(\\mathfrak{m}_n, \\mathfrak{v}_n)$. We obtain a distribution over $F$ because the model of the integrand $f$ is itself a distribution. A benefit of returning a distribution rather than only an estimator is that the distribution also quantifies the uncertainty of the integral estimate.\n", "\n", "The integral distribution $F\\sim\\mathcal{N}(\\mathfrak{m}_n, \\mathfrak{v}_n)$ is exact and Gaussian due to the closure property of GPs under affine transformations; integration is a linear operator acting on $f$. 
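\n", "\n", "This closure property can be checked numerically. The sketch below is illustrative only (plain NumPy, not Emukit's API): it draws sample paths from a zero-mean GP prior with an RBF kernel, integrates each path with a simple Riemann sum, and compares the empirical mean and variance of the resulting integral values with the closed-form moments $\\int \\mu(x)\\mathrm{d}x$ and $\\int\\int k(x,x')\\mathrm{d}x\\mathrm{d}x'$.\n",
"\n",
"```python\n",
"# Integration is a linear functional of the GP, so the integrals of\n",
"# GP sample paths are themselves Gaussian distributed.\n",
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(0)\n",
"\n",
"# Grid on X = [0, 1] and an RBF kernel with lengthscale ell = 0.2\n",
"x = np.linspace(0, 1, 200)\n",
"dx = x[1] - x[0]\n",
"ell = 0.2\n",
"K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / ell**2)\n",
"\n",
"# Draw GP prior samples and integrate each one (Riemann sum)\n",
"samples = rng.multivariate_normal(np.zeros_like(x), K + 1e-10 * np.eye(x.size), size=5000)\n",
"integrals = samples.sum(axis=1) * dx\n",
"\n",
"# Empirical moments match the closed form: m = 0, v = int int k(x, x') dx dx'\n",
"v_closed_form = K.sum() * dx * dx\n",
"print(integrals.mean(), integrals.var(), v_closed_form)\n",
"```\n",
"\n",
"A histogram of `integrals` looks Gaussian, and its mean and variance agree with the closed-form moments up to discretization and sampling error.\n", "\n", "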
The mean and variance of the (univariate) integral distribution are, respectively,\n", "\n", "$$\n", "\\mathfrak{m}_n = \\int_{\\mathbb{X}} \\mu_n(x)\\mathrm{d}x, \\qquad\\qquad\n", "\\mathfrak{v}_n = \\int_{\\mathbb{X}} \\int_{\\mathbb{X}} k_n(x, x')\\mathrm{d}x\\mathrm{d}x'.\n", "$$\n", "\n", "We can obtain the mean $\\mathfrak{m}_n$ and variance $\\mathfrak{v}_n$ of the integral distribution analytically if we know the analytic forms of the integral over the GP posterior mean function $\\mu_n(x)$ and the double integral over the posterior covariance function $k_n(x, x')$; this is fulfilled, e.g., by the RBF kernel.\n", "\n", "#### Acquiring data\n", "\n", "In practice, integrand evaluations $f(x_n)$ can be expensive, and we are restricted to a limited budget of $N$ locations $x_n$ at which we can evaluate $f$.\n", "Thus we would like to choose these locations wisely, in the best way possible. What 'best' means is encoded in a user-defined utility, which is equal to, or approximated by, an *acquisition* function $a_n(x)$. The acquisition function always has knowledge of the current model of the integrand, in this case the GP on $f$ with mean $\\mu_n$ and kernel $k_n$.\n", "\n", "The integral $F$ is a global property of $f$, meaning that local information about $f$ alone cannot be used to meaningfully guess $F$. This is reflected in the acquisition functions commonly used in BQ: they, too, are usually global quantities. Examples are the integral-variance-reduction or the mutual information between the integral value and a hypothetical new evaluation at location $x$. The next evaluation location is then chosen as a maximizer of the acquisition function, i.e.,\n", "\n", "$$\n", "x_{n+1} \\in \\operatorname*{arg\\:max}_{x \\in \\mathbb{X}} a_n(x).\n", "$$\n", "\n", "\n", "\n", "Given the surrogate GP on $f$, the acquisition function $a_n$, and a function handle to $f(x)$, BQ essentially iterates the following three steps until the budget of $N$ evaluations is used up: \n", "1. 
fit the GP $p(f|D_n)$ on the currently available dataset $D_n = \\{(x_i, f(x_i))\\}_{i=1}^n$.\n", "2. condition the acquisition function $a_n(x)$ on $p(f|D_n)$ and find a maximizer, i.e., $x_{n+1} \\in \\operatorname*{arg\\:max}_{x \\in \\mathbb{X}} a_n(x)$.\n", "3. evaluate the integrand at $x_{n+1}$ to obtain $f(x_{n+1})$, and add the new observation to the dataset, $D_{n+1} \\leftarrow D_{n} \\cup \\{(x_{n+1}, f(x_{n+1}))\\}$.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. The ingredients of Bayesian quadrature" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
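Before detailing the ingredients, the three-step loop from the previous section can be sketched end-to-end. The sketch below is illustrative only (plain NumPy, not Emukit's interface): it uses a noise-free GP with an RBF kernel on $\\mathbb{X}=[a,b]$, uncertainty sampling (maximizing the posterior variance) as a simple stand-in for the acquisition function, and the closed-form RBF kernel mean to compute the integral mean $\\mathfrak{m}_n$ at the end.\n",
"\n",
"```python\n",
"# Illustrative BQ loop: GP surrogate + uncertainty-sampling acquisition.\n",
"import numpy as np\n",
"from math import erf, sqrt, pi\n",
"\n",
"a, b = 0.0, 1.0              # integration bounds, X = [a, b]\n",
"ell, sig2 = 0.1, 1.0         # RBF lengthscale and signal variance\n",
"\n",
"def f(x):\n",
"    return np.sin(10 * x) + x  # toy integrand\n",
"\n",
"def k(X1, X2):\n",
"    return sig2 * np.exp(-0.5 * (X1[:, None] - X2[None, :]) ** 2 / ell**2)\n",
"\n",
"def kernel_mean(X):\n",
"    # z_i = int_a^b k(x, x_i) dx, available in closed form for the RBF kernel\n",
"    c = sig2 * ell * sqrt(pi / 2)\n",
"    return np.array([c * (erf((b - xi) / (sqrt(2) * ell)) - erf((a - xi) / (sqrt(2) * ell))) for xi in X])\n",
"\n",
"grid = np.linspace(a, b, 500)\n",
"X = np.array([0.5])  # initial design\n",
"for n in range(15):\n",
"    # 1. fit the GP on D_n (noise-free observations, jitter for stability)\n",
"    Kinv = np.linalg.inv(k(X, X) + 1e-8 * np.eye(X.size))\n",
"    # 2. maximize the acquisition; here: the posterior variance on a grid\n",
"    kxX = k(grid, X)\n",
"    post_var = sig2 - np.sum((kxX @ Kinv) * kxX, axis=1)\n",
"    x_next = grid[np.argmax(post_var)]\n",
"    # 3. evaluate f at x_next and grow the dataset\n",
"    X = np.append(X, x_next)\n",
"\n",
"Kinv = np.linalg.inv(k(X, X) + 1e-8 * np.eye(X.size))\n",
"F_mean = kernel_mean(X) @ Kinv @ f(X)  # integral mean m_n\n",
"print(F_mean)  # true value is 0.5 + (1 - cos(10)) / 10\n",
"```\n",
"\n",
"Emukit packages each of these pieces (the surrogate model, the acquisition function, and the outer loop) as separate, exchangeable components, which are discussed next.\n", "\n", "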