{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# An Introduction to Bayesian Optimization with Emukit" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Overview" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "### General imports\n", "%matplotlib inline\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from matplotlib import colors as mcolors\n", "\n", "### --- Figure config\n", "LEGEND_SIZE = 15" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Navigation\n", "\n", "1. [What is Bayesian optimization?](#1.-What-is-Bayesian-optimization?)\n", "\n", "2. [The ingredients of Bayesian optimization](#2.-The-ingredients-of-Bayesian-optimization)\n", "\n", "3. [Emukit's Bayesian optimization interface](#3.-Emukit's-Bayesian-optimization-interface)\n", "\n", "4. [References](#4.-References)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. What is Bayesian optimization?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Given a function $f: \\mathbb{X} \\rightarrow \\mathbb{R}$ which is defined in some constrained input space $\\mathbb{X}$, Bayesian optimization (BO) [[Shahriari et al, 2016]](#4.-References) tries to find the global minimum $x_{\\star} \\in \\mathbb{X}$ of the function $f$ by solving the global optimization problem :\n", "\n", "$$ x_{\\star} = \\operatorname*{arg\\:min}_{x \\in \\mathbb{X}} f(x). $$\n", "\n", "Typically these objective functions $f$ are noisy, i.e $y(x) = f(x) + \\epsilon$ with $\\epsilon \\sim N(0, \\sigma_{noise})$ and expensive to evaluate. Additionally we assume that no gradient information is available and hence we treat $f$ as a black-box. \n", "Popular examples for such black-box optimization problems are:\n", "\n", " - optimizing the hyperparameters of a machine learning algorithms such as for instance a neural network, where each function evaluation requires to train and validate the neural network \n", " \n", " - optimizing the parameters of a controller for a robot\n", " \n", " - etc.\n", "\n", "There are two crucial bits in Bayesian optimization:\n", "\n", " - A prior probability measure $p(f)$ which captures our prior beliefs on $f$, called the model. Everytime we observe new data $D$ the prior will be updated to a 'posterior' $p(f|D)$ using the available data.\n", "\n", " - An acquisition function $a: \\mathbb{X} \\rightarrow \\mathbb{R}$ which for each point in the input space quantifies the utility of evaluating this point. The central idea of the acquisition function is to trade off the exploration in regions of the input space where the model is still uncertain and the exploitation of the model's confidence about the good regions of the input space.\n", "\n", "Given these ingredients, BO essentially iterates the following three steps until it achieves a predfined stopping criteria: \n", "1. fit the model $p(f|D_{n})$ on the currently available data $D_{n}$.\n", "2. find the most interesting point to evaluate by $x_{n+1} \\in \\operatorname*{arg\\:max}_{x \\in \\mathbb{X}} a(x)$\n", "3. evaluate the objective function at $x_{n+1}$, obtain $y_{n+1}$ and add the new observation to the data $D_{n+1} \\leftarrow D_{n} \\cup \\{x_{n+1}, y_{n+1}\\}$ \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. The ingredients of Bayesian optimization" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "