{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Reliable uncertainty estimates for neural network predictions\n", "\n", "I previously wrote about [Bayesian neural networks](https://nbviewer.jupyter.org/github/krasserm/bayesian-machine-learning/blob/dev/bayesian-neural-networks/bayesian_neural_networks.ipynb) and explained how uncertainty estimates can be obtained for network predictions. Uncertainty in predictions that comes from uncertainty in network weights is called *epistemic uncertainty* or model uncertainty. A simple regression example demonstrated how epistemic uncertainty increases in regions outside the training data distribution:\n", "\n", "![Epistemic uncertainty](images/epistemic-uncertainty.png)\n", "\n", "A reader later [experimented](https://github.com/krasserm/bayesian-machine-learning/issues/8) with discontinuous ranges of training data and found that uncertainty estimates are lower than expected in training data \"gaps\", as shown in the following figure near the center of the $x$ axis. In these out-of-distribution (OOD) regions the network is over-confident in its predictions. One reason for this over-confidence is that weight priors usually impose only weak constraints over network outputs in OOD regions.\n", "\n", "![Epistemic uncertainty gap](images/epistemic-uncertainty-gap.png)\n", "\n", "If we could instead define a prior in data space directly, we could better control uncertainty estimates for OOD data. A prior in data space better captures assumptions about input-output relationships than priors in weight space. Including such a prior through a loss in data space would allow a network to learn distributions over weights that better generalize to OOD regions, i.e. it would enable the network to output more reliable uncertainty estimates.\n", "\n", "This is exactly what the paper [Noise Contrastive Priors for Functional Uncertainty](http://proceedings.mlr.press/v115/hafner20a.html) does. 
In this article I'll give an introduction to their approach and demonstrate how it fixes over-confidence in OOD regions. I will again use non-linear regression with one-dimensional inputs as an example and plan to cover higher-dimensional inputs in a later article. \n", "\n", "Application of noise contrastive priors (NCPs) is not limited to Bayesian neural networks; they can also be applied to deterministic neural networks. Here, I'll use a Bayesian neural network and implement it with TensorFlow 2 and [TensorFlow Probability](https://www.tensorflow.org/probability)." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import logging\n", "import numpy as np\n", "import tensorflow as tf\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "\n", "from tensorflow.keras.layers import Input, Dense, Lambda, LeakyReLU\n", "from tensorflow.keras.models import Model\n", "from tensorflow.keras.regularizers import L2\n", "from tensorflow_probability import distributions as tfd\n", "from tensorflow_probability import layers as tfpl\n", "from scipy.stats import norm\n", "\n", "from utils import (train,\n", " backprop,\n", " select_bands, \n", " select_subset,\n", " style,\n", " plot_data, \n", " plot_prediction, \n", " plot_uncertainty)\n", "\n", "%matplotlib inline\n", "logging.getLogger('tensorflow').setLevel(logging.ERROR)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Training dataset" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "rng = np.random.RandomState(123)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The training dataset consists of 40 noisy samples from a sinusoidal function $f$, taken from two distinct regions of the input space (red dots). The gray dots illustrate how the noise level increases with $x$ (heteroskedastic noise). 
" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "image/png": "text/plain": [ "