{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n# Ability of Gaussian process regression (GPR) to estimate data noise-level\n\nThis example shows the ability of the\n:class:`~sklearn.gaussian_process.kernels.WhiteKernel` to estimate the noise\nlevel in the data. Moreover, we show the importance of kernel hyperparameters\ninitialization.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Authors: The scikit-learn developers\n# SPDX-License-Identifier: BSD-3-Clause" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data generation\n\nWe will work in a setting where `X` will contain a single feature. We create a\nfunction that will generate the target to be predicted. We will add an\noption to add some noise to the generated target.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import numpy as np\n\n\ndef target_generator(X, add_noise=False):\n target = 0.5 + np.sin(3 * X)\n if add_noise:\n rng = np.random.RandomState(1)\n target += rng.normal(0, 0.3, size=target.shape)\n return target.squeeze()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's have a look to the target generator where we will not add any noise to\nobserve the signal that we would like to predict.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "X = np.linspace(0, 5, num=80).reshape(-1, 1)\ny = target_generator(X, add_noise=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n\nplt.plot(X, y, label=\"Expected signal\")\nplt.legend()\nplt.xlabel(\"X\")\n_ = plt.ylabel(\"y\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The target is transforming the input `X` using a sine function. Now, we will\ngenerate few noisy training samples. To illustrate the noise level, we will\nplot the true signal together with the noisy training samples.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "rng = np.random.RandomState(0)\nX_train = rng.uniform(0, 5, size=20).reshape(-1, 1)\ny_train = target_generator(X_train, add_noise=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "plt.plot(X, y, label=\"Expected signal\")\nplt.scatter(\n x=X_train[:, 0],\n y=y_train,\n color=\"black\",\n alpha=0.4,\n label=\"Observations\",\n)\nplt.legend()\nplt.xlabel(\"X\")\n_ = plt.ylabel(\"y\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Optimisation of kernel hyperparameters in GPR\n\nNow, we will create a\n:class:`~sklearn.gaussian_process.GaussianProcessRegressor`\nusing an additive kernel adding a\n:class:`~sklearn.gaussian_process.kernels.RBF` and\n:class:`~sklearn.gaussian_process.kernels.WhiteKernel` kernels.\nThe :class:`~sklearn.gaussian_process.kernels.WhiteKernel` is a kernel that\nwill able to estimate the amount of noise present in the data while the\n:class:`~sklearn.gaussian_process.kernels.RBF` will serve at fitting the\nnon-linearity between the data and the target.\n\nHowever, we will show that the hyperparameter space contains several local\nminima. 
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Optimisation of kernel hyperparameters in GPR\n\nNow, we will create a\n:class:`~sklearn.gaussian_process.GaussianProcessRegressor`\nusing an additive kernel combining an\n:class:`~sklearn.gaussian_process.kernels.RBF` and a\n:class:`~sklearn.gaussian_process.kernels.WhiteKernel` kernel.\nThe :class:`~sklearn.gaussian_process.kernels.WhiteKernel` is able to\nestimate the amount of noise present in the data, while the\n:class:`~sklearn.gaussian_process.kernels.RBF` serves to fit the\nnon-linearity between the data and the target.\n\nHowever, we will show that the hyperparameter space contains several local\nminima, which highlights the importance of the initial hyperparameter\nvalues.\n\nWe will create a model using a kernel with a high noise level and a large\nlength scale, which explains all variations in the data by noise.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from sklearn.gaussian_process import GaussianProcessRegressor\nfrom sklearn.gaussian_process.kernels import RBF, WhiteKernel\n\nkernel = 1.0 * RBF(length_scale=1e1, length_scale_bounds=(1e-2, 1e3)) + WhiteKernel(\n    noise_level=1, noise_level_bounds=(1e-10, 1e1)\n)\ngpr = GaussianProcessRegressor(kernel=kernel, alpha=0.0)\ngpr.fit(X_train, y_train)\ny_mean, y_std = gpr.predict(X, return_std=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "plt.plot(X, y, label=\"Expected signal\")\nplt.scatter(x=X_train[:, 0], y=y_train, color=\"black\", alpha=0.4, label=\"Observations\")\nplt.errorbar(X, y_mean, y_std, label=\"Posterior mean \u00b1 std\")\nplt.legend()\nplt.xlabel(\"X\")\nplt.ylabel(\"y\")\n_ = plt.title(\n    (\n        f\"Initial: {kernel}\\nOptimum: {gpr.kernel_}\\nLog-Marginal-Likelihood: \"\n        f\"{gpr.log_marginal_likelihood(gpr.kernel_.theta)}\"\n    ),\n    fontsize=8,\n)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We see that the optimum kernel found still has a high noise level and an\neven larger length scale. The length scale reaches the maximum bound that we\nallowed for this parameter, and we got a warning as a result.\n\nMore importantly, we observe that the model does not provide useful\npredictions: the mean prediction seems to be constant, as it does not follow\nthe expected noise-free signal.\n\nNow, we will initialize the :class:`~sklearn.gaussian_process.kernels.RBF`\nwith a smaller initial `length_scale` value and the\n:class:`~sklearn.gaussian_process.kernels.WhiteKernel` with a smaller\ninitial noise level, while keeping the parameter bounds unchanged.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "kernel = 1.0 * RBF(length_scale=1e-1, length_scale_bounds=(1e-2, 1e3)) + WhiteKernel(\n    noise_level=1e-2, noise_level_bounds=(1e-10, 1e1)\n)\ngpr = GaussianProcessRegressor(kernel=kernel, alpha=0.0)\ngpr.fit(X_train, y_train)\ny_mean, y_std = gpr.predict(X, return_std=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "plt.plot(X, y, label=\"Expected signal\")\nplt.scatter(x=X_train[:, 0], y=y_train, color=\"black\", alpha=0.4, label=\"Observations\")\nplt.errorbar(X, y_mean, y_std, label=\"Posterior mean \u00b1 std\")\nplt.legend()\nplt.xlabel(\"X\")\nplt.ylabel(\"y\")\n_ = plt.title(\n    (\n        f\"Initial: {kernel}\\nOptimum: {gpr.kernel_}\\nLog-Marginal-Likelihood: \"\n        f\"{gpr.log_marginal_likelihood(gpr.kernel_.theta)}\"\n    ),\n    fontsize=8,\n)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, we see that the model's predictions are more precise than the\nprevious model's: this new model is able to estimate the noise-free\nfunctional relationship.\n\nLooking at the kernel hyperparameters, we see that the best combination\nfound has a smaller noise level and a shorter length scale than the first\nmodel.\n\n" ] },
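{ "cell_type": "markdown", "metadata": {}, "source": [ "As a quick sanity check, we can compare the noise level estimated by the\n:class:`~sklearn.gaussian_process.kernels.WhiteKernel` with the noise\ninjected by `target_generator`. Since the kernel was built as `RBF +\nWhiteKernel`, the fitted sum exposes the\n:class:`~sklearn.gaussian_process.kernels.WhiteKernel` as its second\ncomponent `k2`. Its `noise_level` is a variance, so we take its square root\nto recover a standard deviation comparable to the 0.3 used above.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# The fitted kernel is a sum (amplitude * RBF + WhiteKernel): its second\n# component (k2) is the WhiteKernel. Its noise_level is a variance, so its\n# square root is comparable to the 0.3 std used in target_generator.\nestimated_noise_std = np.sqrt(gpr.kernel_.k2.noise_level)\nprint(f\"Estimated noise std: {estimated_noise_std:.3f} (true value: 0.3)\")" ] },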
"cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from matplotlib.colors import LogNorm\n\nlength_scale = np.logspace(-2, 4, num=80)\nnoise_level = np.logspace(-2, 1, num=80)\nlength_scale_grid, noise_level_grid = np.meshgrid(length_scale, noise_level)\n\nlog_marginal_likelihood = [\n gpr.log_marginal_likelihood(theta=np.log([0.36, scale, noise]))\n for scale, noise in zip(length_scale_grid.ravel(), noise_level_grid.ravel())\n]\nlog_marginal_likelihood = np.reshape(log_marginal_likelihood, noise_level_grid.shape)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "vmin, vmax = (-log_marginal_likelihood).min(), 50\nlevel = np.around(np.logspace(np.log10(vmin), np.log10(vmax), num=20), decimals=1)\nplt.contour(\n length_scale_grid,\n noise_level_grid,\n -log_marginal_likelihood,\n levels=level,\n norm=LogNorm(vmin=vmin, vmax=vmax),\n)\nplt.colorbar()\nplt.xscale(\"log\")\nplt.yscale(\"log\")\nplt.xlabel(\"Length-scale\")\nplt.ylabel(\"Noise-level\")\nplt.title(\"Negative log-marginal-likelihood\")\nplt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We see that there are two local minima that correspond to the combination of\nhyperparameters previously found. Depending on the initial values for the\nhyperparameters, the gradient-based optimization might or might not\nconverge to the best model. It is thus important to repeat the optimization\nseveral times for different initializations. This can be done by setting the\n`n_restarts_optimizer` parameter of the\n:class:`~sklearn.gaussian_process.GaussianProcessRegressor` class.\n\nLet's try again to fit our model with the bad initial values but this time\nwith 10 random restarts.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "kernel = 1.0 * RBF(length_scale=1e1, length_scale_bounds=(1e-2, 1e3)) + WhiteKernel(\n noise_level=1, noise_level_bounds=(1e-10, 1e1)\n)\ngpr = GaussianProcessRegressor(\n kernel=kernel, alpha=0.0, n_restarts_optimizer=10, random_state=0\n)\ngpr.fit(X_train, y_train)\ny_mean, y_std = gpr.predict(X, return_std=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "plt.plot(X, y, label=\"Expected signal\")\nplt.scatter(x=X_train[:, 0], y=y_train, color=\"black\", alpha=0.4, label=\"Observations\")\nplt.errorbar(X, y_mean, y_std, label=\"Posterior mean \u00b1 std\")\nplt.legend()\nplt.xlabel(\"X\")\nplt.ylabel(\"y\")\n_ = plt.title(\n (\n f\"Initial: {kernel}\\nOptimum: {gpr.kernel_}\\nLog-Marginal-Likelihood: \"\n f\"{gpr.log_marginal_likelihood(gpr.kernel_.theta)}\"\n ),\n fontsize=8,\n)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As we hoped, random restarts allow the optimization to find the best set\nof hyperparameters despite the bad initial values.\n\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.21" } }, "nbformat": 4, "nbformat_minor": 0 }