{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n# Comparison of kernel ridge regression and SVR\n\nBoth kernel ridge regression (KRR) and SVR learn a non-linear function by\nemploying the kernel trick, i.e., they learn a linear function in the space\ninduced by the respective kernel which corresponds to a non-linear function in\nthe original space. They differ in the loss functions (ridge versus\nepsilon-insensitive loss). In contrast to SVR, fitting a KRR can be done in\nclosed-form and is typically faster for medium-sized datasets. On the other\nhand, the learned model is non-sparse and thus slower than SVR at\nprediction-time.\n\nThis example illustrates both methods on an artificial dataset, which\nconsists of a sinusoidal target function and strong noise added to every fifth\ndatapoint.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Authors: The scikit-learn developers\nSPDX-License-Identifier: BSD-3-Clause\n\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Generate sample data\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import numpy as np\n\nrng = np.random.RandomState(42)\n\nX = 5 * rng.rand(10000, 1)\ny = np.sin(X).ravel()\n\n# Add noise to targets\ny[::5] += 3 * (0.5 - rng.rand(X.shape[0] // 5))\n\nX_plot = np.linspace(0, 5, 100000)[:, None]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Construct the kernel-based regression models\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from sklearn.kernel_ridge import KernelRidge\nfrom sklearn.model_selection import GridSearchCV\nfrom sklearn.svm import SVR\n\ntrain_size = 100\n\nsvr = GridSearchCV(\n SVR(kernel=\"rbf\", gamma=0.1),\n param_grid={\"C\": [1e0, 1e1, 1e2, 1e3], \"gamma\": np.logspace(-2, 2, 5)},\n)\n\nkr = GridSearchCV(\n KernelRidge(kernel=\"rbf\", gamma=0.1),\n param_grid={\"alpha\": [1e0, 0.1, 1e-2, 1e-3], \"gamma\": np.logspace(-2, 2, 5)},\n)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Compare times of SVR and Kernel Ridge Regression\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import time\n\nt0 = time.time()\nsvr.fit(X[:train_size], y[:train_size])\nsvr_fit = time.time() - t0\nprint(f\"Best SVR with params: {svr.best_params_} and R2 score: {svr.best_score_:.3f}\")\nprint(\"SVR complexity and bandwidth selected and model fitted in %.3f s\" % svr_fit)\n\nt0 = time.time()\nkr.fit(X[:train_size], y[:train_size])\nkr_fit = time.time() - t0\nprint(f\"Best KRR with params: {kr.best_params_} and R2 score: {kr.best_score_:.3f}\")\nprint(\"KRR complexity and bandwidth selected and model fitted in %.3f s\" % kr_fit)\n\nsv_ratio = svr.best_estimator_.support_.shape[0] / train_size\nprint(\"Support vector ratio: %.3f\" % sv_ratio)\n\nt0 = time.time()\ny_svr = svr.predict(X_plot)\nsvr_predict = time.time() - t0\nprint(\"SVR prediction for %d inputs in %.3f s\" % (X_plot.shape[0], svr_predict))\n\nt0 = time.time()\ny_kr = kr.predict(X_plot)\nkr_predict = time.time() - t0\nprint(\"KRR prediction for %d inputs in %.3f s\" % (X_plot.shape[0], kr_predict))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Look at the results\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n\nsv_ind = 
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Look at the results\n\n" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n\nsv_ind = svr.best_estimator_.support_\nplt.scatter(\n    X[sv_ind],\n    y[sv_ind],\n    c=\"r\",\n    s=50,\n    label=\"SVR support vectors\",\n    zorder=2,\n    edgecolors=(0, 0, 0),\n)\nplt.scatter(\n    X[:train_size],\n    y[:train_size],\n    c=\"k\",\n    label=\"data\",\n    zorder=1,\n    edgecolors=(0, 0, 0),\n)\nplt.plot(\n    X_plot,\n    y_svr,\n    c=\"r\",\n    label=f\"SVR (fit: {svr_fit:.3f}s, predict: {svr_predict:.3f}s)\",\n)\nplt.plot(\n    X_plot,\n    y_kr,\n    c=\"g\",\n    label=f\"KRR (fit: {kr_fit:.3f}s, predict: {kr_predict:.3f}s)\",\n)\nplt.xlabel(\"data\")\nplt.ylabel(\"target\")\nplt.title(\"SVR versus Kernel Ridge\")\n_ = plt.legend()" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "The previous figure compares the learned models of KRR and SVR when both\ncomplexity/regularization and bandwidth of the RBF kernel are optimized using\ngrid search. The learned functions are very similar; however, fitting KRR is\napproximately 3-4 times faster than fitting SVR (both with grid search).\n\nPredicting 100000 target values could in theory be approximately three times\nfaster with SVR, since it has learned a sparse model using only approximately\n1/3 of the training datapoints as support vectors. In practice, however, this\nis not necessarily the case: implementation details in the way the kernel\nfunction is computed for each model can make KRR as fast or even faster,\ndespite it computing more arithmetic operations.\n\n" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Visualize training and prediction times\n\n" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "plt.figure()\n\n# Training set sizes from 10 to ~6300, evenly spaced on a log scale\nsizes = np.logspace(1, 3.8, 7).astype(int)\nfor name, estimator in {\n    \"KRR\": KernelRidge(kernel=\"rbf\", alpha=0.01, gamma=10),\n    \"SVR\": SVR(kernel=\"rbf\", C=1e2, gamma=10),\n}.items():\n    train_time = []\n    test_time = []\n    for size in sizes:\n        t0 = time.time()\n        estimator.fit(X[:size], y[:size])\n        train_time.append(time.time() - t0)\n\n        t0 = time.time()\n        estimator.predict(X_plot[:1000])\n        test_time.append(time.time() - t0)\n\n    plt.plot(\n        sizes,\n        train_time,\n        \"o-\",\n        color=\"r\" if name == \"SVR\" else \"g\",\n        label=f\"{name} (train)\",\n    )\n    plt.plot(\n        sizes,\n        test_time,\n        \"o--\",\n        color=\"r\" if name == \"SVR\" else \"g\",\n        label=f\"{name} (test)\",\n    )\n\nplt.xscale(\"log\")\nplt.yscale(\"log\")\nplt.xlabel(\"Train size\")\nplt.ylabel(\"Time (seconds)\")\nplt.title(\"Execution Time\")\n_ = plt.legend(loc=\"best\")" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "This figure compares the fitting and prediction times of KRR and SVR for\ndifferent sizes of the training set. Fitting KRR is faster than SVR for\nmedium-sized training sets (less than a few thousand samples); however, for\nlarger training sets SVR scales better. With regard to prediction time, SVR\nshould be faster than KRR for all training set sizes because of its learned\nsparse solution; in practice, however, this is not necessarily the case\nbecause of implementation details. Note that the degree of sparsity, and thus\nthe prediction time, depends on the parameters epsilon and C of the SVR.\n\n" ] },
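{ "cell_type": "markdown", "metadata": {}, "source": [ "To make this last point concrete, the following cell is a small illustrative\nsketch (the epsilon values are arbitrary choices, not tuned): it refits SVR\nwith increasing epsilon and prints the resulting number of support vectors.\nA wider epsilon-insensitive tube leaves more training points inside it, so\nfewer of them become support vectors and each prediction gets cheaper.\n\n" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Sparsity of the SVR solution as a function of epsilon (arbitrary values);\n# fewer support vectors mean fewer kernel evaluations per prediction\nfor eps in [0.01, 0.1, 0.5, 1.0]:\n    svr_eps = SVR(kernel=\"rbf\", C=1e1, gamma=0.1, epsilon=eps)\n    svr_eps.fit(X[:train_size], y[:train_size])\n    n_sv = svr_eps.support_.shape[0]\n    print(f\"epsilon={eps:.2f}: {n_sv}/{train_size} support vectors\")" ] },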
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Visualize the learning curves\n\n" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from sklearn.model_selection import LearningCurveDisplay\n\n_, ax = plt.subplots()\n\nsvr = SVR(kernel=\"rbf\", C=1e1, gamma=0.1)\nkr = KernelRidge(kernel=\"rbf\", alpha=0.1, gamma=0.1)\n\ncommon_params = {\n    \"X\": X[:train_size],\n    \"y\": y[:train_size],\n    \"train_sizes\": np.linspace(0.1, 1, 10),\n    # negate the neg_mean_squared_error scores so the curves show plain MSE\n    \"scoring\": \"neg_mean_squared_error\",\n    \"negate_score\": True,\n    \"score_name\": \"Mean Squared Error\",\n    \"score_type\": \"test\",\n    \"std_display_style\": None,\n    \"ax\": ax,\n}\n\nLearningCurveDisplay.from_estimator(svr, **common_params)\nLearningCurveDisplay.from_estimator(kr, **common_params)\nax.set_title(\"Learning curves\")\nax.legend(handles=ax.get_legend_handles_labels()[0], labels=[\"SVR\", \"KRR\"])\n\nplt.show()" ] }
], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.21" } }, "nbformat": 4, "nbformat_minor": 0 }