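{ "cells": [ { "cell_type": "markdown", "id": "d0831f1b", "metadata": {}, "source": [ "# Machine Learning Hyperparameter Tuning Practice" ] }, { "cell_type": "markdown", "id": "62875267", "metadata": {}, "source": [ "In machine learning, hyperparameters are parameters that cannot be learned from the data and must be supplied before training. A model's performance depends heavily on finding a good set of hyperparameters.\n", "\n", "Hyperparameter tuning means searching for those values, and it is usually a very time-consuming process. In this article we walk through the three most popular hyperparameter tuning techniques:\n", "\n", "- **Grid search**\n", "\n", "- **Random search**\n", "\n", "- **Bayesian search**" ] }, { "cell_type": "markdown", "id": "495f25f8", "metadata": {}, "source": [ "There is also a 'zeroth' method, manual tuning, but it is simple and mechanical, so it is outside the scope of this article.\n", "\n", "For ease of reading, the article is organized as follows:\n", "\n", "1. Getting and preparing the data\n", "\n", "2. Grid search\n", "\n", "3. Random search\n", "\n", "4. Bayesian search\n", "\n", "5. Closing remarks\n", "\n", "## Getting and preparing the data\n", "\n", "\n", "\n", "For demonstration purposes, this article uses the built-in breast cancer dataset to train a **support vector classifier** (SVC). The data can be loaded with the `load_breast_cancer` function." ] }, { "cell_type": "code", "execution_count": 1, "id": "d9ccf734", "metadata": {}, "outputs": [ { "data": { "text/html": [ "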
<p>5 rows × 30 columns</p>" ], "text/plain": [ " mean radius mean texture mean perimeter mean area mean smoothness \\\n", "0 17.99 10.38 122.80 1001.0 0.11840 \n", "1 20.57 17.77 132.90 1326.0 0.08474 \n", "2 19.69 21.25 130.00 1203.0 0.10960 \n", "3 11.42 20.38 77.58 386.1 0.14250 \n", "4 20.29 14.34 135.10 1297.0 0.10030 \n", "\n", " mean compactness mean concavity mean concave points mean symmetry \\\n", "0 0.27760 0.3001 0.14710 0.2419 \n", "1 0.07864 0.0869 0.07017 0.1812 \n", "2 0.15990 0.1974 0.12790 0.2069 \n", "3 0.28390 0.2414 0.10520 0.2597 \n", "4 0.13280 0.1980 0.10430 0.1809 \n", "\n", " mean fractal dimension ... worst radius worst texture worst perimeter \\\n", "0 0.07871 ... 25.38 17.33 184.60 \n", "1 0.05667 ... 24.99 23.41 158.80 \n", "2 0.05999 ... 23.57 25.53 152.50 \n", "3 0.09744 ... 14.91 26.50 98.87 \n", "4 0.05883 ... 22.54 16.67 152.20 \n", "\n", " worst area worst smoothness worst compactness worst concavity \\\n", "0 2019.0 0.1622 0.6656 0.7119 \n", "1 1956.0 0.1238 0.1866 0.2416 \n", "2 1709.0 0.1444 0.4245 0.4504 \n", "3 567.7 0.2098 0.8663 0.6869 \n", "4 1575.0 0.1374 0.2050 0.4000 \n", "\n", " worst concave points worst symmetry worst fractal dimension \n", "0 0.2654 0.4601 0.11890 \n", "1 0.1860 0.2750 0.08902 \n", "2 0.2430 0.3613 0.08758 \n", "3 0.2575 0.6638 0.17300 \n", "4 0.1625 0.2364 0.07678 \n", "\n", "[5 rows x 30 columns]" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "from sklearn.datasets import load_breast_cancer\n", "\n", "cancer = load_breast_cancer()\n", "df_X = pd.DataFrame(cancer['data'], columns=cancer['feature_names'])\n", "df_X.head()" ] }, { "cell_type": "markdown", "id": "c9858c66", "metadata": {}, "source": [ "Next, with the features in `df_X`, create `df_y` for the target labels, as shown below:" ] }, { "cell_type": "code", "execution_count": 2, "id": "21d18a00", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Cancer\n", "0 0\n", "1 0\n", "2 0\n", "3 0\n", "4 0" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_y = pd.DataFrame(cancer['target'], columns=['Cancer'])\n", "df_y.head()" ] }, { "cell_type": "markdown", "id": "6fce020b", "metadata": {}, "source": [ "> PS: To learn more about the dataset, run `print(cancer['DESCR'])` to print its summary and feature descriptions." ] }, { "cell_type": "code", "execution_count": 3, "id": "22f22551", "metadata": {}, "outputs": [], "source": [ "# print(cancer['DESCR'])" ] }, { "cell_type": "markdown", "id": "f07c227e", "metadata": {}, "source": [ "Next, use `train_test_split()` to split the dataset into a training set (70%) and a test set (30%):" ] }, { "cell_type": "code", "execution_count": 4, "id": "ac1969eb", "metadata": {}, "outputs": [], "source": [ "# train test split\n", "from sklearn.model_selection import train_test_split\n", "import numpy as np\n", "\n", "X_train, X_test, y_train, y_test = train_test_split(df_X,\n", " np.ravel(df_y),\n", " test_size=0.3)" ] }, { "cell_type": "markdown", "id": "e9b9b725", "metadata": {}, "source": [ "We will train a **support vector classifier** (SVC) model. The regularization parameter `C` and the kernel coefficient `gamma` are the two most important hyperparameters of an SVC:\n", "\n", "- The regularization parameter `C` sets the regularization strength. In Scikit-learn the strength is inversely proportional to `C`, so a smaller `C` means stronger regularization.\n", "- The kernel coefficient `gamma` controls the width of the kernel: a larger `gamma` gives a narrower kernel. SVC uses the **radial basis function (RBF)** kernel (also called the **Gaussian kernel**) by default.\n", "\n", "We tune these two parameters in the sections below." ] }, { "cell_type": "markdown", "id": "f82a6929", "metadata": {}, "source": [ "## Grid search" ] }, { "cell_type": "markdown", "id": "10a155f5", "metadata": {}, "source": [ "Good values of `C` and `gamma` are hard to find by hand. The simplest solution is to try a batch of combinations and see which one works best. This approach of building a “grid” of parameters and trying every possible combination is called **grid search**.\n", "\n", "![image.png](images/1.png)\n", "\n", "This approach is so common that Scikit-learn builds it in as `GridSearchCV`. CV stands for cross-validation, another technique for evaluating and improving machine learning models.\n", "\n", "`GridSearchCV` takes the model to train and a dictionary describing the parameters to try. The parameter grid is a dictionary whose keys are parameter names and whose values are the lists of settings to test. Let's try it, starting with the candidate values for `C` and `gamma`:" ] }, { "cell_type": "code", "execution_count": 5, "id": "f133b38b", "metadata": {}, "outputs": [], "source": [ "param_grid = { \n", " 'C': [0.1, 1, 10, 100, 1000], \n", " 'gamma': [1, 0.1, 0.01, 0.001, 0.0001] \n", "}" ] 
}, { "cell_type": "markdown", "id": "7ea8adea", "metadata": {}, "source": [ "Next, create a `GridSearchCV` object and fit it on the training data." ] }, { "cell_type": "code", "execution_count": 6, "id": "e660eeaf", "metadata": {}, "outputs": [], "source": [ "from sklearn.model_selection import GridSearchCV\n", "from sklearn.svm import SVC\n", "\n", "svc1 = SVC()\n", "# refit=True refits the best model on the full training set; verbose=3 logs each fold\n", "grid = GridSearchCV(svc1, param_grid, refit=True, verbose=3)" ] }, { "cell_type": "code", "execution_count": 7, "id": "e843fb99", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Fitting 5 folds for each of 25 candidates, totalling 125 fits\n", "[CV] C=0.1, gamma=1 ..................................................\n", "[CV] ...................... C=0.1, gamma=1, score=0.600, total= 0.0s\n", "[CV] C=0.1, gamma=1 ..................................................\n", "[CV] ...................... C=0.1, gamma=1, score=0.600, total= 0.0s\n", "[CV] C=0.1, gamma=1 ..................................................\n", "[CV] ...................... C=0.1, gamma=1, score=0.600, total= 0.0s\n", "[CV] C=0.1, gamma=1 ..................................................\n", "[CV] ...................... C=0.1, gamma=1, score=0.595, total= 0.0s\n", "[CV] C=0.1, gamma=1 ..................................................\n", "[CV] ...................... C=0.1, gamma=1, score=0.608, total= 0.0s\n", "[CV] C=0.1, gamma=0.1 ................................................\n", "[CV] .................... C=0.1, gamma=0.1, score=0.600, total= 0.0s\n", "[CV] C=0.1, gamma=0.1 ................................................\n", "[CV] .................... 
C=0.1, gamma=0.1, score=0.600, total= 0.0s\n", "[CV] C=0.1, gamma=0.1 ................................................\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.\n", "[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s\n", "[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 0.0s remaining: 0.0s\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[CV] .................... C=0.1, gamma=0.1, score=0.600, total= 0.0s\n", "[CV] C=0.1, gamma=0.1 ................................................\n", "[CV] .................... C=0.1, gamma=0.1, score=0.595, total= 0.0s\n", "[CV] C=0.1, gamma=0.1 ................................................\n", "[CV] .................... C=0.1, gamma=0.1, score=0.608, total= 0.0s\n", "[CV] C=0.1, gamma=0.01 ...............................................\n", "[CV] ................... C=0.1, gamma=0.01, score=0.600, total= 0.0s\n", "[CV] C=0.1, gamma=0.01 ...............................................\n", "[CV] ................... C=0.1, gamma=0.01, score=0.600, total= 0.0s\n", "[CV] C=0.1, gamma=0.01 ...............................................\n", "[CV] ................... C=0.1, gamma=0.01, score=0.600, total= 0.0s\n", "[CV] C=0.1, gamma=0.01 ...............................................\n", "[CV] ................... C=0.1, gamma=0.01, score=0.595, total= 0.0s\n", "[CV] C=0.1, gamma=0.01 ...............................................\n", "[CV] ................... C=0.1, gamma=0.01, score=0.608, total= 0.0s\n", "[CV] C=0.1, gamma=0.001 ..............................................\n", "[CV] .................. C=0.1, gamma=0.001, score=0.600, total= 0.0s\n", "[CV] C=0.1, gamma=0.001 ..............................................\n", "[CV] .................. 
C=0.1, gamma=0.001, score=0.600, total= 0.0s\n", "[CV] C=0.1, gamma=0.001 ..............................................\n", "[CV] .................. C=0.1, gamma=0.001, score=0.600, total= 0.0s\n", "[CV] C=0.1, gamma=0.001 ..............................................\n", "[CV] .................. C=0.1, gamma=0.001, score=0.595, total= 0.0s\n", "[CV] C=0.1, gamma=0.001 ..............................................\n", "[CV] .................. C=0.1, gamma=0.001, score=0.608, total= 0.0s\n", "[CV] C=0.1, gamma=0.0001 .............................................\n", "[CV] ................. C=0.1, gamma=0.0001, score=0.963, total= 0.0s\n", "[CV] C=0.1, gamma=0.0001 .............................................\n", "[CV] ................. C=0.1, gamma=0.0001, score=0.963, total= 0.0s\n", "[CV] C=0.1, gamma=0.0001 .............................................\n", "[CV] ................. C=0.1, gamma=0.0001, score=0.887, total= 0.0s\n", "[CV] C=0.1, gamma=0.0001 .............................................\n", "[CV] ................. C=0.1, gamma=0.0001, score=0.886, total= 0.0s\n", "[CV] C=0.1, gamma=0.0001 .............................................\n", "[CV] ................. C=0.1, gamma=0.0001, score=0.962, total= 0.0s\n", "[CV] C=1, gamma=1 ....................................................\n", "[CV] ........................ C=1, gamma=1, score=0.600, total= 0.0s\n", "[CV] C=1, gamma=1 ....................................................\n", "[CV] ........................ C=1, gamma=1, score=0.600, total= 0.0s\n", "[CV] C=1, gamma=1 ....................................................\n", "[CV] ........................ C=1, gamma=1, score=0.600, total= 0.0s\n", "[CV] C=1, gamma=1 ....................................................\n", "[CV] ........................ C=1, gamma=1, score=0.595, total= 0.0s\n", "[CV] C=1, gamma=1 ....................................................\n", "[CV] ........................ 
C=1, gamma=1, score=0.608, total= 0.0s\n", "[CV] C=1, gamma=0.1 ..................................................\n", "[CV] ...................... C=1, gamma=0.1, score=0.600, total= 0.0s\n", "[CV] C=1, gamma=0.1 ..................................................\n", "[CV] ...................... C=1, gamma=0.1, score=0.600, total= 0.0s\n", "[CV] C=1, gamma=0.1 ..................................................\n", "[CV] ...................... C=1, gamma=0.1, score=0.600, total= 0.0s\n", "[CV] C=1, gamma=0.1 ..................................................\n", "[CV] ...................... C=1, gamma=0.1, score=0.595, total= 0.0s\n", "[CV] C=1, gamma=0.1 ..................................................\n", "[CV] ...................... C=1, gamma=0.1, score=0.608, total= 0.0s\n", "[CV] C=1, gamma=0.01 .................................................\n", "[CV] ..................... C=1, gamma=0.01, score=0.613, total= 0.0s\n", "[CV] C=1, gamma=0.01 .................................................\n", "[CV] ..................... C=1, gamma=0.01, score=0.600, total= 0.0s\n", "[CV] C=1, gamma=0.01 .................................................\n", "[CV] ..................... C=1, gamma=0.01, score=0.613, total= 0.0s\n", "[CV] C=1, gamma=0.01 .................................................\n", "[CV] ..................... C=1, gamma=0.01, score=0.595, total= 0.0s\n", "[CV] C=1, gamma=0.01 .................................................\n", "[CV] ..................... C=1, gamma=0.01, score=0.608, total= 0.0s\n", "[CV] C=1, gamma=0.001 ................................................\n", "[CV] .................... C=1, gamma=0.001, score=0.950, total= 0.0s\n", "[CV] C=1, gamma=0.001 ................................................\n", "[CV] .................... C=1, gamma=0.001, score=0.938, total= 0.0s\n", "[CV] C=1, gamma=0.001 ................................................\n", "[CV] .................... 
C=1, gamma=0.001, score=0.900, total= 0.0s\n", "[CV] C=1, gamma=0.001 ................................................\n", "[CV] .................... C=1, gamma=0.001, score=0.899, total= 0.0s\n", "[CV] C=1, gamma=0.001 ................................................\n", "[CV] .................... C=1, gamma=0.001, score=0.924, total= 0.0s\n", "[CV] C=1, gamma=0.0001 ...............................................\n", "[CV] ................... C=1, gamma=0.0001, score=0.963, total= 0.0s\n", "[CV] C=1, gamma=0.0001 ...............................................\n", "[CV] ................... C=1, gamma=0.0001, score=0.988, total= 0.0s\n", "[CV] C=1, gamma=0.0001 ...............................................\n", "[CV] ................... C=1, gamma=0.0001, score=0.900, total= 0.0s\n", "[CV] C=1, gamma=0.0001 ...............................................\n", "[CV] ................... C=1, gamma=0.0001, score=0.873, total= 0.0s\n", "[CV] C=1, gamma=0.0001 ...............................................\n", "[CV] ................... C=1, gamma=0.0001, score=0.962, total= 0.0s\n", "[CV] C=10, gamma=1 ...................................................\n", "[CV] ....................... C=10, gamma=1, score=0.600, total= 0.0s\n", "[CV] C=10, gamma=1 ...................................................\n", "[CV] ....................... C=10, gamma=1, score=0.600, total= 0.0s\n", "[CV] C=10, gamma=1 ...................................................\n", "[CV] ....................... C=10, gamma=1, score=0.600, total= 0.0s\n", "[CV] C=10, gamma=1 ...................................................\n", "[CV] ....................... C=10, gamma=1, score=0.595, total= 0.0s\n", "[CV] C=10, gamma=1 ...................................................\n", "[CV] ....................... C=10, gamma=1, score=0.608, total= 0.0s\n", "[CV] C=10, gamma=0.1 .................................................\n", "[CV] ..................... 
C=10, gamma=0.1, score=0.600, total= 0.0s\n", "[CV] C=10, gamma=0.1 .................................................\n", "[CV] ..................... C=10, gamma=0.1, score=0.600, total= 0.0s\n", "[CV] C=10, gamma=0.1 .................................................\n", "[CV] ..................... C=10, gamma=0.1, score=0.600, total= 0.0s\n", "[CV] C=10, gamma=0.1 .................................................\n", "[CV] ..................... C=10, gamma=0.1, score=0.595, total= 0.0s\n", "[CV] C=10, gamma=0.1 .................................................\n", "[CV] ..................... C=10, gamma=0.1, score=0.608, total= 0.0s\n", "[CV] C=10, gamma=0.01 ................................................\n", "[CV] .................... C=10, gamma=0.01, score=0.625, total= 0.0s\n", "[CV] C=10, gamma=0.01 ................................................\n", "[CV] .................... C=10, gamma=0.01, score=0.600, total= 0.0s\n", "[CV] C=10, gamma=0.01 ................................................\n", "[CV] .................... C=10, gamma=0.01, score=0.613, total= 0.0s\n", "[CV] C=10, gamma=0.01 ................................................\n", "[CV] .................... C=10, gamma=0.01, score=0.608, total= 0.0s\n", "[CV] C=10, gamma=0.01 ................................................\n", "[CV] .................... C=10, gamma=0.01, score=0.608, total= 0.0s\n", "[CV] C=10, gamma=0.001 ...............................................\n", "[CV] ................... C=10, gamma=0.001, score=0.938, total= 0.0s\n", "[CV] C=10, gamma=0.001 ...............................................\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[CV] ................... C=10, gamma=0.001, score=0.950, total= 0.0s\n", "[CV] C=10, gamma=0.001 ...............................................\n", "[CV] ................... 
C=10, gamma=0.001, score=0.887, total= 0.0s\n", "[CV] C=10, gamma=0.001 ...............................................\n", "[CV] ................... C=10, gamma=0.001, score=0.886, total= 0.0s\n", "[CV] C=10, gamma=0.001 ...............................................\n", "[CV] ................... C=10, gamma=0.001, score=0.924, total= 0.0s\n", "[CV] C=10, gamma=0.0001 ..............................................\n", "[CV] .................. C=10, gamma=0.0001, score=0.963, total= 0.0s\n", "[CV] C=10, gamma=0.0001 ..............................................\n", "[CV] .................. C=10, gamma=0.0001, score=0.938, total= 0.0s\n", "[CV] C=10, gamma=0.0001 ..............................................\n", "[CV] .................. C=10, gamma=0.0001, score=0.900, total= 0.0s\n", "[CV] C=10, gamma=0.0001 ..............................................\n", "[CV] .................. C=10, gamma=0.0001, score=0.911, total= 0.0s\n", "[CV] C=10, gamma=0.0001 ..............................................\n", "[CV] .................. C=10, gamma=0.0001, score=0.924, total= 0.0s\n", "[CV] C=100, gamma=1 ..................................................\n", "[CV] ...................... C=100, gamma=1, score=0.600, total= 0.0s\n", "[CV] C=100, gamma=1 ..................................................\n", "[CV] ...................... C=100, gamma=1, score=0.600, total= 0.0s\n", "[CV] C=100, gamma=1 ..................................................\n", "[CV] ...................... C=100, gamma=1, score=0.600, total= 0.0s\n", "[CV] C=100, gamma=1 ..................................................\n", "[CV] ...................... C=100, gamma=1, score=0.595, total= 0.0s\n", "[CV] C=100, gamma=1 ..................................................\n", "[CV] ...................... C=100, gamma=1, score=0.608, total= 0.0s\n", "[CV] C=100, gamma=0.1 ................................................\n", "[CV] .................... 
C=100, gamma=0.1, score=0.600, total= 0.0s\n", "[CV] C=100, gamma=0.1 ................................................\n", "[CV] .................... C=100, gamma=0.1, score=0.600, total= 0.0s\n", "[CV] C=100, gamma=0.1 ................................................\n", "[CV] .................... C=100, gamma=0.1, score=0.600, total= 0.0s\n", "[CV] C=100, gamma=0.1 ................................................\n", "[CV] .................... C=100, gamma=0.1, score=0.595, total= 0.0s\n", "[CV] C=100, gamma=0.1 ................................................\n", "[CV] .................... C=100, gamma=0.1, score=0.608, total= 0.0s\n", "[CV] C=100, gamma=0.01 ...............................................\n", "[CV] ................... C=100, gamma=0.01, score=0.625, total= 0.0s\n", "[CV] C=100, gamma=0.01 ...............................................\n", "[CV] ................... C=100, gamma=0.01, score=0.600, total= 0.0s\n", "[CV] C=100, gamma=0.01 ...............................................\n", "[CV] ................... C=100, gamma=0.01, score=0.613, total= 0.0s\n", "[CV] C=100, gamma=0.01 ...............................................\n", "[CV] ................... C=100, gamma=0.01, score=0.608, total= 0.0s\n", "[CV] C=100, gamma=0.01 ...............................................\n", "[CV] ................... C=100, gamma=0.01, score=0.608, total= 0.0s\n", "[CV] C=100, gamma=0.001 ..............................................\n", "[CV] .................. C=100, gamma=0.001, score=0.938, total= 0.0s\n", "[CV] C=100, gamma=0.001 ..............................................\n", "[CV] .................. C=100, gamma=0.001, score=0.950, total= 0.0s\n", "[CV] C=100, gamma=0.001 ..............................................\n", "[CV] .................. C=100, gamma=0.001, score=0.887, total= 0.0s\n", "[CV] C=100, gamma=0.001 ..............................................\n", "[CV] .................. 
C=100, gamma=0.001, score=0.886, total= 0.0s\n", "[CV] C=100, gamma=0.001 ..............................................\n", "[CV] .................. C=100, gamma=0.001, score=0.924, total= 0.0s\n", "[CV] C=100, gamma=0.0001 .............................................\n", "[CV] ................. C=100, gamma=0.0001, score=0.950, total= 0.0s\n", "[CV] C=100, gamma=0.0001 .............................................\n", "[CV] ................. C=100, gamma=0.0001, score=0.950, total= 0.0s\n", "[CV] C=100, gamma=0.0001 .............................................\n", "[CV] ................. C=100, gamma=0.0001, score=0.912, total= 0.0s\n", "[CV] C=100, gamma=0.0001 .............................................\n", "[CV] ................. C=100, gamma=0.0001, score=0.911, total= 0.0s\n", "[CV] C=100, gamma=0.0001 .............................................\n", "[CV] ................. C=100, gamma=0.0001, score=0.924, total= 0.0s\n", "[CV] C=1000, gamma=1 .................................................\n", "[CV] ..................... C=1000, gamma=1, score=0.600, total= 0.0s\n", "[CV] C=1000, gamma=1 .................................................\n", "[CV] ..................... C=1000, gamma=1, score=0.600, total= 0.0s\n", "[CV] C=1000, gamma=1 .................................................\n", "[CV] ..................... C=1000, gamma=1, score=0.600, total= 0.0s\n", "[CV] C=1000, gamma=1 .................................................\n", "[CV] ..................... C=1000, gamma=1, score=0.595, total= 0.0s\n", "[CV] C=1000, gamma=1 .................................................\n", "[CV] ..................... C=1000, gamma=1, score=0.608, total= 0.0s\n", "[CV] C=1000, gamma=0.1 ...............................................\n", "[CV] ................... C=1000, gamma=0.1, score=0.600, total= 0.0s\n", "[CV] C=1000, gamma=0.1 ...............................................\n", "[CV] ................... 
C=1000, gamma=0.1, score=0.600, total= 0.0s\n", "[CV] C=1000, gamma=0.1 ...............................................\n", "[CV] ................... C=1000, gamma=0.1, score=0.600, total= 0.0s\n", "[CV] C=1000, gamma=0.1 ...............................................\n", "[CV] ................... C=1000, gamma=0.1, score=0.595, total= 0.0s\n", "[CV] C=1000, gamma=0.1 ...............................................\n", "[CV] ................... C=1000, gamma=0.1, score=0.608, total= 0.0s\n", "[CV] C=1000, gamma=0.01 ..............................................\n", "[CV] .................. C=1000, gamma=0.01, score=0.625, total= 0.0s\n", "[CV] C=1000, gamma=0.01 ..............................................\n", "[CV] .................. C=1000, gamma=0.01, score=0.600, total= 0.0s\n", "[CV] C=1000, gamma=0.01 ..............................................\n", "[CV] .................. C=1000, gamma=0.01, score=0.613, total= 0.0s\n", "[CV] C=1000, gamma=0.01 ..............................................\n", "[CV] .................. C=1000, gamma=0.01, score=0.608, total= 0.0s\n", "[CV] C=1000, gamma=0.01 ..............................................\n", "[CV] .................. C=1000, gamma=0.01, score=0.608, total= 0.0s\n", "[CV] C=1000, gamma=0.001 .............................................\n", "[CV] ................. C=1000, gamma=0.001, score=0.938, total= 0.0s\n", "[CV] C=1000, gamma=0.001 .............................................\n", "[CV] ................. C=1000, gamma=0.001, score=0.950, total= 0.0s\n", "[CV] C=1000, gamma=0.001 .............................................\n", "[CV] ................. C=1000, gamma=0.001, score=0.887, total= 0.0s\n", "[CV] C=1000, gamma=0.001 .............................................\n", "[CV] ................. C=1000, gamma=0.001, score=0.886, total= 0.0s\n", "[CV] C=1000, gamma=0.001 .............................................\n", "[CV] ................. 
C=1000, gamma=0.001, score=0.924, total= 0.0s\n", "[CV] C=1000, gamma=0.0001 ............................................\n", "[CV] ................ C=1000, gamma=0.0001, score=0.950, total= 0.0s\n", "[CV] C=1000, gamma=0.0001 ............................................\n", "[CV] ................ C=1000, gamma=0.0001, score=0.938, total= 0.0s\n", "[CV] C=1000, gamma=0.0001 ............................................\n", "[CV] ................ C=1000, gamma=0.0001, score=0.912, total= 0.0s\n", "[CV] C=1000, gamma=0.0001 ............................................\n", "[CV] ................ C=1000, gamma=0.0001, score=0.886, total= 0.0s\n", "[CV] C=1000, gamma=0.0001 ............................................\n", "[CV] ................ C=1000, gamma=0.0001, score=0.924, total= 0.0s\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "[Parallel(n_jobs=1)]: Done 125 out of 125 | elapsed: 2.6s finished\n" ] }, { "data": { "text/plain": [ "GridSearchCV(estimator=SVC(),\n", " param_grid={'C': [0.1, 1, 10, 100, 1000],\n", " 'gamma': [1, 0.1, 0.01, 0.001, 0.0001]},\n", " verbose=3)" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "grid.fit(X_train, y_train)" ] }, { "cell_type": "markdown", "id": "9159ed85", "metadata": {}, "source": [ "Once fitting finishes, the `best_params_` attribute of `GridSearchCV` gives the best parameters found, and the `best_estimator_` attribute gives the best model:" ] }, { "cell_type": "code", "execution_count": 8, "id": "b41334f3", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'C': 1, 'gamma': 0.0001}" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# best parameters found\n", "grid.best_params_" ] }, { "cell_type": "code", "execution_count": 9, "id": "39abb48a", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "SVC(C=1, gamma=0.0001)" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# best model found\n", "grid.best_estimator_" ] }, { "cell_type": "markdown", "id": 
"8fd266b8", "metadata": {}, "source": [ "With fitting done, use the best model found by the grid search to predict on the test set, then build a confusion matrix and a classification report." ] }, { "cell_type": "code", "execution_count": 10, "id": "829ae47e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 42 11]\n", " [ 1 117]]\n" ] } ], "source": [ "# predict with the best estimator\n", "grid_predictions = grid.predict(X_test)\n", "# confusion matrix\n", "from sklearn.metrics import classification_report, confusion_matrix\n", "print(confusion_matrix(y_test, grid_predictions))" ] }, { "cell_type": "code", "execution_count": 11, "id": "e516e05e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " precision recall f1-score support\n", "\n", " 0 0.98 0.79 0.88 53\n", " 1 0.91 0.99 0.95 118\n", "\n", " accuracy 0.93 171\n", " macro avg 0.95 0.89 0.91 171\n", "weighted avg 0.93 0.93 0.93 171\n", "\n" ] } ], "source": [ "# classification report\n", "print(classification_report(y_test, grid_predictions))" ] }, { "cell_type": "markdown", "id": "1f60d201", "metadata": {}, "source": [ "## Random search\n", "\n", "Grid search tries every combination of hyperparameters, so its cost grows multiplicatively with each added parameter and candidate value. With large datasets or complex models the computation can become infeasible, and grid search stops being practical. **Random search** offers a cheaper alternative: it evaluates only as many hyperparameter tuples as you ask for, and each value is drawn at random, as the figure below shows.\n", "\n", "![Random search tries random combinations](images/2.png)\n", "\n", "This approach is also common, so Scikit-learn builds it in as `RandomizedSearchCV`. Its API is similar to that of `GridSearchCV`. First specify the parameters `C` and `gamma` together with the distributions their candidate values are sampled from:" ] }, { "cell_type": "code", "execution_count": 12, "id": "759d20d0", "metadata": {}, "outputs": [], "source": [ "from sklearn.model_selection import RandomizedSearchCV\n", "import scipy.stats as stats\n", "from scipy.stats import loguniform  # available in SciPy >= 1.4\n", "\n", "# parameters to sample and their distributions\n", "# stats.uniform(loc, scale) samples from [loc, loc + scale]\n", "param_dist = {\n", " 'C': stats.uniform(0.1, 1e4),\n", " 'gamma': loguniform(1e-6, 1e+1),\n", "}" ] }, { "cell_type": "markdown", "id": "82082bd8", "metadata": {}, "source": [ "Next, create a `RandomizedSearchCV` object configured with `n_iter_search`, the number of candidates to sample, and fit it on the training data." ] }, { "cell_type": "code", "execution_count": 13, "id": "8cc207c5", "metadata": { "scrolled": false }, "outputs": [ { "name": 
"stdout", "output_type": "stream", "text": [ "Fitting 5 folds for each of 20 candidates, totalling 100 fits\n", "[CV] C=1774.343244541459, gamma=3.1892778886544583e-06 ...............\n", "[CV] C=1774.343244541459, gamma=3.1892778886544583e-06, score=0.988, total= 0.0s\n", "[CV] C=1774.343244541459, gamma=3.1892778886544583e-06 ...............\n", "[CV] C=1774.343244541459, gamma=3.1892778886544583e-06, score=0.938, total= 0.0s\n", "[CV] C=1774.343244541459, gamma=3.1892778886544583e-06 ...............\n", "[CV] C=1774.343244541459, gamma=3.1892778886544583e-06, score=0.938, total= 0.0s\n", "[CV] C=1774.343244541459, gamma=3.1892778886544583e-06 ...............\n", "[CV] C=1774.343244541459, gamma=3.1892778886544583e-06, score=0.899, total= 0.0s\n", "[CV] C=1774.343244541459, gamma=3.1892778886544583e-06 ...............\n", "[CV] C=1774.343244541459, gamma=3.1892778886544583e-06, score=0.962, total= 0.0s\n", "[CV] C=1503.2524541228202, gamma=0.10464256812063606 .................\n", "[CV] C=1503.2524541228202, gamma=0.10464256812063606, score=0.600, total= 0.0s\n", "[CV] C=1503.2524541228202, gamma=0.10464256812063606 .................\n", "[CV] C=1503.2524541228202, gamma=0.10464256812063606, score=0.600, total= 0.0s\n", "[CV] C=1503.2524541228202, gamma=0.10464256812063606 .................\n", "[CV] C=1503.2524541228202, gamma=0.10464256812063606, score=0.600, total= 0.0s\n", "[CV] C=1503.2524541228202, gamma=0.10464256812063606 .................\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.\n", "[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s\n", "[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 0.0s remaining: 0.0s\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[CV] C=1503.2524541228202, gamma=0.10464256812063606, score=0.595, total= 0.0s\n", "[CV] C=1503.2524541228202, gamma=0.10464256812063606 .................\n", "[CV] 
C=1503.2524541228202, gamma=0.10464256812063606, score=0.608, total= 0.0s\n", "[CV] C=1175.6858596772884, gamma=0.001591903045908838 ................\n", "[CV] C=1175.6858596772884, gamma=0.001591903045908838, score=0.938, total= 0.0s\n", "[CV] C=1175.6858596772884, gamma=0.001591903045908838 ................\n", "[CV] C=1175.6858596772884, gamma=0.001591903045908838, score=0.925, total= 0.0s\n", "[CV] C=1175.6858596772884, gamma=0.001591903045908838 ................\n", "[CV] C=1175.6858596772884, gamma=0.001591903045908838, score=0.887, total= 0.0s\n", "[CV] C=1175.6858596772884, gamma=0.001591903045908838 ................\n", "[CV] C=1175.6858596772884, gamma=0.001591903045908838, score=0.835, total= 0.0s\n", "[CV] C=1175.6858596772884, gamma=0.001591903045908838 ................\n", "[CV] C=1175.6858596772884, gamma=0.001591903045908838, score=0.924, total= 0.0s\n", "[CV] C=9949.091253584631, gamma=0.09694064523745187 ..................\n", "[CV] C=9949.091253584631, gamma=0.09694064523745187, score=0.600, total= 0.0s\n", "[CV] C=9949.091253584631, gamma=0.09694064523745187 ..................\n", "[CV] C=9949.091253584631, gamma=0.09694064523745187, score=0.600, total= 0.0s\n", "[CV] C=9949.091253584631, gamma=0.09694064523745187 ..................\n", "[CV] C=9949.091253584631, gamma=0.09694064523745187, score=0.600, total= 0.0s\n", "[CV] C=9949.091253584631, gamma=0.09694064523745187 ..................\n", "[CV] C=9949.091253584631, gamma=0.09694064523745187, score=0.595, total= 0.0s\n", "[CV] C=9949.091253584631, gamma=0.09694064523745187 ..................\n", "[CV] C=9949.091253584631, gamma=0.09694064523745187, score=0.608, total= 0.0s\n", "[CV] C=7062.033162389142, gamma=0.025314128570814526 .................\n", "[CV] C=7062.033162389142, gamma=0.025314128570814526, score=0.600, total= 0.0s\n", "[CV] C=7062.033162389142, gamma=0.025314128570814526 .................\n", "[CV] C=7062.033162389142, gamma=0.025314128570814526, score=0.600, total= 0.0s\n", 
"[CV] C=7062.033162389142, gamma=0.025314128570814526 .................\n", "[CV] C=7062.033162389142, gamma=0.025314128570814526, score=0.600, total= 0.0s\n", "[CV] C=7062.033162389142, gamma=0.025314128570814526 .................\n", "[CV] C=7062.033162389142, gamma=0.025314128570814526, score=0.595, total= 0.0s\n", "[CV] C=7062.033162389142, gamma=0.025314128570814526 .................\n", "[CV] C=7062.033162389142, gamma=0.025314128570814526, score=0.608, total= 0.0s\n", "[CV] C=8645.579554506498, gamma=2.8458471754376826 ...................\n", "[CV] C=8645.579554506498, gamma=2.8458471754376826, score=0.600, total= 0.0s\n", "[CV] C=8645.579554506498, gamma=2.8458471754376826 ...................\n", "[CV] C=8645.579554506498, gamma=2.8458471754376826, score=0.600, total= 0.0s\n", "[CV] C=8645.579554506498, gamma=2.8458471754376826 ...................\n", "[CV] C=8645.579554506498, gamma=2.8458471754376826, score=0.600, total= 0.0s\n", "[CV] C=8645.579554506498, gamma=2.8458471754376826 ...................\n", "[CV] C=8645.579554506498, gamma=2.8458471754376826, score=0.595, total= 0.0s\n", "[CV] C=8645.579554506498, gamma=2.8458471754376826 ...................\n", "[CV] C=8645.579554506498, gamma=2.8458471754376826, score=0.608, total= 0.0s\n", "[CV] C=9172.917553983025, gamma=0.22508011539786812 ..................\n", "[CV] C=9172.917553983025, gamma=0.22508011539786812, score=0.600, total= 0.0s\n", "[CV] C=9172.917553983025, gamma=0.22508011539786812 ..................\n", "[CV] C=9172.917553983025, gamma=0.22508011539786812, score=0.600, total= 0.0s\n", "[CV] C=9172.917553983025, gamma=0.22508011539786812 ..................\n", "[CV] C=9172.917553983025, gamma=0.22508011539786812, score=0.600, total= 0.0s\n", "[CV] C=9172.917553983025, gamma=0.22508011539786812 ..................\n", "[CV] C=9172.917553983025, gamma=0.22508011539786812, score=0.595, total= 0.0s\n", "[CV] C=9172.917553983025, gamma=0.22508011539786812 ..................\n", "[CV] 
C=9172.917553983025, gamma=0.22508011539786812, score=0.608, total= 0.0s\n", "[CV] C=4688.375600254233, gamma=0.00019280688754327586 ...............\n", "[CV] C=4688.375600254233, gamma=0.00019280688754327586, score=0.975, total= 0.0s\n", "[CV] C=4688.375600254233, gamma=0.00019280688754327586 ...............\n", "[CV] C=4688.375600254233, gamma=0.00019280688754327586, score=0.925, total= 0.0s\n", "[CV] C=4688.375600254233, gamma=0.00019280688754327586 ...............\n", "[CV] C=4688.375600254233, gamma=0.00019280688754327586, score=0.887, total= 0.0s\n", "[CV] C=4688.375600254233, gamma=0.00019280688754327586 ...............\n", "[CV] C=4688.375600254233, gamma=0.00019280688754327586, score=0.899, total= 0.0s\n", "[CV] C=4688.375600254233, gamma=0.00019280688754327586 ...............\n", "[CV] C=4688.375600254233, gamma=0.00019280688754327586, score=0.886, total= 0.0s\n", "[CV] C=5411.375675765696, gamma=0.2328071362790735 ...................\n", "[CV] C=5411.375675765696, gamma=0.2328071362790735, score=0.600, total= 0.0s\n", "[CV] C=5411.375675765696, gamma=0.2328071362790735 ...................\n", "[CV] C=5411.375675765696, gamma=0.2328071362790735, score=0.600, total= 0.0s\n", "[CV] C=5411.375675765696, gamma=0.2328071362790735 ...................\n", "[CV] C=5411.375675765696, gamma=0.2328071362790735, score=0.600, total= 0.0s\n", "[CV] C=5411.375675765696, gamma=0.2328071362790735 ...................\n", "[CV] C=5411.375675765696, gamma=0.2328071362790735, score=0.595, total= 0.0s\n", "[CV] C=5411.375675765696, gamma=0.2328071362790735 ...................\n", "[CV] C=5411.375675765696, gamma=0.2328071362790735, score=0.608, total= 0.0s\n", "[CV] C=5348.859402328399, gamma=0.0001332402277403275 ................\n", "[CV] C=5348.859402328399, gamma=0.0001332402277403275, score=0.963, total= 0.0s\n", "[CV] C=5348.859402328399, gamma=0.0001332402277403275 ................\n", "[CV] C=5348.859402328399, gamma=0.0001332402277403275, score=0.912, total= 0.0s\n", 
"[CV] C=5348.859402328399, gamma=0.0001332402277403275 ................\n", "[CV] C=5348.859402328399, gamma=0.0001332402277403275, score=0.912, total= 0.0s\n", "[CV] C=5348.859402328399, gamma=0.0001332402277403275 ................\n", "[CV] C=5348.859402328399, gamma=0.0001332402277403275, score=0.899, total= 0.0s\n", "[CV] C=5348.859402328399, gamma=0.0001332402277403275 ................\n", "[CV] C=5348.859402328399, gamma=0.0001332402277403275, score=0.899, total= 0.0s\n", "[CV] C=3306.4862531240483, gamma=8.8599919277554 .....................\n", "[CV] C=3306.4862531240483, gamma=8.8599919277554, score=0.600, total= 0.0s\n", "[CV] C=3306.4862531240483, gamma=8.8599919277554 .....................\n", "[CV] C=3306.4862531240483, gamma=8.8599919277554, score=0.600, total= 0.0s\n", "[CV] C=3306.4862531240483, gamma=8.8599919277554 .....................\n", "[CV] C=3306.4862531240483, gamma=8.8599919277554, score=0.600, total= 0.0s\n", "[CV] C=3306.4862531240483, gamma=8.8599919277554 .....................\n", "[CV] C=3306.4862531240483, gamma=8.8599919277554, score=0.595, total= 0.0s\n", "[CV] C=3306.4862531240483, gamma=8.8599919277554 .....................\n", "[CV] C=3306.4862531240483, gamma=8.8599919277554, score=0.608, total= 0.0s\n", "[CV] C=6782.837501408192, gamma=0.0003840359158522522 ................\n", "[CV] C=6782.837501408192, gamma=0.0003840359158522522, score=0.963, total= 0.0s\n", "[CV] C=6782.837501408192, gamma=0.0003840359158522522 ................\n", "[CV] C=6782.837501408192, gamma=0.0003840359158522522, score=0.938, total= 0.0s\n", "[CV] C=6782.837501408192, gamma=0.0003840359158522522 ................\n", "[CV] C=6782.837501408192, gamma=0.0003840359158522522, score=0.887, total= 0.0s\n", "[CV] C=6782.837501408192, gamma=0.0003840359158522522 ................\n", "[CV] C=6782.837501408192, gamma=0.0003840359158522522, score=0.899, total= 0.0s\n", "[CV] C=6782.837501408192, gamma=0.0003840359158522522 ................\n", "[CV] 
C=6782.837501408192, gamma=0.0003840359158522522, score=0.924, total= 0.0s\n", "[CV] C=8646.69312648182, gamma=0.0514422048188468 ....................\n", "[CV] C=8646.69312648182, gamma=0.0514422048188468, score=0.600, total= 0.0s\n", "[CV] C=8646.69312648182, gamma=0.0514422048188468 ....................\n", "[CV] C=8646.69312648182, gamma=0.0514422048188468, score=0.600, total= 0.0s\n", "[CV] C=8646.69312648182, gamma=0.0514422048188468 ....................\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[CV] C=8646.69312648182, gamma=0.0514422048188468, score=0.600, total= 0.0s\n", "[CV] C=8646.69312648182, gamma=0.0514422048188468 ....................\n", "[CV] C=8646.69312648182, gamma=0.0514422048188468, score=0.595, total= 0.0s\n", "[CV] C=8646.69312648182, gamma=0.0514422048188468 ....................\n", "[CV] C=8646.69312648182, gamma=0.0514422048188468, score=0.608, total= 0.0s\n", "[CV] C=298.3475123803619, gamma=0.00010372525056028988 ...............\n", "[CV] C=298.3475123803619, gamma=0.00010372525056028988, score=0.950, total= 0.0s\n", "[CV] C=298.3475123803619, gamma=0.00010372525056028988 ...............\n", "[CV] C=298.3475123803619, gamma=0.00010372525056028988, score=0.938, total= 0.0s\n", "[CV] C=298.3475123803619, gamma=0.00010372525056028988 ...............\n", "[CV] C=298.3475123803619, gamma=0.00010372525056028988, score=0.912, total= 0.0s\n", "[CV] C=298.3475123803619, gamma=0.00010372525056028988 ...............\n", "[CV] C=298.3475123803619, gamma=0.00010372525056028988, score=0.911, total= 0.0s\n", "[CV] C=298.3475123803619, gamma=0.00010372525056028988 ...............\n", "[CV] C=298.3475123803619, gamma=0.00010372525056028988, score=0.924, total= 0.0s\n", "[CV] C=8981.788000280414, gamma=1.7432010086816468e-05 ...............\n", "[CV] C=8981.788000280414, gamma=1.7432010086816468e-05, score=0.938, total= 0.0s\n", "[CV] C=8981.788000280414, gamma=1.7432010086816468e-05 ...............\n", "[CV] C=8981.788000280414, 
gamma=1.7432010086816468e-05, score=0.900, total= 0.0s\n", "[CV] C=8981.788000280414, gamma=1.7432010086816468e-05 ...............\n", "[CV] C=8981.788000280414, gamma=1.7432010086816468e-05, score=0.938, total= 0.0s\n", "[CV] C=8981.788000280414, gamma=1.7432010086816468e-05 ...............\n", "[CV] C=8981.788000280414, gamma=1.7432010086816468e-05, score=0.911, total= 0.0s\n", "[CV] C=8981.788000280414, gamma=1.7432010086816468e-05 ...............\n", "[CV] C=8981.788000280414, gamma=1.7432010086816468e-05, score=0.911, total= 0.0s\n", "[CV] C=5280.585313970376, gamma=3.9370744645847915 ...................\n", "[CV] C=5280.585313970376, gamma=3.9370744645847915, score=0.600, total= 0.0s\n", "[CV] C=5280.585313970376, gamma=3.9370744645847915 ...................\n", "[CV] C=5280.585313970376, gamma=3.9370744645847915, score=0.600, total= 0.0s\n", "[CV] C=5280.585313970376, gamma=3.9370744645847915 ...................\n", "[CV] C=5280.585313970376, gamma=3.9370744645847915, score=0.600, total= 0.0s\n", "[CV] C=5280.585313970376, gamma=3.9370744645847915 ...................\n", "[CV] C=5280.585313970376, gamma=3.9370744645847915, score=0.595, total= 0.0s\n", "[CV] C=5280.585313970376, gamma=3.9370744645847915 ...................\n", "[CV] C=5280.585313970376, gamma=3.9370744645847915, score=0.608, total= 0.0s\n", "[CV] C=1796.0060616565647, gamma=4.069550456489846 ...................\n", "[CV] C=1796.0060616565647, gamma=4.069550456489846, score=0.600, total= 0.0s\n", "[CV] C=1796.0060616565647, gamma=4.069550456489846 ...................\n", "[CV] C=1796.0060616565647, gamma=4.069550456489846, score=0.600, total= 0.0s\n", "[CV] C=1796.0060616565647, gamma=4.069550456489846 ...................\n", "[CV] C=1796.0060616565647, gamma=4.069550456489846, score=0.600, total= 0.0s\n", "[CV] C=1796.0060616565647, gamma=4.069550456489846 ...................\n", "[CV] C=1796.0060616565647, gamma=4.069550456489846, score=0.595, total= 0.0s\n", "[CV] C=1796.0060616565647, 
gamma=4.069550456489846 ...................\n", "[CV] C=1796.0060616565647, gamma=4.069550456489846, score=0.608, total= 0.0s\n", "[CV] C=9364.034934800327, gamma=0.05309835444229084 ..................\n", "[CV] C=9364.034934800327, gamma=0.05309835444229084, score=0.600, total= 0.0s\n", "[CV] C=9364.034934800327, gamma=0.05309835444229084 ..................\n", "[CV] C=9364.034934800327, gamma=0.05309835444229084, score=0.600, total= 0.0s\n", "[CV] C=9364.034934800327, gamma=0.05309835444229084 ..................\n", "[CV] C=9364.034934800327, gamma=0.05309835444229084, score=0.600, total= 0.0s\n", "[CV] C=9364.034934800327, gamma=0.05309835444229084 ..................\n", "[CV] C=9364.034934800327, gamma=0.05309835444229084, score=0.595, total= 0.0s\n", "[CV] C=9364.034934800327, gamma=0.05309835444229084 ..................\n", "[CV] C=9364.034934800327, gamma=0.05309835444229084, score=0.608, total= 0.0s\n", "[CV] C=1480.646519092858, gamma=3.768555228496674 ....................\n", "[CV] C=1480.646519092858, gamma=3.768555228496674, score=0.600, total= 0.0s\n", "[CV] C=1480.646519092858, gamma=3.768555228496674 ....................\n", "[CV] C=1480.646519092858, gamma=3.768555228496674, score=0.600, total= 0.0s\n", "[CV] C=1480.646519092858, gamma=3.768555228496674 ....................\n", "[CV] C=1480.646519092858, gamma=3.768555228496674, score=0.600, total= 0.0s\n", "[CV] C=1480.646519092858, gamma=3.768555228496674 ....................\n", "[CV] C=1480.646519092858, gamma=3.768555228496674, score=0.595, total= 0.0s\n", "[CV] C=1480.646519092858, gamma=3.768555228496674 ....................\n", "[CV] C=1480.646519092858, gamma=3.768555228496674, score=0.608, total= 0.0s\n", "[CV] C=4859.453665124743, gamma=1.5771735759385928e-05 ...............\n", "[CV] C=4859.453665124743, gamma=1.5771735759385928e-05, score=0.950, total= 0.0s\n", "[CV] C=4859.453665124743, gamma=1.5771735759385928e-05 ...............\n", "[CV] C=4859.453665124743, 
gamma=1.5771735759385928e-05, score=0.912, total= 0.0s\n", "[CV] C=4859.453665124743, gamma=1.5771735759385928e-05 ...............\n", "[CV] C=4859.453665124743, gamma=1.5771735759385928e-05, score=0.938, total= 0.0s\n", "[CV] C=4859.453665124743, gamma=1.5771735759385928e-05 ...............\n", "[CV] C=4859.453665124743, gamma=1.5771735759385928e-05, score=0.911, total= 0.0s\n", "[CV] C=4859.453665124743, gamma=1.5771735759385928e-05 ...............\n", "[CV] C=4859.453665124743, gamma=1.5771735759385928e-05, score=0.911, total= 0.0s\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "[Parallel(n_jobs=1)]: Done 100 out of 100 | elapsed: 1.9s finished\n" ] }, { "data": { "text/plain": [ "RandomizedSearchCV(estimator=SVC(), n_iter=20,\n", " param_distributions={'C': ,\n", " 'gamma': },\n", " verbose=3)" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Number of parameter settings to sample\n", "n_iter_search = 20\n", "random_search = RandomizedSearchCV(SVC(),\n", "                                   param_distributions=param_dist,\n", "                                   n_iter=n_iter_search,\n", "                                   refit=True,\n", "                                   verbose=3)\n", "random_search.fit(X_train, y_train)" ] }, { "cell_type": "markdown", "id": "85bcee0b", "metadata": {}, "source": [ "As before, once training finishes, the best parameters found are available through the `best_params_` attribute of `RandomizedSearchCV`, and the best model through its `best_estimator_` attribute." ] }, { "cell_type": "markdown", "id": "28cb102d", "metadata": {}, "source": [ "Finally, we take the best model found by the random search, predict on the test set, and build a confusion matrix and a classification report to check how well it performs." ] }, { "cell_type": "code", "execution_count": 14, "id": "78c50204", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 45 8]\n", " [ 2 116]]\n" ] } ], "source": [ "# Predict with the best estimator found by the search\n", "random_predictions = random_search.predict(X_test)\n", "\n", "from sklearn.metrics import classification_report, confusion_matrix\n", "# Confusion matrix\n", "print(confusion_matrix(y_test, random_predictions))" ] }, { "cell_type": "code", "execution_count": 15, "id": "6cfff439", "metadata": {}, 
"outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " precision recall f1-score support\n", "\n", " 0 0.96 0.85 0.90 53\n", " 1 0.94 0.98 0.96 118\n", "\n", " accuracy 0.94 171\n", " macro avg 0.95 0.92 0.93 171\n", "weighted avg 0.94 0.94 0.94 171\n", "\n" ] } ], "source": [ "# 分类评价报告\n", "print(classification_report(y_test, random_predictions))" ] }, { "cell_type": "markdown", "id": "79ea2f3c", "metadata": {}, "source": [ "## 贝叶斯搜索\n", "\n", "贝叶斯搜索使用贝叶斯优化技术对搜索空间进行建模,以尽快获得优化的参数值。它使用搜索空间的结构来优化搜索时间。贝叶斯搜索方法使用过去的评估结果来采样最有可能提供更好结果的新候选参数(如下图所示)。\n", "\n", "![贝叶斯搜索](images/3.png)" ] }, { "cell_type": "markdown", "id": "5a05f83c", "metadata": {}, "source": [ "[Scikit-Optimize](undefined \"undefined\")库带有 BayesSearchCV 实现。\n", "\n", "首先指定参数 C 和 gamma 以及参数值的候选样本的分布,如下所示:" ] }, { "cell_type": "code", "execution_count": 16, "id": "59b43872", "metadata": {}, "outputs": [], "source": [ "# !pip install scikit-optimize" ] }, { "cell_type": "code", "execution_count": 17, "id": "15205836", "metadata": {}, "outputs": [], "source": [ "from skopt import BayesSearchCV\n", "# 参数范围由下面的一个指定\n", "from skopt.space import Real, Categorical, Integer\n", "\n", "search_spaces = {\n", " 'C': Real(0.1, 1e+4),\n", " 'gamma': Real(1e-6, 1e+1, 'log-uniform'),\n", "}" ] }, { "cell_type": "markdown", "id": "69fd861a", "metadata": {}, "source": [ "接下来创建一个 BayesSearchCV 带参数 n_iter_search 的对象,并将使用训练数据来训练模型。" ] }, { "cell_type": "code", "execution_count": 18, "id": "34c048c0", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Fitting 5 folds for each of 1 candidates, totalling 5 fits\n", "[CV] C=4977.397744560771, gamma=0.023829160053700667 .................\n", "[CV] C=4977.397744560771, gamma=0.023829160053700667, score=0.600, total= 0.0s\n", "[CV] C=4977.397744560771, gamma=0.023829160053700667 .................\n", "[CV] C=4977.397744560771, gamma=0.023829160053700667, score=0.600, total= 0.0s\n", "[CV] C=4977.397744560771, gamma=0.023829160053700667 
.................\n", "[CV] C=4977.397744560771, gamma=0.023829160053700667, score=0.600, total= 0.0s\n", "[CV] C=4977.397744560771, gamma=0.023829160053700667 .................\n", "[CV] C=4977.397744560771, gamma=0.023829160053700667, score=0.595, total= 0.0s\n", "[CV] C=4977.397744560771, gamma=0.023829160053700667 .................\n", "[CV] C=4977.397744560771, gamma=0.023829160053700667, score=0.608, total= 0.0s\n", "Fitting 5 folds for each of 1 candidates, totalling 5 fits\n", "[CV] C=8758.852767384362, gamma=4.852350395777465 ....................\n", "[CV] C=8758.852767384362, gamma=4.852350395777465, score=0.600, total= 0.0s\n", "[CV] C=8758.852767384362, gamma=4.852350395777465 ....................\n", "[CV] C=8758.852767384362, gamma=4.852350395777465, score=0.600, total= 0.0s\n", "[CV] C=8758.852767384362, gamma=4.852350395777465 ....................\n", "[CV] C=8758.852767384362, gamma=4.852350395777465, score=0.600, total= 0.0s\n", "[CV] C=8758.852767384362, gamma=4.852350395777465 ....................\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.\n", "[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s\n", "[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 0.0s remaining: 0.0s\n", "[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 0.0s finished\n", "[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.\n", "[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s\n", "[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 0.0s remaining: 0.0s\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[CV] C=8758.852767384362, gamma=4.852350395777465, score=0.595, total= 0.0s\n", "[CV] C=8758.852767384362, gamma=4.852350395777465 ....................\n", "[CV] C=8758.852767384362, gamma=4.852350395777465, score=0.608, total= 0.0s\n", "Fitting 5 folds for each of 1 candidates, totalling 5 fits\n", 
"[CV] C=2910.2258495677825, gamma=0.9018344522099031 ..................\n", "[CV] C=2910.2258495677825, gamma=0.9018344522099031, score=0.600, total= 0.0s\n", "[CV] C=2910.2258495677825, gamma=0.9018344522099031 ..................\n", "[CV] C=2910.2258495677825, gamma=0.9018344522099031, score=0.600, total= 0.0s\n", "[CV] C=2910.2258495677825, gamma=0.9018344522099031 ..................\n", "[CV] C=2910.2258495677825, gamma=0.9018344522099031, score=0.600, total= 0.0s\n", "[CV] C=2910.2258495677825, gamma=0.9018344522099031 ..................\n", "[CV] C=2910.2258495677825, gamma=0.9018344522099031, score=0.595, total= 0.0s\n", "[CV] C=2910.2258495677825, gamma=0.9018344522099031 ..................\n", "[CV] C=2910.2258495677825, gamma=0.9018344522099031, score=0.608, total= 0.0s\n", "Fitting 5 folds for each of 1 candidates, totalling 5 fits" ] }, { "name": "stderr", "output_type": "stream", "text": [ "[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 0.0s finished\n", "[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.\n", "[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s\n", "[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 0.0s remaining: 0.0s\n", "[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 0.0s finished\n", "[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.\n", "[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s\n", "[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 0.0s remaining: 0.0s\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "[CV] C=6201.076450375709, gamma=4.284329825969826e-06 ................\n", "[CV] C=6201.076450375709, gamma=4.284329825969826e-06, score=0.988, total= 0.0s\n", "[CV] C=6201.076450375709, gamma=4.284329825969826e-06 ................\n", "[CV] C=6201.076450375709, gamma=4.284329825969826e-06, score=0.925, total= 0.0s\n", "[CV] C=6201.076450375709, gamma=4.284329825969826e-06 ................\n", "[CV] 
C=6201.076450375709, gamma=4.284329825969826e-06, score=0.938, total= 0.0s\n", "[CV] C=6201.076450375709, gamma=4.284329825969826e-06 ................\n", "[CV] C=6201.076450375709, gamma=4.284329825969826e-06, score=0.924, total= 0.0s\n", "[CV] C=6201.076450375709, gamma=4.284329825969826e-06 ................\n", "[CV] C=6201.076450375709, gamma=4.284329825969826e-06, score=0.949, total= 0.0s\n", "Fitting 5 folds for each of 1 candidates, totalling 5 fits\n", "[CV] C=8519.566314535297, gamma=5.566534140000436 ....................\n", "[CV] C=8519.566314535297, gamma=5.566534140000436, score=0.600, total= 0.0s\n", "[CV] C=8519.566314535297, gamma=5.566534140000436 ....................\n", "[CV] C=8519.566314535297, gamma=5.566534140000436, score=0.600, total= 0.0s\n", "[CV] C=8519.566314535297, gamma=5.566534140000436 ....................\n", "[CV] C=8519.566314535297, gamma=5.566534140000436, score=0.600, total= 0.0s\n", "[CV] C=8519.566314535297, gamma=5.566534140000436 ....................\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 0.0s finished\n", "[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.\n", "[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s\n", "[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 0.0s remaining: 0.0s\n", "[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 0.0s finished\n", "[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.\n", "[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s\n", "[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 0.0s remaining: 0.0s\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[CV] C=8519.566314535297, gamma=5.566534140000436, score=0.595, total= 0.0s\n", "[CV] C=8519.566314535297, gamma=5.566534140000436 ....................\n", "[CV] C=8519.566314535297, gamma=5.566534140000436, score=0.608, total= 0.0s\n", "Fitting 5 folds for 
each of 1 candidates, totalling 5 fits\n", "[CV] C=4992.078124786961, gamma=1.7807839452844108 ...................\n", "[CV] C=4992.078124786961, gamma=1.7807839452844108, score=0.600, total= 0.0s\n", "[CV] C=4992.078124786961, gamma=1.7807839452844108 ...................\n", "[CV] C=4992.078124786961, gamma=1.7807839452844108, score=0.600, total= 0.0s\n", "[CV] C=4992.078124786961, gamma=1.7807839452844108 ...................\n", "[CV] C=4992.078124786961, gamma=1.7807839452844108, score=0.600, total= 0.0s\n", "[CV] C=4992.078124786961, gamma=1.7807839452844108 ...................\n", "[CV] C=4992.078124786961, gamma=1.7807839452844108, score=0.595, total= 0.0s\n", "[CV] C=4992.078124786961, gamma=1.7807839452844108 ...................\n", "[CV] C=4992.078124786961, gamma=1.7807839452844108, score=0.608, total= 0.0s\n", "Fitting 5 folds for each of 1 candidates, totalling 5 fits\n", "[CV] C=9053.010519680269, gamma=0.11986519331682337 ..................\n", "[CV] C=9053.010519680269, gamma=0.11986519331682337, score=0.600, total= 0.0s\n", "[CV] C=9053.010519680269, gamma=0.11986519331682337 ..................\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 0.0s finished\n", "[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.\n", "[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s\n", "[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 0.0s remaining: 0.0s\n", "[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 0.0s finished\n", "[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.\n", "[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s\n", "[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 0.0s remaining: 0.0s\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[CV] C=9053.010519680269, gamma=0.11986519331682337, score=0.600, total= 0.0s\n", "[CV] C=9053.010519680269, gamma=0.11986519331682337 
..................\n", "[CV] C=9053.010519680269, gamma=0.11986519331682337, score=0.600, total= 0.0s\n", "[CV] C=9053.010519680269, gamma=0.11986519331682337 ..................\n", "[CV] C=9053.010519680269, gamma=0.11986519331682337, score=0.595, total= 0.0s\n", "[CV] C=9053.010519680269, gamma=0.11986519331682337 ..................\n", "[CV] C=9053.010519680269, gamma=0.11986519331682337, score=0.608, total= 0.0s\n", "Fitting 5 folds for each of 1 candidates, totalling 5 fits\n", "[CV] C=8641.531299019009, gamma=0.07861618342270658 ..................\n", "[CV] C=8641.531299019009, gamma=0.07861618342270658, score=0.600, total= 0.0s\n", "[CV] C=8641.531299019009, gamma=0.07861618342270658 ..................\n", "[CV] C=8641.531299019009, gamma=0.07861618342270658, score=0.600, total= 0.0s\n", "[CV] C=8641.531299019009, gamma=0.07861618342270658 ..................\n", "[CV] C=8641.531299019009, gamma=0.07861618342270658, score=0.600, total= 0.0s\n", "[CV] C=8641.531299019009, gamma=0.07861618342270658 ..................\n", "[CV] C=8641.531299019009, gamma=0.07861618342270658, score=0.595, total= 0.0s\n", "[CV] C=8641.531299019009, gamma=0.07861618342270658 ..................\n", "[CV] C=8641.531299019009, gamma=0.07861618342270658, score=0.608, total= 0.0s\n", "Fitting 5 folds for each of 1 candidates, totalling 5 fits" ] }, { "name": "stderr", "output_type": "stream", "text": [ "[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 0.0s finished\n", "[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.\n", "[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s\n", "[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 0.0s remaining: 0.0s\n", "[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 0.0s finished\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "[CV] C=1959.6879176103353, gamma=0.47778823806501924 .................\n", "[CV] C=1959.6879176103353, gamma=0.47778823806501924, score=0.600, total= 
0.0s\n", "[CV] C=1959.6879176103353, gamma=0.47778823806501924 .................\n", "[CV] C=1959.6879176103353, gamma=0.47778823806501924, score=0.600, total= 0.0s\n", "[CV] C=1959.6879176103353, gamma=0.47778823806501924 .................\n", "[CV] C=1959.6879176103353, gamma=0.47778823806501924, score=0.600, total= 0.0s\n", "[CV] C=1959.6879176103353, gamma=0.47778823806501924 .................\n", "[CV] C=1959.6879176103353, gamma=0.47778823806501924, score=0.595, total= 0.0s\n", "[CV] C=1959.6879176103353, gamma=0.47778823806501924 .................\n", "[CV] C=1959.6879176103353, gamma=0.47778823806501924, score=0.608, total= 0.0s\n", "Fitting 5 folds for each of 1 candidates, totalling 5 fits\n", "[CV] C=5084.272437915128, gamma=0.09627687266838895 ..................\n", "[CV] C=5084.272437915128, gamma=0.09627687266838895, score=0.600, total= 0.0s\n", "[CV] C=5084.272437915128, gamma=0.09627687266838895 ..................\n", "[CV] C=5084.272437915128, gamma=0.09627687266838895, score=0.600, total= 0.0s\n", "[CV] C=5084.272437915128, gamma=0.09627687266838895 ..................\n", "[CV] C=5084.272437915128, gamma=0.09627687266838895, score=0.600, total= 0.0s\n", "[CV] C=5084.272437915128, gamma=0.09627687266838895 ..................\n", "[CV] C=5084.272437915128, gamma=0.09627687266838895, score=0.595, total= 0.0s\n", "[CV] C=5084.272437915128, gamma=0.09627687266838895 ..................\n", "[CV] C=5084.272437915128, gamma=0.09627687266838895, score=0.608, total= 0.0s\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.\n", "[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s\n", "[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 0.0s remaining: 0.0s\n", "[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 0.0s finished\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Fitting 5 folds for each of 1 candidates, totalling 5 fits\n", "[CV] 
C=10000.0, gamma=1e-06 ..........................................\n", "[CV] .............. C=10000.0, gamma=1e-06, score=0.988, total= 0.0s\n", "[CV] C=10000.0, gamma=1e-06 ..........................................\n", "[CV] .............. C=10000.0, gamma=1e-06, score=0.938, total= 0.0s\n", "[CV] C=10000.0, gamma=1e-06 ..........................................\n", "[CV] .............. C=10000.0, gamma=1e-06, score=0.938, total= 0.0s\n", "[CV] C=10000.0, gamma=1e-06 ..........................................\n", "[CV] .............. C=10000.0, gamma=1e-06, score=0.937, total= 0.0s\n", "[CV] C=10000.0, gamma=1e-06 ..........................................\n", "[CV] .............. C=10000.0, gamma=1e-06, score=0.975, total= 0.0s\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.\n", "[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s\n", "[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 0.0s remaining: 0.0s\n", "[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 0.0s finished\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Fitting 5 folds for each of 1 candidates, totalling 5 fits\n", "[CV] C=0.1, gamma=1.4905928536395696e-06 .............................\n", "[CV] . C=0.1, gamma=1.4905928536395696e-06, score=0.925, total= 0.0s\n", "[CV] C=0.1, gamma=1.4905928536395696e-06 .............................\n", "[CV] . C=0.1, gamma=1.4905928536395696e-06, score=0.950, total= 0.0s\n", "[CV] C=0.1, gamma=1.4905928536395696e-06 .............................\n", "[CV] . C=0.1, gamma=1.4905928536395696e-06, score=0.863, total= 0.0s\n", "[CV] C=0.1, gamma=1.4905928536395696e-06 .............................\n", "[CV] . C=0.1, gamma=1.4905928536395696e-06, score=0.823, total= 0.0s\n", "[CV] C=0.1, gamma=1.4905928536395696e-06 .............................\n", "[CV] . 
C=0.1, gamma=1.4905928536395696e-06, score=0.949, total= 0.0s\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.\n", "[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s\n", "[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 0.0s remaining: 0.0s\n", "[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 0.0s finished\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Fitting 5 folds for each of 1 candidates, totalling 5 fits\n", "[CV] C=10000.0, gamma=3.152182182099272e-06 ..........................\n", "[CV] C=10000.0, gamma=3.152182182099272e-06, score=0.988, total= 0.0s\n", "[CV] C=10000.0, gamma=3.152182182099272e-06 ..........................\n", "[CV] C=10000.0, gamma=3.152182182099272e-06, score=0.938, total= 0.0s\n", "[CV] C=10000.0, gamma=3.152182182099272e-06 ..........................\n", "[CV] C=10000.0, gamma=3.152182182099272e-06, score=0.938, total= 0.0s\n", "[CV] C=10000.0, gamma=3.152182182099272e-06 ..........................\n", "[CV] C=10000.0, gamma=3.152182182099272e-06, score=0.924, total= 0.0s\n", "[CV] C=10000.0, gamma=3.152182182099272e-06 ..........................\n", "[CV] C=10000.0, gamma=3.152182182099272e-06, score=0.975, total= 0.0s\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.\n", "[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s\n", "[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 0.0s remaining: 0.0s\n", "[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 0.0s finished\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Fitting 5 folds for each of 1 candidates, totalling 5 fits\n", "[CV] C=8258.305079914611, gamma=1.618411578782942e-06 ................\n", "[CV] C=8258.305079914611, gamma=1.618411578782942e-06, score=0.988, total= 0.0s\n", "[CV] C=8258.305079914611, 
gamma=1.618411578782942e-06 ................\n", "[CV] C=8258.305079914611, gamma=1.618411578782942e-06, score=0.938, total= 0.0s\n", "[CV] C=8258.305079914611, gamma=1.618411578782942e-06 ................\n", "[CV] C=8258.305079914611, gamma=1.618411578782942e-06, score=0.938, total= 0.0s\n", "[CV] C=8258.305079914611, gamma=1.618411578782942e-06 ................\n", "[CV] C=8258.305079914611, gamma=1.618411578782942e-06, score=0.924, total= 0.0s\n", "[CV] C=8258.305079914611, gamma=1.618411578782942e-06 ................\n", "[CV] C=8258.305079914611, gamma=1.618411578782942e-06, score=0.962, total= 0.0s\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.\n", "[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s\n", "[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 0.0s remaining: 0.0s\n", "[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 0.0s finished\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Fitting 5 folds for each of 1 candidates, totalling 5 fits\n", "[CV] C=10000.0, gamma=1.56539453002737e-06 ...........................\n", "[CV] C=10000.0, gamma=1.56539453002737e-06, score=0.988, total= 0.0s\n", "[CV] C=10000.0, gamma=1.56539453002737e-06 ...........................\n", "[CV] C=10000.0, gamma=1.56539453002737e-06, score=0.938, total= 0.0s\n", "[CV] C=10000.0, gamma=1.56539453002737e-06 ...........................\n", "[CV] C=10000.0, gamma=1.56539453002737e-06, score=0.938, total= 0.0s\n", "[CV] C=10000.0, gamma=1.56539453002737e-06 ...........................\n", "[CV] C=10000.0, gamma=1.56539453002737e-06, score=0.911, total= 0.0s\n", "[CV] C=10000.0, gamma=1.56539453002737e-06 ...........................\n", "[CV] C=10000.0, gamma=1.56539453002737e-06, score=0.962, total= 0.0s\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.\n", 
"[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s\n", "[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 0.0s remaining: 0.0s\n", "[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 0.0s finished\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Fitting 5 folds for each of 1 candidates, totalling 5 fits\n", "[CV] C=10000.0, gamma=1.3871576029368295e-05 .........................\n", "[CV] C=10000.0, gamma=1.3871576029368295e-05, score=0.938, total= 0.0s\n", "[CV] C=10000.0, gamma=1.3871576029368295e-05 .........................\n", "[CV] C=10000.0, gamma=1.3871576029368295e-05, score=0.900, total= 0.0s\n", "[CV] C=10000.0, gamma=1.3871576029368295e-05 .........................\n", "[CV] C=10000.0, gamma=1.3871576029368295e-05, score=0.938, total= 0.0s\n", "[CV] C=10000.0, gamma=1.3871576029368295e-05 .........................\n", "[CV] C=10000.0, gamma=1.3871576029368295e-05, score=0.911, total= 0.0s\n", "[CV] C=10000.0, gamma=1.3871576029368295e-05 .........................\n", "[CV] C=10000.0, gamma=1.3871576029368295e-05, score=0.924, total= 0.0s\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.\n", "[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s\n", "[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 0.0s remaining: 0.0s\n", "[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 0.0s finished\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Fitting 5 folds for each of 1 candidates, totalling 5 fits\n", "[CV] C=57.93860986876622, gamma=2.0990813445227366e-05 ...............\n", "[CV] C=57.93860986876622, gamma=2.0990813445227366e-05, score=0.975, total= 0.0s\n", "[CV] C=57.93860986876622, gamma=2.0990813445227366e-05 ...............\n", "[CV] C=57.93860986876622, gamma=2.0990813445227366e-05, score=0.925, total= 0.0s\n", "[CV] C=57.93860986876622, gamma=2.0990813445227366e-05 ...............\n", "[CV] 
C=57.93860986876622, gamma=2.0990813445227366e-05, score=0.912, total= 0.0s\n", "[CV] C=57.93860986876622, gamma=2.0990813445227366e-05 ...............\n", "[CV] C=57.93860986876622, gamma=2.0990813445227366e-05, score=0.911, total= 0.0s\n", "[CV] C=57.93860986876622, gamma=2.0990813445227366e-05 ...............\n", "[CV] C=57.93860986876622, gamma=2.0990813445227366e-05, score=0.937, total= 0.0s\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.\n", "[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s\n", "[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 0.0s remaining: 0.0s\n", "[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 0.0s finished\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Fitting 5 folds for each of 1 candidates, totalling 5 fits\n", "[CV] C=164.1340366211841, gamma=0.00010481680034418265 ...............\n", "[CV] C=164.1340366211841, gamma=0.00010481680034418265, score=0.950, total= 0.0s\n", "[CV] C=164.1340366211841, gamma=0.00010481680034418265 ...............\n", "[CV] C=164.1340366211841, gamma=0.00010481680034418265, score=0.950, total= 0.0s\n", "[CV] C=164.1340366211841, gamma=0.00010481680034418265 ...............\n", "[CV] C=164.1340366211841, gamma=0.00010481680034418265, score=0.912, total= 0.0s\n", "[CV] C=164.1340366211841, gamma=0.00010481680034418265 ...............\n", "[CV] C=164.1340366211841, gamma=0.00010481680034418265, score=0.911, total= 0.0s\n", "[CV] C=164.1340366211841, gamma=0.00010481680034418265 ...............\n", "[CV] C=164.1340366211841, gamma=0.00010481680034418265, score=0.911, total= 0.0s\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.\n", "[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s\n", "[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 0.0s remaining: 0.0s\n", 
"[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 0.0s finished\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Fitting 5 folds for each of 1 candidates, totalling 5 fits\n", "[CV] C=9914.701087332833, gamma=0.00013060755725868254 ...............\n", "[CV] C=9914.701087332833, gamma=0.00013060755725868254, score=0.963, total= 0.0s\n", "[CV] C=9914.701087332833, gamma=0.00013060755725868254 ...............\n", "[CV] C=9914.701087332833, gamma=0.00013060755725868254, score=0.912, total= 0.0s\n", "[CV] C=9914.701087332833, gamma=0.00013060755725868254 ...............\n", "[CV] C=9914.701087332833, gamma=0.00013060755725868254, score=0.912, total= 0.0s\n", "[CV] C=9914.701087332833, gamma=0.00013060755725868254 ...............\n", "[CV] C=9914.701087332833, gamma=0.00013060755725868254, score=0.899, total= 0.0s\n", "[CV] C=9914.701087332833, gamma=0.00013060755725868254 ...............\n", "[CV] C=9914.701087332833, gamma=0.00013060755725868254, score=0.899, total= 0.0s\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.\n", "[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s\n", "[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 0.0s remaining: 0.0s\n", "[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 0.0s finished\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Fitting 5 folds for each of 1 candidates, totalling 5 fits\n", "[CV] C=34.78128435599578, gamma=0.00046475689572079325 ...............\n", "[CV] C=34.78128435599578, gamma=0.00046475689572079325, score=0.963, total= 0.0s\n", "[CV] C=34.78128435599578, gamma=0.00046475689572079325 ...............\n", "[CV] C=34.78128435599578, gamma=0.00046475689572079325, score=0.938, total= 0.0s\n", "[CV] C=34.78128435599578, gamma=0.00046475689572079325 ...............\n", "[CV] C=34.78128435599578, gamma=0.00046475689572079325, score=0.875, total= 0.0s\n", "[CV] C=34.78128435599578, 
gamma=0.00046475689572079325 ...............\n", "[CV] C=34.78128435599578, gamma=0.00046475689572079325, score=0.886, total= 0.0s\n", "[CV] C=34.78128435599578, gamma=0.00046475689572079325 ...............\n", "[CV] C=34.78128435599578, gamma=0.00046475689572079325, score=0.924, total= 0.0s\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.\n", "[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s\n", "[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 0.0s remaining: 0.0s\n", "[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 0.0s finished\n" ] }, { "data": { "text/plain": [ "BayesSearchCV(cv=5, estimator=SVC(), n_iter=20,\n", " search_spaces={'C': Real(low=0.1, high=10000.0, prior='uniform', transform='normalize'),\n", " 'gamma': Real(low=1e-06, high=10.0, prior='log-uniform', transform='normalize')},\n", " verbose=3)" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "n_iter_search = 20\n", "bayes_search = BayesSearchCV(SVC(),\n", " search_spaces,\n", " n_iter=n_iter_search,\n", " cv=5,\n", " verbose=3)\n", "bayes_search.fit(X_train, y_train)" ] }, { "cell_type": "markdown", "id": "fde80ed7", "metadata": {}, "source": [ "Likewise, once training is complete, we can inspect the best parameters found through the `best_params_` attribute of `BayesSearchCV`, and the best estimator through its `best_estimator_` attribute:" ] }, { "cell_type": "code", "execution_count": 19, "id": "535b2598", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "OrderedDict([('C', 10000.0), ('gamma', 1e-06)])" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bayes_search.best_params_" ] }, { "cell_type": "code", "execution_count": 20, "id": "dbe7aae4", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "SVC(C=10000.0, gamma=1e-06)" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bayes_search.best_estimator_" ] }, { "cell_type": "markdown", "id": 
"8baf1839", "metadata": {}, "source": [ "Finally, we use the fitted Bayesian search model to make predictions on the test set, then print a confusion matrix and a classification report for those predictions." ] }, { "cell_type": "code", "execution_count": 21, "id": "33737a41", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 46 7]\n", " [ 1 117]]\n" ] } ], "source": [ "from sklearn.metrics import classification_report, confusion_matrix\n", "\n", "# Predict on the held-out test set with the best estimator found by the search\n", "bayes_predictions = bayes_search.predict(X_test)\n", "# Confusion matrix\n", "print(confusion_matrix(y_test, bayes_predictions))" ] }, { "cell_type": "code", "execution_count": 22, "id": "f94ba35c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " precision recall f1-score support\n", "\n", " 0 0.98 0.87 0.92 53\n", " 1 0.94 0.99 0.97 118\n", "\n", " accuracy 0.95 171\n", " macro avg 0.96 0.93 0.94 171\n", "weighted avg 0.95 0.95 0.95 171\n", "\n" ] } ], "source": [ "# Classification report\n", "print(classification_report(y_test, bayes_predictions))" ] }, { "cell_type": "markdown", "id": "b3754a7d", "metadata": {}, "source": [ "## Final thoughts\n", "\n", "\n", "In this article, we covered three of the most popular hyperparameter optimization techniques, which are used to find a good set of hyperparameters and, in turn, train a robust machine learning model.\n", "\n", "In general, when the number of hyperparameter combinations is small enough, **grid search** is a reasonable choice. As the number of combinations grows, try **random search** or **Bayesian search** instead: both evaluate only a fixed number of candidates (`n_iter`) rather than every combination, so they are far less computationally expensive." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.8" } }, "nbformat": 4, "nbformat_minor": 5 }