{ "cells": [ { "cell_type": "markdown", "id": "c1ac362f", "metadata": { "toc": true }, "source": [ "

Table of Contents

\n", "
" ] }, { "cell_type": "code", "execution_count": 1, "id": "fa3437b9", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# code for loading the format for the notebook\n", "import os\n", "\n", "# path : store the current path to convert back to it later\n", "path = os.getcwd()\n", "os.chdir(os.path.join('..', 'notebook_format'))\n", "\n", "from formats import load_style\n", "load_style(plot_style=False)" ] }, { "cell_type": "code", "execution_count": 2, "id": "a9a14370", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Author: Ethen\n", "\n", "Last updated: 2022-07-10\n", "\n", "Python implementation: CPython\n", "Python version : 3.7.11\n", "IPython version : 7.27.0\n", "\n", "xgboost: 1.6.1\n", "sklearn: 1.0.2\n", "ray : 1.6.0\n", "\n" ] } ], "source": [ "os.chdir(path)\n", "\n", "# 1. magic for inline plot\n", "# 2. magic to print version\n", "# 3. magic so that the notebook will reload external python modules\n", "# 4. magic to enable retina (high resolution) plots\n", "# https://gist.github.com/minrk/3301035\n", "%matplotlib inline\n", "%load_ext watermark\n", "%load_ext autoreload\n", "%autoreload 2\n", "%config InlineBackend.figure_format='retina'\n", "\n", "import xgboost as xgb\n", "import sklearn.datasets\n", "import sklearn.metrics as metrics\n", "from ray import tune\n", "from sklearn.model_selection import train_test_split\n", "from ray.tune.integration.xgboost import TuneReportCallback\n", "\n", "%watermark -a 'Ethen' -u -d -v -iv" ] }, { "cell_type": "markdown", "id": "55b86b88", "metadata": {}, "source": [ "# HyperParameter Tuning Ray Tune and HyperBand" ] }, { "cell_type": "markdown", "id": "307bc375", "metadata": {}, "source": [ "One of steps in training a machine learning model involves hyperparameter tuning, and two most common hyper parameter tuning strategies that we might first come across are grid and random search.\n", "\n", "In this article, we will take a look at how we can perform hyperparameter tuning using **Ray Tune**, as well as explore another hyperparameter tuning strategy called **HyperBand**.\n", "\n", "We'll be using xgboost library as well as a sample dataset provided by scikit-learn in this example, there will be no feature preprocessing as that is not the focus of this post." 
] }, { "cell_type": "code", "execution_count": 3, "id": "862cc9d2", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "number of rows: 569, cols: 30\n" ] }, { "data": { "text/plain": [ "array([[1.799e+01, 1.038e+01, 1.228e+02, ..., 2.654e-01, 4.601e-01,\n", " 1.189e-01],\n", " [2.057e+01, 1.777e+01, 1.329e+02, ..., 1.860e-01, 2.750e-01,\n", " 8.902e-02],\n", " [1.969e+01, 2.125e+01, 1.300e+02, ..., 2.430e-01, 3.613e-01,\n", " 8.758e-02],\n", " ...,\n", " [1.660e+01, 2.808e+01, 1.083e+02, ..., 1.418e-01, 2.218e-01,\n", " 7.820e-02],\n", " [2.060e+01, 2.933e+01, 1.401e+02, ..., 2.650e-01, 4.087e-01,\n", " 1.240e-01],\n", " [7.760e+00, 2.454e+01, 4.792e+01, ..., 0.000e+00, 2.871e-01,\n", " 7.039e-02]])" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bunch = sklearn.datasets.load_breast_cancer(return_X_y=False)\n", "print(f'number of rows: {bunch.data.shape[0]}, cols: {bunch.data.shape[1]}')\n", "bunch.data" ] }, { "cell_type": "code", "execution_count": 4, "id": "50cc560a", "metadata": {}, "outputs": [], "source": [ "X_train, X_test, y_train, y_test = train_test_split(bunch.data, bunch.target, test_size=0.25)" ] }, { "cell_type": "markdown", "id": "eb1c0b79", "metadata": {}, "source": [ "We first train a model using the default parameters to get a baseline performance number." ] }, { "cell_type": "code", "execution_count": 5, "id": "92d395c6", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0]\tvalidation_0-logloss:0.46159\tvalidation_1-logloss:0.49076\n", "[10]\tvalidation_0-logloss:0.04348\tvalidation_1-logloss:0.15069\n", "[20]\tvalidation_0-logloss:0.01570\tvalidation_1-logloss:0.12307\n", "[30]\tvalidation_0-logloss:0.01028\tvalidation_1-logloss:0.11609\n", "[40]\tvalidation_0-logloss:0.00816\tvalidation_1-logloss:0.11306\n", "[50]\tvalidation_0-logloss:0.00732\tvalidation_1-logloss:0.11206\n", "[60]\tvalidation_0-logloss:0.00679\tvalidation_1-logloss:0.11029\n", "[70]\tvalidation_0-logloss:0.00637\tvalidation_1-logloss:0.10848\n", "[80]\tvalidation_0-logloss:0.00603\tvalidation_1-logloss:0.10662\n", "[90]\tvalidation_0-logloss:0.00576\tvalidation_1-logloss:0.10662\n", "[99]\tvalidation_0-logloss:0.00556\tvalidation_1-logloss:0.10615\n" ] }, { "data": { "text/plain": [ "XGBClassifier(base_score=0.5, booster='gbtree', callbacks=None,\n", " colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1,\n", " early_stopping_rounds=None, enable_categorical=False,\n", " eval_metric=None, gamma=0, gpu_id=-1, grow_policy='depthwise',\n", " importance_type=None, interaction_constraints='',\n", " learning_rate=0.300000012, max_bin=256, max_cat_to_onehot=4,\n", " max_delta_step=0, max_depth=6, max_leaves=0, min_child_weight=1,\n", " missing=nan, monotone_constraints='()', n_estimators=100,\n", " n_jobs=0, num_parallel_tree=1, predictor='auto', random_state=0,\n", " reg_alpha=0, reg_lambda=1, ...)" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model_xgb = xgb.XGBClassifier()\n", "\n", "eval_set = [(X_train, y_train), (X_test, y_test)]\n", "model_xgb.fit(X_train, y_train, eval_set=eval_set, verbose=10)" ] }, { "cell_type": "markdown", "id": "6f9af31d", "metadata": {}, "source": [ "## Hyperparameter Tuning" ] }, { "cell_type": "markdown", "id": "e0a55189", "metadata": {}, "source": [ "To use hyperparameter tuning with ray, we need to:\n", "\n", "- Have a config dictionary, so tune can choose from a range of valid options.\n", "- Use the 
config dictionary in our model object.\n", "- Once we are done training the model, report all the necessary metrics." ] }, { "cell_type": "code", "execution_count": 6, "id": "4226fad3", "metadata": {}, "outputs": [], "source": [ "config = {\n", "    #\"n_estimators\": tune.randint(30, 100),\n", "    \"max_depth\": tune.randint(2, 6),\n", "    \"colsample_bytree\": tune.uniform(0.8, 1.0),\n", "    \"subsample\": tune.uniform(0.8, 1.0),\n", "    \"learning_rate\": tune.loguniform(1e-4, 1e-1)\n", "}\n", "\n", "\n", "def ray_train(config, X_train, y_train, X_test, y_test):\n", "    model = xgb.XGBClassifier(**config)\n", "    eval_set = [(X_train, y_train), (X_test, y_test)]\n", "    model.fit(X_train, y_train, eval_set=eval_set, verbose=False)\n", "\n", "    log_loss_test = metrics.log_loss(y_test, model.predict_proba(X_test)[:, 1])\n", "    tune_report_metrics = {'validation_1-logloss': round(log_loss_test, 3)}\n", "    tune.report(**tune_report_metrics)" ] }, { "cell_type": "markdown", "id": "6debe3ef", "metadata": {}, "source": [ "For running hyperparameter tuning:\n", "\n", "- We pass our training function/callable, `ray_train`, as the first parameter. Here we leverage [`with_parameters`](https://docs.ray.io/en/latest/tune/faq.html#how-can-i-use-large-datasets-in-tune) so we can broadcast large objects to our trainable.\n", "- We specify additional necessary arguments, such as the metric to optimize, the resources per trial, and the hyperparameter tuning config space.\n", "- Ray allows us to specify a time budget along with a `num_samples` of -1, which keeps sampling new configurations until the time budget is exhausted." ] }, { "cell_type": "code", "execution_count": 7, "id": "42fb9f39", "metadata": {}, "outputs": [], "source": [ "def ray_hyperparameter_tuning(config, time_budget_s: int):\n", "    analysis = tune.run(\n", "        tune.with_parameters(ray_train, X_train=X_train, y_train=y_train, X_test=X_test, y_test=y_test),\n", "        config=config,\n", "        metric='validation_1-logloss',\n", "        mode='min',\n", "        num_samples=-1,\n", "        resources_per_trial={'cpu': 8},\n", "        time_budget_s=time_budget_s,\n", "        verbose=1\n", "    )\n", "    return analysis" ] }, { "cell_type": "code", "execution_count": 8, "id": "d30a493f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "== Status ==
Current time: 2022-07-09 23:18:29 (running for 00:02:03.06)
Memory usage on this node: 204.9/1007.3 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/80 CPUs, 0/0 GPUs, 0.0/810.33 GiB heap, 0.0/186.26 GiB objects
Current best trial: 27502_00264 with validation_1-logloss=0.113 and parameters={'max_depth': 2, 'colsample_bytree': 0.8522675185448618, 'subsample': 0.8360198940451542, 'learning_rate': 0.09915622430990471}
Result logdir: /home/mingyuliu/ray_results/ray_train_2022-07-09_23-16-25
Number of trials: 1222/infinite (1222 TERMINATED)

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stderr", "output_type": "stream", "text": [ "2022-07-09 23:18:29,903\tINFO tune.py:748 -- Total run time: 124.52 seconds (121.16 seconds for the tuning loop).\n" ] } ], "source": [ "analysis = ray_hyperparameter_tuning(config, time_budget_s=120)" ] }, { "cell_type": "code", "execution_count": 9, "id": "1c2cf912", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ran 1124 hyperparameter tuning experiments\n", "best metric: 0.113\n", "best config: {'max_depth': 2, 'colsample_bytree': 0.8522675185448618, 'subsample': 0.8360198940451542, 'learning_rate': 0.09915622430990471}\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/home/mingyuliu/.local/lib/python3.7/site-packages/ray/tune/analysis/experiment_analysis.py:304: UserWarning: Dataframes will use '/' instead of '.' to delimit nested result keys in future versions of Ray. For forward compatibility, set the environment variable TUNE_RESULT_DELIM='/'\n", " \"Dataframes will use '/' instead of '.' to delimit \"\n" ] } ], "source": [ "num_done_experiments = analysis.results_df[analysis.results_df['done'] == True].shape[0]\n", "\n", "print(f'ran {num_done_experiments} hyperparameter tuning experiments')\n", "print('best metric: ', analysis.best_result['validation_1-logloss'])\n", "print('best config: ', analysis.best_result['config'])" ] }, { "cell_type": "markdown", "id": "3b5490c1", "metadata": {}, "source": [ "## HyperParameter Tuning with HyperBand" ] }, { "cell_type": "markdown", "id": "41f9a9b4", "metadata": {}, "source": [ "Apart from grid or random search, ray tune offers multiple hyperparameter tuning strategies, here we will be looking at one of them called **Hyperband**.\n", "\n", "**Hyperband** can be seen as successive halving algorithm on steriods that focuses on speeding up configuration evaluation, where configuration refers to one specific set of hypereparameters. To elaborate, successive halving works by allocating a certain amount of budget to a set of hyper parameter configurations, i.e. it runs the configuration for a few iterations to get a sense of their performance, after that it will start allocating more resources to more promising configurations, while tossing away other non-performing configurations. This process repeats until one configuration remains. One of the potential drawback with successive halving is that given some finite budget $B$ (e.g. training time), and number of configurations $n$, it is not clear a priori whether we should either consider many configurations (large $n$), each with a small average training time, or the opposite, i.e. consider a small number of configurations (large $B / n$), each having a larger average training time. In other words, as practitioners, how do we decide whether we want more \"depth\" or more \"breadth\". Let's now take a look at how Hyperband aims to address this issue:\n", "\n", "\n", "\n", "Looking at the psuedocode above, Hyperband takes in two inputs:\n", "\n", "- R: The maximum resources that can be allocated to a single configuration, e.g. number of iterations to run the algorithm.\n", "- $\\eta$: Controls the proportion of configurations to be discarded for each round of successive halving.\n", "\n", "Then it essentially performs a grid search over different possible values of $n$, associated with $n$ is a minimum resource $r$ that is allocated to each configuration. 
Lines 1–2, the outer loop, iterate over different values of $n$ and $r$, whereas lines 3–9, the inner loop, run successive halving for the fixed $n$ and $r$.\n", "\n", "The following code chunk provides a vanilla implementation and returns the resource allocation table." ] }, { "cell_type": "code", "execution_count": 10, "id": "4bb8304b", "metadata": {}, "outputs": [], "source": [ "from random import random\n", "from math import log, ceil\n", "import heapq\n", "import pandas as pd\n", "\n", "\n", "def hyperband(R, eta):\n", "    s_max = int(log(R) / log(eta))\n", "    B = (s_max + 1) * R\n", "\n", "    rows = []\n", "    for s in reversed(range(s_max + 1)):\n", "        # initial number of configurations\n", "        n = int(ceil(B / R / (s + 1) * eta ** s))\n", "        # initial number of iterations per config\n", "        r = R * eta ** (-s)\n", "\n", "        # get hyperparameter configurations,\n", "        # we use a random value to represent a sampled configuration\n", "        # from a defined hyperparameter search space\n", "        T = [random() for _ in range(n)]\n", "        for i in range(s + 1):\n", "            n_configs = n * eta ** (-i)\n", "            n_iterations = r * eta ** (i)\n", "\n", "            # run then return validation loss, here we use a random value\n", "            # to represent the algorithm's performance after taking in\n", "            # n_iterations and config as inputs\n", "            losses = [(random(), t) for t in T]\n", "\n", "            # return top k configurations, if we are minimizing the loss\n", "            # then we pick the top k smallest\n", "            top_k_losses = heapq.nsmallest(int(n_configs / eta), losses)\n", "            T = [t for loss, t in top_k_losses]\n", "\n", "            row = [s, n_configs, n_iterations]\n", "            rows.append(row)\n", "\n", "    return pd.DataFrame(rows, columns=['s', 'n_configs', 'n_iterations'])" ] }, { "cell_type": "code", "execution_count": 11, "id": "6f2cda78", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
sn_configsn_iterations
0481.0000001.0
1427.0000003.0
249.0000009.0
343.00000027.0
441.00000081.0
5334.0000003.0
6311.3333339.0
733.77777827.0
831.25925981.0
9215.0000009.0
1025.00000027.0
1121.66666781.0
1218.00000027.0
1312.66666781.0
1405.00000081.0
\n", "
" ], "text/plain": [ " s n_configs n_iterations\n", "0 4 81.000000 1.0\n", "1 4 27.000000 3.0\n", "2 4 9.000000 9.0\n", "3 4 3.000000 27.0\n", "4 4 1.000000 81.0\n", "5 3 34.000000 3.0\n", "6 3 11.333333 9.0\n", "7 3 3.777778 27.0\n", "8 3 1.259259 81.0\n", "9 2 15.000000 9.0\n", "10 2 5.000000 27.0\n", "11 2 1.666667 81.0\n", "12 1 8.000000 27.0\n", "13 1 2.666667 81.0\n", "14 0 5.000000 81.0" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "R = 81\n", "eta = 3\n", "hyperband(R, eta)" ] }, { "cell_type": "markdown", "id": "6a04378c", "metadata": {}, "source": [ "Notice in the last row, $s = 0$, where every configuration is allocated $R$ resources, this setting is essentially performing our good old random search. On the other extreme end of things, the first row $s = 4$, we are essentially running 81 configurations each for only 1 iteration, then proceeding on to dropping 2/3 of the bottom performing configurations. By performing a mix of more exploration and more exploitation search strategies, it automatically accomodates for scenarios where an iterative training algorithm converges very slowly and require more resources to show differentiating performance (in these scenarios, we should consider smaller $n$), as well as the opposite end of the story, where we perform aggresive early stopping to provide massive speedups and scan many different combinations.\n", "\n", "To leverage hyperband tuning algorithm, we'll use Ray Tune's scheduler `ASHAScheduler` (recommended over the standard hyperband scheduler). We will also need to report our model's loss for every iteration back to tune. Ray Tune already comes with a callback class `TuneReportCallback` that does this without us having to implement it ourselves." ] }, { "cell_type": "code", "execution_count": 12, "id": "da72d054", "metadata": {}, "outputs": [], "source": [ "config = {\n", " #\"n_estimators\": tune.randint(30, 100),\n", " \"max_depth\": tune.randint(2, 6),\n", " \"colsample_bytree\": tune.uniform(0.8, 1.0),\n", " \"subsample\": tune.uniform(0.8, 1.0),\n", " \"learning_rate\": tune.loguniform(1e-4, 1e-1),\n", " \"callbacks\": [TuneReportCallback()]\n", "}\n", "\n", "\n", "def ray_train(config, X_train, y_train, X_test, y_test):\n", " model = xgb.XGBClassifier(**config)\n", " eval_set = [(X_train, y_train), (X_test, y_test)]\n", " model.fit(X_train, y_train, eval_set=eval_set, verbose=False)" ] }, { "cell_type": "code", "execution_count": 13, "id": "23a01809", "metadata": {}, "outputs": [], "source": [ "def ray_hyperparameter_tuning(config, time_budget_s: int):\n", " scheduler = tune.schedulers.ASHAScheduler(\n", " max_t=100,\n", " grace_period=10,\n", " reduction_factor=2\n", " )\n", " analysis = tune.run(\n", " tune.with_parameters(ray_train, X_train=X_train, y_train=y_train, X_test=X_test, y_test=y_test),\n", " config=config,\n", " metric='validation_1-logloss',\n", " mode='min',\n", " num_samples=-1,\n", " scheduler=scheduler,\n", " resources_per_trial={'cpu': 8},\n", " time_budget_s=time_budget_s,\n", " verbose=1\n", " )\n", " return analysis" ] }, { "cell_type": "code", "execution_count": 14, "id": "e48a99c3", "metadata": {}, "outputs": [ { "data": { "text/html": [ "== Status ==
Current time: 2022-07-09 23:20:37 (running for 00:02:02.53)
Memory usage on this node: 205.1/1007.3 GiB
Using AsyncHyperBand: num_stopped=1045\n", "Bracket: Iter 80.000: -0.13818851038748464 | Iter 40.000: -0.22655531805712026 | Iter 20.000: -0.4710694114853452 | Iter 10.000: -0.66420287727476
Resources requested: 0/80 CPUs, 0/0 GPUs, 0.0/810.33 GiB heap, 0.0/186.26 GiB objects
Current best trial: 745b9_00477 with validation_1-logloss=0.11607122477355668 and parameters={'max_depth': 2, 'colsample_bytree': 0.9223334898321227, 'subsample': 0.8638757189475169, 'learning_rate': 0.08456338948443516, 'callbacks': []}
Result logdir: /home/mingyuliu/ray_results/ray_train_2022-07-09_23-18-34
Number of trials: 1143/infinite (1143 TERMINATED)

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stderr", "output_type": "stream", "text": [ "2022-07-09 23:20:37,527\tINFO tune.py:748 -- Total run time: 122.87 seconds (120.41 seconds for the tuning loop).\n" ] } ], "source": [ "analysis = ray_hyperparameter_tuning(config, time_budget_s=120)" ] }, { "cell_type": "code", "execution_count": 15, "id": "9fba1ad0", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ran 1045 hyperparameter tuning experiments\n", "best metric: 0.11607122477355668\n", "best config: {'max_depth': 2, 'colsample_bytree': 0.9223334898321227, 'subsample': 0.8638757189475169, 'learning_rate': 0.08456338948443516, 'callbacks': []}\n" ] } ], "source": [ "num_done_experiments = analysis.results_df[analysis.results_df['done'] == True].shape[0]\n", "\n", "print(f'ran {num_done_experiments} hyperparameter tuning experiments')\n", "print('best metric: ', analysis.best_result['validation_1-logloss'])\n", "print('best config: ', analysis.best_result['config'])" ] }, { "cell_type": "markdown", "id": "05b1826e", "metadata": {}, "source": [ "We can retrieve the best config, and re-train our model to check if we get similar performance numbers." ] }, { "cell_type": "code", "execution_count": 16, "id": "a0ea7514", "metadata": {}, "outputs": [], "source": [ "best_config = analysis.best_result['config']\n", "del best_config['callbacks']" ] }, { "cell_type": "code", "execution_count": 17, "id": "b25fcf52", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0]\tvalidation_0-logloss:0.62644\tvalidation_1-logloss:0.63045\n", "[10]\tvalidation_0-logloss:0.28323\tvalidation_1-logloss:0.31413\n", "[20]\tvalidation_0-logloss:0.16305\tvalidation_1-logloss:0.21245\n", "[30]\tvalidation_0-logloss:0.10716\tvalidation_1-logloss:0.17367\n", "[40]\tvalidation_0-logloss:0.07839\tvalidation_1-logloss:0.14843\n", "[50]\tvalidation_0-logloss:0.06050\tvalidation_1-logloss:0.13471\n", "[60]\tvalidation_0-logloss:0.04858\tvalidation_1-logloss:0.13113\n", "[70]\tvalidation_0-logloss:0.03952\tvalidation_1-logloss:0.12595\n", "[80]\tvalidation_0-logloss:0.03287\tvalidation_1-logloss:0.12086\n", "[90]\tvalidation_0-logloss:0.02801\tvalidation_1-logloss:0.11741\n", "[99]\tvalidation_0-logloss:0.02380\tvalidation_1-logloss:0.11607\n" ] }, { "data": { "text/plain": [ "XGBClassifier(base_score=0.5, booster='gbtree', callbacks=None,\n", " colsample_bylevel=1, colsample_bynode=1,\n", " colsample_bytree=0.9223334898321227, early_stopping_rounds=None,\n", " enable_categorical=False, eval_metric=None, gamma=0, gpu_id=-1,\n", " grow_policy='depthwise', importance_type=None,\n", " interaction_constraints='', learning_rate=0.08456338948443516,\n", " max_bin=256, max_cat_to_onehot=4, max_delta_step=0, max_depth=2,\n", " max_leaves=0, min_child_weight=1, missing=nan,\n", " monotone_constraints='()', n_estimators=100, n_jobs=0,\n", " num_parallel_tree=1, predictor='auto', random_state=0,\n", " reg_alpha=0, reg_lambda=1, ...)" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model_xgb = xgb.XGBClassifier(**best_config)\n", "\n", "eval_set = [(X_train, y_train), (X_test, y_test)]\n", "model_xgb.fit(X_train, y_train, eval_set=eval_set, verbose=10)" ] }, { "cell_type": "markdown", "id": "f64dbfbe", "metadata": {}, "source": [ "Ray Tune provides different hyperparameter tuning algorithms other than the classic grid or random search, here we only looked at one of them, 
Hyperband.\n", "\n", "Caveat: If the learning rate is a hyperparameter, smaller values will likely result in inferior performance at the beginning, but may outperform other configurations given a sufficient amount of time. Hence, hyperband-like hyperparameter tuning methods might not be able to find the small-learning-rate, many-iterations combinations that can squeeze out that last bit of performance." ] }, { "cell_type": "markdown", "id": "e271d46a", "metadata": {}, "source": [ "# Reference" ] }, { "cell_type": "markdown", "id": "6f1c6338", "metadata": {}, "source": [ "- [Ray Documentation: Tuning XGBoost parameters](https://docs.ray.io/en/latest/tune/examples/tune-xgboost.html)\n", "- [Blog: Tuning hyperparams fast with Hyperband](http://fastml.com/tuning-hyperparams-fast-with-hyperband/)\n", "- [Blog: A (Slightly) Better Budget Allocation for Hyperband](https://blog.dataiku.com/a-slightly-better-budget-allocation-for-hyperband)\n", "- [Blog: Hyper-parameter optimization algorithms: a short review](https://medium.com/criteo-engineering/hyper-parameter-optimization-algorithms-2fe447525903#124a)\n", "- [Blog: HyperBand and BOHB: Understanding State of the Art Hyperparameter Optimization Algorithms](https://neptune.ai/blog/hyperband-and-bohb-understanding-state-of-the-art-hyperparameter-optimization-algorithms)\n", "- [Paper: L. Li, K. Jamieson, G. DeSalvo, A. Rostamizadeh, A. Talwalkar - Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization (2016)](https://arxiv.org/abs/1603.06560)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.11" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": true, "toc_position": { "height": "calc(100% - 180px)", "left": "10px", "top": "150px", "width": "237.8px" }, "toc_section_display": true, "toc_window_display": true } }, "nbformat": 4, "nbformat_minor": 5 }