{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Testing different Hyperparameters and Benchmarking"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this notebook, we will cover how to test different hyperparameters for a particular dataset and how to benchmark different parameters across a group of datasets using AzureML. We assume familiarity with the basic concepts and parameters, which are discussed in the [01_training_introduction.ipynb](01_training_introduction.ipynb), [02_mask_rcnn.ipynb](02_mask_rcnn.ipynb) and [03_training_accuracy_vs_speed.ipynb](03_training_accuracy_vs_speed.ipynb) notebooks. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We will be using a Faster R-CNN model with ResNet-50 backbone to find all objects in an image belonging to 4 categories: 'can', 'carton', 'milk_bottle', 'water_bottle'. We will then conduct hyper-parameter tuning to find the best set of parameters for this model. For this, we present an overall process of utilizing AzureML, specifically [Hyperdrive](https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.hyperdrive?view=azure-ml-py) which can train and evaluate many different parameter combinations in parallel. We demonstrate the following key steps: \n",
"* Configure AzureML Workspace\n",
"* Create Remote Compute Target (GPU cluster)\n",
"* Prepare Data\n",
"* Prepare Training Script\n",
"* Setup and Run Hyperdrive Experiment\n",
"* Model Import, Re-train and Test\n",
"\n",
"This notebook is very similar to the [24_exploring_hyperparameters_on_azureml.ipynb](../../classification/notebooks/24_exploring_hyperparameters_on_azureml.ipynb) hyperdrive notebook used for image classification. For key concepts of AzureML see this [tutorial](https://docs.microsoft.com/en-us/azure/machine-learning/service/tutorial-train-models-with-aml?view=azure-ml-py&toc=https%3A%2F%2Fdocs.microsoft.com%2Fen-us%2Fpython%2Fapi%2Fazureml_py_toc%2Ftoc.json%3Fview%3Dazure-ml-py&bc=https%3A%2F%2Fdocs.microsoft.com%2Fen-us%2Fpython%2Fazureml_py_breadcrumb%2Ftoc.json%3Fview%3Dazure-ml-py) on model training and evaluation."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import os\n",
"import sys\n",
"from distutils.dir_util import copy_tree\n",
"import numpy as np\n",
"import scrapbook as sb\n",
"import uuid\n",
"\n",
"import azureml.core\n",
"from azureml.core import Workspace, Experiment\n",
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"import azureml.data\n",
"from azureml.train.estimator import Estimator\n",
"from azureml.train.hyperdrive import (\n",
" RandomParameterSampling, GridParameterSampling, BanditPolicy, HyperDriveConfig, PrimaryMetricGoal, choice, uniform\n",
")\n",
"import azureml.widgets as widgets\n",
"\n",
"sys.path.append(\"../../\")\n",
"from utils_cv.common.azureml import get_or_create_workspace\n",
"from utils_cv.common.data import unzip_url\n",
"from utils_cv.detection.data import Urls"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Ensure edits to libraries are loaded and plotting is shown in the notebook."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"%reload_ext autoreload\n",
"%autoreload 2\n",
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We now define some parameters which will be used in this notebook:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": true,
"tags": [
"parameters"
]
},
"outputs": [],
"source": [
"# Azure resources\n",
"subscription_id = \"YOUR_SUBSCRIPTION_ID\"\n",
"resource_group = \"YOUR_RESOURCE_GROUP_NAME\" \n",
"workspace_name = \"YOUR_WORKSPACE_NAME\" \n",
"workspace_region = \"YOUR_WORKSPACE_REGION\" #Possible values eastus, eastus2, etc.\n",
"\n",
"# Choose a size for our cluster and the maximum number of nodes\n",
"VM_SIZE = \"STANDARD_NC6\" #STANDARD_NC6S_V3\"\n",
"MAX_NODES = 8\n",
"\n",
"# Hyperparameter grid search space\n",
"IM_MAX_SIZES = [600] #Default is 1333 pixels, defining small values here to speed up training\n",
"LEARNING_RATES = [1e-4, 3e-4, 1e-3, 3e-3, 1e-2]\n",
"\n",
"# Image data\n",
"DATA_PATH = unzip_url(Urls.fridge_objects_path, exist_ok=True)\n",
"\n",
"# Path to utils_cv library\n",
"UTILS_DIR = os.path.join('..', '..', 'utils_cv')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1. Config AzureML workspace\n",
"Below we setup (or load an existing) AzureML workspace, and get all its details as follows. Note that the resource group and workspace will get created if they do not yet exist. For more information regaring the AzureML workspace see also the [20_azure_workspace_setup.ipynb](../../classification/notebooks/20_azure_workspace_setup.ipynb) notebook in the image classification folder.\n",
"\n",
"To simplify clean-up (see end of this notebook), we recommend creating a new resource group to run this notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"ws = get_or_create_workspace(\n",
" subscription_id, resource_group, workspace_name, workspace_region\n",
")\n",
"\n",
"# Print the workspace attributes\n",
"print(\n",
" \"Workspace name: \" + ws.name,\n",
" \"Workspace region: \" + ws.location,\n",
" \"Subscription id: \" + ws.subscription_id,\n",
" \"Resource group: \" + ws.resource_group,\n",
" sep=\"\\n\",\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2. Create Remote Target\n",
"We create a GPU cluster as our remote compute target. If a cluster with the same name already exists in our workspace, the script will load it instead. This [link](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-set-up-training-targets#compute-targets-for-training) provides more information about how to set up a compute target on different locations.\n",
"\n",
"By default, the VM size is set to use STANDARD\\_NC6 machines. However, if quota is available, our recommendation is to use STANDARD\\_NC6S\\_V3 machines which come with the much faster V100 GPU. We set the minimum number of nodes to zero so that the cluster won't incur additional compute charges when not in use."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Creating a new compute target...\n",
"Creating\n",
"Succeeded\n",
"AmlCompute wait for completion finished\n",
"Minimum number of nodes requested have been provisioned\n",
"{'currentNodeCount': 0, 'targetNodeCount': 0, 'nodeStateCounts': {'preparingNodeCount': 0, 'runningNodeCount': 0, 'idleNodeCount': 0, 'unusableNodeCount': 0, 'leavingNodeCount': 0, 'preemptedNodeCount': 0}, 'allocationState': 'Steady', 'allocationStateTransitionTime': '2019-09-30T18:20:25.067000+00:00', 'errors': None, 'creationTime': '2019-09-30T18:18:06.217384+00:00', 'modifiedTime': '2019-09-30T18:20:38.458332+00:00', 'provisioningState': 'Succeeded', 'provisioningStateTransitionTime': None, 'scaleSettings': {'minNodeCount': 0, 'maxNodeCount': 8, 'nodeIdleTimeBeforeScaleDown': 'PT120S'}, 'vmPriority': 'Dedicated', 'vmSize': 'STANDARD_NC6'}\n"
]
}
],
"source": [
"CLUSTER_NAME = \"gpu-cluster\"\n",
"\n",
"try:\n",
" # Retrieve if a compute target with the same cluster name already exists\n",
" compute_target = ComputeTarget(workspace=ws, name=CLUSTER_NAME)\n",
" print(\"Found existing compute target.\")\n",
"\n",
"except ComputeTargetException:\n",
" # If it doesn't already exist, we create a new one with the name provided\n",
" print(\"Creating a new compute target...\")\n",
" compute_config = AmlCompute.provisioning_configuration(\n",
" vm_size=VM_SIZE, min_nodes=0, max_nodes=MAX_NODES\n",
" )\n",
"\n",
" # create the cluster\n",
" compute_target = ComputeTarget.create(ws, CLUSTER_NAME, compute_config)\n",
" compute_target.wait_for_completion(show_output=True)\n",
"\n",
"# we can use get_status() to get a detailed status for the current cluster.\n",
"print(compute_target.get_status().serialize())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The compute cluster and its status can be seen in the portal. For example in the screenshot below, its automatically resizing (eventually to 0 nodes) to adjust to the number of open runs:\n",
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3. Prepare data\n",
"In this notebook, we'll use the Fridge Objects dataset, which is already stored in the correct format. We then upload our data to the AzureML workspace.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true,
"scrolled": true
},
"outputs": [],
"source": [
"# Retrieving default datastore that got automatically created when we setup a workspace\n",
"ds = ws.get_default_datastore()\n",
"\n",
"# We now upload the data to a unique sub-folder to avoid accidentially training/evaluating also including older images.\n",
"data_subfolder = str(uuid.uuid4())\n",
"ds.upload(\n",
" src_dir=DATA_PATH, target_path=data_subfolder, overwrite=False, show_progress=True\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"Here's where you can see the data in your portal: \n",
"\n",
"\n",
"### 4. Prepare training script\n",
"\n",
"Next step is to prepare scripts that AzureML Hyperdrive will use to train and evaluate models with selected hyperparameters."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Create a folder for the training script and copy the utils_cv library into that folder\n",
"script_folder = os.path.join(os.getcwd(), \"hyperdrive\")\n",
"os.makedirs(script_folder, exist_ok=True)\n",
"_ = copy_tree(UTILS_DIR, os.path.join(script_folder, 'utils_cv'))"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting C:\\Users\\pabuehle\\Desktop\\ComputerVision\\scenarios\\detection\\hyperdrive/train.py\n"
]
}
],
"source": [
"%%writefile $script_folder/train.py\n",
"\n",
"# Use different matplotlib backend to avoid error during remote execution\n",
"import matplotlib \n",
"matplotlib.use(\"Agg\") \n",
"import matplotlib.pyplot as plt\n",
"\n",
"import os\n",
"import sys\n",
"import argparse\n",
"import numpy as np\n",
"from pathlib import Path\n",
"from azureml.core import Run\n",
"from utils_cv.detection.dataset import DetectionDataset\n",
"from utils_cv.detection.model import DetectionLearner, get_pretrained_fasterrcnn\n",
"from utils_cv.common.gpu import which_processor\n",
"which_processor()\n",
"\n",
"# Parse arguments passed by Hyperdrive\n",
"parser = argparse.ArgumentParser()\n",
"parser.add_argument('--data-folder', type=str, dest='data_dir')\n",
"parser.add_argument('--data-subfolder', type=str, dest='data_subfolder')\n",
"parser.add_argument('--epochs', type=int, dest='epochs', default=20) \n",
"parser.add_argument('--batch_size', type=int, dest='batch_size', default=2)\n",
"parser.add_argument('--learning_rate', type=float, dest='learning_rate', default=1e-4)\n",
"parser.add_argument('--min_size', type=int, dest='min_size', default=800)\n",
"parser.add_argument('--max_size', type=int, dest='max_size', default=1333)\n",
"parser.add_argument('--rpn_pre_nms_top_n_train', type=int, dest='rpn_pre_nms_top_n_train', default=2000)\n",
"parser.add_argument('--rpn_pre_nms_top_n_test', type=int, dest='rpn_pre_nms_top_n_test', default=1000)\n",
"parser.add_argument('--rpn_post_nms_top_n_train', type=int, dest='rpn_post_nms_top_n_train', default=2000)\n",
"parser.add_argument('--rpn_post_nms_top_n_test', type=int, dest='rpn_post_nms_top_n_test', default=1000)\n",
"parser.add_argument('--rpn_nms_thresh', type=float, dest='rpn_nms_thresh', default=0.7)\n",
"parser.add_argument('--box_score_thresh', type=float, dest='box_score_thresh', default=0.05)\n",
"parser.add_argument('--box_nms_thresh', type=float, dest='box_nms_thresh', default=0.5)\n",
"parser.add_argument('--box_detections_per_img', type=int, dest='box_detections_per_img', default=100)\n",
"args = parser.parse_args()\n",
"params = vars(args)\n",
"print(f\"params = {params}\")\n",
"\n",
"# Get training and validation data\n",
"data_path = os.path.join(params['data_dir'], params[\"data_subfolder\"])\n",
"print(f\"data_path={data_path}\")\n",
"data = DetectionDataset(data_path, train_pct=0.5, batch_size = params[\"batch_size\"])\n",
"print(\n",
" f\"Training dataset: {len(data.train_ds)} | Training DataLoader: {data.train_dl} \\n \\\n",
" Testing dataset: {len(data.test_ds)} | Testing DataLoader: {data.test_dl}\"\n",
")\n",
"\n",
"# Get model\n",
"model = get_pretrained_fasterrcnn(\n",
" num_classes = len(data.labels)+1,\n",
" min_size = params[\"min_size\"],\n",
" max_size = params[\"max_size\"],\n",
" rpn_pre_nms_top_n_train = params[\"rpn_pre_nms_top_n_train\"],\n",
" rpn_pre_nms_top_n_test = params[\"rpn_pre_nms_top_n_test\"],\n",
" rpn_post_nms_top_n_train = params[\"rpn_post_nms_top_n_train\"], \n",
" rpn_post_nms_top_n_test = params[\"rpn_post_nms_top_n_test\"],\n",
" rpn_nms_thresh = params[\"rpn_nms_thresh\"],\n",
" box_score_thresh = params[\"box_score_thresh\"], \n",
" box_nms_thresh = params[\"box_nms_thresh\"],\n",
" box_detections_per_img = params[\"box_detections_per_img\"]\n",
")\n",
"detector = DetectionLearner(data, model)\n",
"\n",
"# Run Training\n",
"detector.fit(params[\"epochs\"], lr=params[\"learning_rate\"], print_freq=30)\n",
"print(f\"Average precision after each epoch: {detector.ap}\")\n",
"\n",
"# Get accuracy on test set at IOU=0.5:0.95\n",
"acc = float(detector.ap[-1][\"bbox\"])\n",
"\n",
"# Add log entries\n",
"run = Run.get_context()\n",
"run.log(\"accuracy\", float(acc)) # Logging our primary metric 'accuracy'\n",
"run.log(\"data_dir\", params[\"data_dir\"])\n",
"run.log(\"epochs\", params[\"epochs\"])\n",
"run.log(\"batch_size\", params[\"batch_size\"])\n",
"run.log(\"learning_rate\", params[\"learning_rate\"])\n",
"run.log(\"min_size\", params[\"min_size\"])\n",
"run.log(\"max_size\", params[\"max_size\"])\n",
"run.log(\"rpn_pre_nms_top_n_train\", params[\"rpn_pre_nms_top_n_train\"])\n",
"run.log(\"rpn_pre_nms_top_n_test\", params[\"rpn_pre_nms_top_n_test\"])\n",
"run.log(\"rpn_post_nms_top_n_train\", params[\"rpn_post_nms_top_n_train\"])\n",
"run.log(\"rpn_post_nms_top_n_test\", params[\"rpn_post_nms_top_n_test\"])\n",
"run.log(\"rpn_nms_thresh\", params[\"rpn_nms_thresh\"])\n",
"run.log(\"box_score_thresh\", params[\"box_score_thresh\"])\n",
"run.log(\"box_nms_thresh\", params[\"box_nms_thresh\"])\n",
"run.log(\"box_detections_per_img\", params[\"box_detections_per_img\"])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 5. Setup and run Hyperdrive experiment\n",
"\n",
"#### 5.1 Create Experiment \n",
"Experiment is the main entry point into experimenting with AzureML. To create new Experiment or get the existing one, we pass our experimentation name 'hyperparameter-tuning'.\n"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"exp = Experiment(workspace=ws, name=\"hyperparameter-tuning\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 5.2. Define search space\n",
"\n",
"Now we define the search space of hyperparameters. To test discrete parameter values use 'choice()', and for uniform sampling use 'uniform()'. For more options, see [Hyperdrive parameter expressions](https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.hyperdrive.parameter_expressions?view=azure-ml-py).\n",
"\n",
"Hyperdrive provides three different parameter sampling methods: 'RandomParameterSampling', 'GridParameterSampling', and 'BayesianParameterSampling'. Details about each method can be found [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-tune-hyperparameters). Here, we use the 'GridParameterSampling'."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Grid-search\n",
"param_sampling = GridParameterSampling(\n",
" {\"--learning_rate\": choice(LEARNING_RATES), \"--max_size\": choice(IM_MAX_SIZES)}\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"AzureML Estimator is the building block for training. An Estimator encapsulates the training code and parameters, the compute resources and runtime environment for a particular training scenario.\n",
"We create one for our experimentation with the dependencies our model requires as follows:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"script_params = {\"--data-folder\": ds.as_mount(), \"--data-subfolder\": data_subfolder}\n",
"\n",
"est = Estimator(\n",
" source_directory=script_folder,\n",
" script_params=script_params,\n",
" compute_target=compute_target,\n",
" entry_script=\"train.py\",\n",
" use_gpu=True,\n",
" pip_packages=[\"nvidia-ml-py3\", \"fastai\"],\n",
" conda_packages=[\n",
" \"scikit-learn\",\n",
" \"pycocotools>=2.0\",\n",
" \"torchvision==0.3\",\n",
" \"cudatoolkit==9.0\",\n",
" ],\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We now create a HyperDriveConfig object which includes information about parameter space sampling, termination policy, primary metric, estimator and the compute target to execute the experiment runs on."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"hyperdrive_run_config = HyperDriveConfig(\n",
" estimator=est,\n",
" hyperparameter_sampling=param_sampling,\n",
" policy=None, # Do not use any early termination\n",
" primary_metric_name=\"accuracy\",\n",
" primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,\n",
" max_total_runs=None, # Set to none to run all possible grid parameter combinations,\n",
" max_concurrent_runs=MAX_NODES,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 5.3 Run Experiment\n",
"\n",
"We now run the parameter sweep and visualize the experiment progress using the `RunDetails` widget:\n",
"\n",
"\n",
"Once completed, the accuracy for the different runs can be analyzed via the widget, for example below is a plot of the accuracy versus learning rate below (for two different image sizes)\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Url to hyperdrive run on the Azure portal: https://mlworkspace.azure.ai/portal/subscriptions/989b90f7-da4f-41f9-84c9-44848802052d/resourceGroups/pabuehle_delme2_hyperdrive/providers/Microsoft.MachineLearningServices/workspaces/pabuehle_ws/experiments/hyperparameter-tuning/runs/hyperparameter-tuning_1569867670036119\n"
]
}
],
"source": [
"hyperdrive_run = exp.submit(config=hyperdrive_run_config)\n",
"print(f\"Url to hyperdrive run on the Azure portal: {hyperdrive_run.get_portal_url()}\")"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "0f08d8354768463788969180b5d031ab",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"_HyperDriveWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO'…"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"widgets.RunDetails(hyperdrive_run).show()"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'runId': 'hyperparameter-tuning_1569867670036119',\n",
" 'target': 'gpu-cluster',\n",
" 'status': 'Completed',\n",
" 'startTimeUtc': '2019-09-30T18:21:10.209419Z',\n",
" 'endTimeUtc': '2019-09-30T18:55:14.128089Z',\n",
" 'properties': {'primary_metric_config': '{\"name\": \"accuracy\", \"goal\": \"maximize\"}',\n",
" 'runTemplate': 'HyperDrive',\n",
" 'azureml.runsource': 'hyperdrive',\n",
" 'platform': 'AML',\n",
" 'baggage': 'eyJvaWQiOiAiNWFlYTJmMzAtZjQxZC00ZDA0LWJiOGUtOWU0NGUyZWQzZGQ2IiwgInRpZCI6ICI3MmY5ODhiZi04NmYxLTQxYWYtOTFhYi0yZDdjZDAxMWRiNDciLCAidW5hbWUiOiAiMDRiMDc3OTUtOGRkYi00NjFhLWJiZWUtMDJmOWUxYmY3YjQ2In0',\n",
" 'ContentSnapshotId': '0218d18a-3557-4fdf-8c29-8d43297621ed'},\n",
" 'logFiles': {'azureml-logs/hyperdrive.txt': 'https://pabuehlestorage579709b90.blob.core.windows.net/azureml/ExperimentRun/dcid.hyperparameter-tuning_1569867670036119/azureml-logs/hyperdrive.txt?sv=2018-11-09&sr=b&sig=PCMArksPFcTc1rk1DMhFP6wvoZbhrpmnZbDCV8uInWw%3D&st=2019-09-30T18%3A45%3A14Z&se=2019-10-01T02%3A55%3A14Z&sp=r'}}"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"hyperdrive_run.wait_for_completion()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To load an existing Hyperdrive Run instead of start new one, we can use \n",
"```python\n",
"hyperdrive_run = azureml.train.hyperdrive.HyperDriveRun(exp, , hyperdrive_run_config=hyperdrive_run_config)\n",
"```\n",
"We also can cancel the Run with \n",
"```python \n",
"hyperdrive_run.cancel().\n",
"```\n",
"\n",
"Once all the child-runs are finished, we can get the best run and the metrics."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"* Best Run Id:hyperparameter-tuning_1569867670036119_4\n",
"Run(Experiment: hyperparameter-tuning,\n",
"Id: hyperparameter-tuning_1569867670036119_4,\n",
"Type: azureml.scriptrun,\n",
"Status: Completed)\n",
"\n",
"* Best hyperparameters:\n",
"{'--data-folder': '$AZUREML_DATAREFERENCE_workspaceblobstore', '--data-subfolder': '01679d79-1c47-49b8-88c3-d657f36b0c0f', '--learning_rate': '0.01', '--max_size': '600'}\n",
"Accuracy = 0.8918015856432082\n",
"Learning Rate = 0.01\n"
]
}
],
"source": [
"# Get best run and print out metrics\n",
"best_run = hyperdrive_run.get_best_run_by_primary_metric()\n",
"best_run_metrics = best_run.get_metrics()\n",
"parameter_values = best_run.get_details()[\"runDefinition\"][\"arguments\"]\n",
"best_parameters = dict(zip(parameter_values[::2], parameter_values[1::2]))\n",
"\n",
"print(f\"* Best Run Id:{best_run.id}\")\n",
"print(best_run)\n",
"print(\"\\n* Best hyperparameters:\")\n",
"print(best_parameters)\n",
"print(f\"Accuracy = {best_run_metrics['accuracy']}\")\n",
"print(\"Learning Rate =\", best_run_metrics[\"learning_rate\"])"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[{'run_id': 'hyperparameter-tuning_1569867670036119_4',\n",
" 'hyperparameters': '{\"--learning_rate\": 0.01, \"--max_size\": 600}',\n",
" 'best_primary_metric': 0.8918015856432082,\n",
" 'status': 'Completed'},\n",
" {'run_id': 'hyperparameter-tuning_1569867670036119_3',\n",
" 'hyperparameters': '{\"--learning_rate\": 0.003, \"--max_size\": 600}',\n",
" 'best_primary_metric': 0.8760658534573615,\n",
" 'status': 'Completed'},\n",
" {'run_id': 'hyperparameter-tuning_1569867670036119_2',\n",
" 'hyperparameters': '{\"--learning_rate\": 0.001, \"--max_size\": 600}',\n",
" 'best_primary_metric': 0.8282478586888209,\n",
" 'status': 'Completed'},\n",
" {'run_id': 'hyperparameter-tuning_1569867670036119_1',\n",
" 'hyperparameters': '{\"--learning_rate\": 0.0003, \"--max_size\": 600}',\n",
" 'best_primary_metric': 0.7405032357605712,\n",
" 'status': 'Completed'},\n",
" {'run_id': 'hyperparameter-tuning_1569867670036119_0',\n",
" 'hyperparameters': '{\"--learning_rate\": 0.0001, \"--max_size\": 600}',\n",
" 'best_primary_metric': 0.47537724312149304,\n",
" 'status': 'Completed'},\n",
" {'run_id': 'hyperparameter-tuning_1569867670036119_preparation',\n",
" 'hyperparameters': None,\n",
" 'best_primary_metric': None,\n",
" 'status': 'Completed'}]"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"hyperdrive_run.get_children_sorted_by_primary_metric()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 7. Clean up\n",
"\n",
"To avoid unnecessary expenses, all resources which were created in this notebook need to get deleted once parameter search is concluded. To simplify this clean-up step, we recommended creating a new resource group to run this notebook. This resource group can then be deleted, e.g. using the Azure Portal, which will remove all created resources."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Log some outputs using scrapbook which are used during testing to verify correct notebook execution\n",
"sb.glue(\"best_accuracy\", best_run_metrics[\"accuracy\"])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Concluding Remark\n",
"\n",
"In this notebook, we showed how to tune hyperparameters by utilizing Azure Machine Learning service. Complex and powerful models often have many hyperparameters that affect on the model's accuracy, and it is not practical to tune the model without using a GPU cluster. \n",
"\n",
"For example, a training and evaluation loop of a model on a single Standard NC6 VM could take a long time depending on the number of images, epochs, and image size. If a single run took 10 minute, a thorough investigation of 100 different combinations of hyperparameters would take over 16 hours on the single VM. \n",
"\n",
"With AzureML, as we shown in this notebook, we can easily setup different size of GPU cluster fits to our problem and utilize different sampling techniques to navigate through the huge search space efficiently, and tweak the experiment with different criteria and algorithms for further research."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python (cv)",
"language": "python",
"name": "cv"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
}
},
"nbformat": 4,
"nbformat_minor": 2
}