{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved. \n",
"\n",
"Licensed under the MIT License. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Building Models for Accuracy vs. Speed\n",
"\n",
"The goal of this notebook is to understand how to train a model with different parameters to achieve either a highly accurate but slow during inference model, or a model with fast inference but lower accuracy.\n",
"\n",
"For example, in IoT settings the inferencing device has limited computational capabilities. This means we need to design our models to have a small memory footprint. In contrast, medical scenarios often require the highest possible accuracy because the cost of mis-classification could impact the well-being of a patient. In this scenario, the accuracy of the model can not be compromised. \n",
"\n",
"We have conducted various experiments on diverse datasets to find parameters which work well in a wide variety of settings balancing high accuracy or fast inference. In this notebook, we provide these parameters so that your initial models can be trained without any parameter tuning. For most datasets, these parameters are close to optimal. In the second part of the notebook, we provide guidelines on how to fine-tune these parameters based on how they impact the model.\n",
"\n",
"We recommend first training your model with the default parameters, evaluating the results, and then fine-tuning parameters to achieve better results as necessary."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Table of Contents:\n",
"* [Training a High Accuracy, Fast Inference, or Small Size Classifier](#model)\n",
" * [Choosing between two types of models](#choosing)\n",
" * [Pre-processing](#preprocessing)\n",
" * [Training](#training)\n",
" * [Evaluation](#evaluation)\n",
"* [Fine tuning our models](#finetuning)\n",
" * [DNN architectures](#dnn)\n",
" * [Key parameters](#key-parameters)\n",
" * [Additional parameters](#other-parameters)\n",
" * [Testing parameters](#testing-parameters)\n",
"* [Appendix](#appendix)\n",
" * [Learning rate](#appendix-learning-rate)\n",
" * [Image size](#appendix-imsize)\n",
" * [How we found good parameters](#appendix-good-parameters)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Training a High Accuracy, Fast Inference, or Small Size Classifier "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's first verify our fast.ai version:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'1.0.57'"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import fastai\n",
"fastai.__version__"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Ensure edits to libraries are loaded and plotting is shown in the notebook."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"%reload_ext autoreload\n",
"%autoreload 2\n",
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Import all the functions we need."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"sys.path.append(\"../../\")\n",
"import os\n",
"from pathlib import Path\n",
"import scrapbook as sb\n",
"\n",
"from fastai.metrics import accuracy\n",
"from fastai.vision import (\n",
" models, ImageList, imagenet_stats, cnn_learner, get_transforms, open_image, partial\n",
")\n",
"\n",
"from utils_cv.classification.data import Urls, is_data_multilabel\n",
"from utils_cv.classification.model import hamming_accuracy, TrainMetricsRecorder\n",
"from utils_cv.common.data import unzip_url\n",
"from utils_cv.common.gpu import db_num_workers, which_processor"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Fast.ai version = 1.0.57\n",
"Torch is using GPU: Tesla V100-PCIE-16GB\n"
]
}
],
"source": [
"print(f\"Fast.ai version = {fastai.__version__}\")\n",
"which_processor()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that we've set up our notebook, let's set the hyperparameters based on which model type was selected."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Choosing between types of models "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For most scenarios, computer vision practitioners want to create a high accuracy model, a fast-inference model or a small size model. Set your `MODEL_TYPE` variable to one of the following: `\"high_accuracy\"`, `\"fast_inference\"`, or `\"small_size\"`.\n",
"\n",
"We will use the `FridgeObjects` dataset from a [previous notebook](01_training_introduction.ipynb) again. You can replace the `DATA_PATH` variable with your own data.\n",
"\n",
"When choosing the batch size, remember that even mid-level GPUs run out of memory when training a deeper ResNet model with larger image resolutions. If you get an _out of memory_ error, try reducing the batch size by a factor of 2."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"tags": [
"parameters"
]
},
"outputs": [],
"source": [
"# Choose between \"high_accuracy\", \"fast_inference\", or \"small_size\"\n",
"MODEL_TYPE = \"fast_inference\"\n",
"\n",
"# Path to your data\n",
"DATA_PATH = unzip_url(Urls.fridge_objects_path, exist_ok=True)\n",
"\n",
"# Epochs to train for\n",
"EPOCHS_HEAD = 4\n",
"EPOCHS_BODY = 12\n",
"LEARNING_RATE = 1e-4\n",
"BATCH_SIZE = 16 \n",
"\n",
"#Set parameters based on your selected model.\n",
"assert MODEL_TYPE in [\"high_accuracy\", \"fast_inference\", \"small_size\"]\n",
"if MODEL_TYPE == \"high_accuracy\":\n",
" ARCHITECTURE = models.resnet50\n",
" IM_SIZE = 500 \n",
" \n",
"if MODEL_TYPE == \"fast_inference\":\n",
" ARCHITECTURE = models.resnet18\n",
" IM_SIZE = 300 \n",
"\n",
"if MODEL_TYPE == \"small_size\":\n",
" ARCHITECTURE = models.squeezenet1_1\n",
" IM_SIZE = 300 "
]
},
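{
"cell_type": "markdown",
"metadata": {},
"source": [
"If training does fail with an _out of memory_ error, the halving advice above can be automated. The following is a minimal sketch, not part of `utils_cv` or fast.ai: `fit_with_batch_size_backoff` and `fake_train` are hypothetical names, `train_fn` stands in for whatever builds the databunch and trains with a given batch size, and we assume the failure surfaces as a `RuntimeError` whose message contains \"out of memory\".\n",
"\n",
"```python\n",
"def fit_with_batch_size_backoff(train_fn, batch_size, min_batch_size=1):\n",
"    # Call train_fn(batch_size), halving batch_size after each out-of-memory error.\n",
"    while batch_size >= min_batch_size:\n",
"        try:\n",
"            return batch_size, train_fn(batch_size)\n",
"        except RuntimeError as err:\n",
"            # Re-raise anything that is not a memory error.\n",
"            if \"out of memory\" not in str(err).lower():\n",
"                raise\n",
"            batch_size //= 2\n",
"    raise RuntimeError(\"ran out of memory even at the minimum batch size\")\n",
"\n",
"\n",
"# Stand-in for a real training run: pretend batch sizes above 8 do not fit.\n",
"def fake_train(batch_size):\n",
"    if batch_size > 8:\n",
"        raise RuntimeError(\"CUDA out of memory\")\n",
"    return f\"trained with bs={batch_size}\"\n",
"\n",
"\n",
"used_bs, result = fit_with_batch_size_backoff(fake_train, batch_size=32)\n",
"print(used_bs, result)\n",
"```\n",
"\n",
"In a real run, `train_fn` would rebuild the databunch with the new `bs` before training, since this notebook sets the batch size when the databunch is created."
]
},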
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We'll automatically determine if your dataset is a multi-label or traditional (single-label) classification problem. To do so, we'll use the `is_data_multilabel` helper function. In order to detect whether or not a dataset is multi-label, the helper function will check to see if the datapath contains a csv file that has a column 'labels' where the values are space-delimited. You can inspect the function by calling `is_data_multilabel??`.\n",
"\n",
"This function assumes that your multi-label dataset is structured in the recommended format shown in the [multilabel notebook](02_multilabel_classification.ipynb)."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"multilabel = is_data_multilabel(DATA_PATH)\n",
"metric = accuracy if not multilabel else hamming_accuracy"
]
},
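{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the detection rule described above concrete, here is a rough standard-library sketch of the same idea. It is an illustration only and not the implementation of `is_data_multilabel`; the function name `looks_multilabel`, the file contents and the label names are made up for the example.\n",
"\n",
"```python\n",
"import csv\n",
"import tempfile\n",
"from pathlib import Path\n",
"\n",
"\n",
"def looks_multilabel(data_path):\n",
"    # True if any CSV in data_path has a 'labels' column with space-delimited values.\n",
"    for csv_path in Path(data_path).glob(\"*.csv\"):\n",
"        with open(csv_path, newline=\"\") as f:\n",
"            reader = csv.DictReader(f)\n",
"            if reader.fieldnames is None or \"labels\" not in reader.fieldnames:\n",
"                continue\n",
"            # A space inside a label value means several labels share one cell.\n",
"            if any(\" \" in (row[\"labels\"] or \"\").strip() for row in reader):\n",
"                return True\n",
"    return False\n",
"\n",
"\n",
"# Tiny demonstration in a throwaway directory.\n",
"with tempfile.TemporaryDirectory() as tmp:\n",
"    rows = [[\"filename\", \"labels\"], [\"1.jpg\", \"can carton\"], [\"2.jpg\", \"milk_bottle\"]]\n",
"    with open(Path(tmp) / \"labels.csv\", \"w\", newline=\"\") as f:\n",
"        csv.writer(f).writerows(rows)\n",
"    is_multi = looks_multilabel(tmp)\n",
"\n",
"print(is_multi)\n",
"```"
]
},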
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Pre-processing \n",
"\n",
"JPEG decoding represents a performance bottleneck on systems with powerful GPUs which can slow down training significantly. We recommend creating a down-sized copy of the dataset if training takes too long, or if you require multiple training runs to evaluate different parameters. \n",
"\n",
"The following function will automate image downsizing.\n",
"```python\n",
"from utils_cv.classification.data import downsize_imagelist\n",
"\n",
"downsize_imagelist(\n",
" im_list = ImageList.from_folder(Path(DATA_PATH)),\n",
" out_dir = \"downsized_images\", \n",
" max_dim = IM_SIZE\n",
")\n",
"```\n",
"\n",
"Once complete, update the `DATA_PATH` variable to point to `out_dir` so that this notebook uses these resized images. \n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Training \n",
"\n",
"We'll now re-apply the same steps we did in the [01_training_introduction](01_training_introduction.ipynb) notebook here."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Load the data:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"label_list = (\n",
" (\n",
" ImageList.from_folder(Path(DATA_PATH))\n",
" .split_by_rand_pct(valid_pct=0.2, seed=10)\n",
" .label_from_folder()\n",
" )\n",
" if not multilabel\n",
" else (\n",
" ImageList.from_csv(Path(DATA_PATH), \"labels.csv\", folder=\"images\")\n",
" .split_by_rand_pct(valid_pct=0.2, seed=10)\n",
" .label_from_df(label_delim=\" \")\n",
" )\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"data = (\n",
" label_list.transform(tfms=get_transforms(), size=IM_SIZE)\n",
" .databunch(bs=BATCH_SIZE, num_workers = db_num_workers())\n",
" .normalize(imagenet_stats)\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Create the learner."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"learn = cnn_learner(data, ARCHITECTURE, metrics=metric, \n",
" callback_fns=[partial(TrainMetricsRecorder, show_graph=True)])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Train the last layer for a few epochs, this can use a larger rate since most of the DNN is fixed."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
" \n",
" \n",
" epoch \n",
" train_loss \n",
" valid_loss \n",
" train_accuracy \n",
" valid_accuracy \n",
" time \n",
" \n",
" \n",
" \n",
" \n",
" 0 \n",
" 2.419261 \n",
" 1.657503 \n",
" 0.250000 \n",
" 0.307692 \n",
" 00:14 \n",
" \n",
" \n",
" 1 \n",
" 1.778618 \n",
" 0.761048 \n",
" 0.583333 \n",
" 0.653846 \n",
" 00:09 \n",
" \n",
" \n",
" 2 \n",
" 1.316656 \n",
" 0.523479 \n",
" 0.781250 \n",
" 0.769231 \n",
" 00:11 \n",
" \n",
" \n",
" 3 \n",
" 1.052359 \n",
" 0.491936 \n",
" 0.833333 \n",
" 0.807692 \n",
" 00:09 \n",
" \n",
" \n",
"
"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"learn.fit_one_cycle(EPOCHS_HEAD, 10 * LEARNING_RATE)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Unfreeze the layers."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"learn.unfreeze()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Fine-tune the network for the remaining epochs."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
" \n",
" \n",
" epoch \n",
" train_loss \n",
" valid_loss \n",
" train_accuracy \n",
" valid_accuracy \n",
" time \n",
" \n",
" \n",
" \n",
" \n",
" 0 \n",
" 0.367272 \n",
" 0.446086 \n",
" 0.843750 \n",
" 0.807692 \n",
" 00:11 \n",
" \n",
" \n",
" 1 \n",
" 0.401298 \n",
" 0.381388 \n",
" 0.854167 \n",
" 0.807692 \n",
" 00:11 \n",
" \n",
" \n",
" 2 \n",
" 0.326690 \n",
" 0.309591 \n",
" 0.906250 \n",
" 0.846154 \n",
" 00:11 \n",
" \n",
" \n",
" 3 \n",
" 0.280659 \n",
" 0.304625 \n",
" 0.906250 \n",
" 0.884615 \n",
" 00:11 \n",
" \n",
" \n",
" 4 \n",
" 0.268848 \n",
" 0.249857 \n",
" 0.927083 \n",
" 0.846154 \n",
" 00:11 \n",
" \n",
" \n",
" 5 \n",
" 0.229108 \n",
" 0.192554 \n",
" 0.968750 \n",
" 0.923077 \n",
" 00:11 \n",
" \n",
" \n",
" 6 \n",
" 0.208633 \n",
" 0.224482 \n",
" 0.979167 \n",
" 0.923077 \n",
" 00:10 \n",
" \n",
" \n",
" 7 \n",
" 0.191411 \n",
" 0.206568 \n",
" 0.968750 \n",
" 0.923077 \n",
" 00:10 \n",
" \n",
" \n",
" 8 \n",
" 0.169821 \n",
" 0.233692 \n",
" 0.979167 \n",
" 0.923077 \n",
" 00:09 \n",
" \n",
" \n",
" 9 \n",
" 0.157189 \n",
" 0.247892 \n",
" 0.989583 \n",
" 0.884615 \n",
" 00:05 \n",
" \n",
" \n",
" 10 \n",
" 0.146495 \n",
" 0.253730 \n",
" 0.979167 \n",
" 0.923077 \n",
" 00:04 \n",
" \n",
" \n",
" 11 \n",
" 0.131543 \n",
" 0.267920 \n",
" 0.989583 \n",
" 0.884615 \n",
" 00:04 \n",
" \n",
" \n",
"
"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"learn.fit_one_cycle(EPOCHS_BODY, LEARNING_RATE)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Evaluation "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In [01_training introduction](01_training_introduction.ipynb), we demonstrated evaluating a CV model using the performance metrics for precision, recall and ROC. In this section, we will evaluate our model using the following characteristics:\n",
"- accuracy (performance)\n",
"- inference speed\n",
"- parameter export size / memory footprint required"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Performance \n",
"To keep things simple, we just look at the final evaluation metric on the validation set."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"accuracy on validation set: 0.88\n"
]
}
],
"source": [
"_, validation_accuracy = learn.validate(learn.data.valid_dl, metrics=[metric])\n",
"print(f\"{metric.__name__} on validation set: {float(validation_accuracy):2.2f}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Inference speed\n",
"\n",
"Time model inference speed."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"im_folder = learn.data.classes[0] if not multilabel else 'images'\n",
"im = open_image(f\"{(Path(DATA_PATH)/im_folder).ls()[0]}\")"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"23.1 ms ± 1.2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
]
}
],
"source": [
"%%timeit\n",
"learn.predict(im)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Memory footprint\n",
"\n",
"Export the model to inspect the size of the model file."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"learn.export(f\"{MODEL_TYPE}\")"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"'fast_inference' is 44.77MB.\n"
]
}
],
"source": [
"size_in_mb = os.path.getsize(Path(DATA_PATH)/MODEL_TYPE) / (1024*1024.)\n",
"print(f\"'{MODEL_TYPE}' is {round(size_in_mb, 2)}MB.\")"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"application/scrapbook.scrap.json+json": {
"data": [
0.807692289352417,
0.807692289352417,
0.8461538553237915,
0.8846153616905212,
0.8461538553237915,
0.9230769276618958,
0.9230769276618958,
0.9230769276618958,
0.9230769276618958,
0.8846153616905212,
0.9230769276618958,
0.8846153616905212
],
"encoder": "json",
"name": "training_accuracies",
"version": 1
}
},
"metadata": {
"scrapbook": {
"data": true,
"display": false,
"name": "training_accuracies"
}
},
"output_type": "display_data"
},
{
"data": {
"application/scrapbook.scrap.json+json": {
"data": 0.8846153616905212,
"encoder": "json",
"name": "validation_accuracy",
"version": 1
}
},
"metadata": {
"scrapbook": {
"data": true,
"display": false,
"name": "validation_accuracy"
}
},
"output_type": "display_data"
}
],
"source": [
"# Preserve some of the notebook outputs\n",
"training_accuracies = [x[0].numpy().ravel()[0] for x in learn.recorder.metrics]\n",
"sb.glue(\"training_accuracies\", training_accuracies)\n",
"sb.glue(\"validation_accuracy\", float(validation_accuracy))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Fine-tuning parameters \n",
"\n",
"If you use the default parameters we have provided, you can get good results across a wide variety of datasets. However, as in most machine learning projects, getting the best possible results for a new dataset often requires tuning the parameters further. The following section provides guidelines on optimizing for accuracy, inference speed, or model size for a given dataset. We'll go through the parameters that will make the largest impact on your model as well as the parameters that may not be worth modifying.\n",
"\n",
"Generally speaking, models for image classification come with a trade-off between training time versus model accuracy. The four parameters that have the biggest impact on this trade-off are the DNN architecture, image resolution, learning rate, and number of epochs. DNN architecture and image resolution will additionally affect the model's inference time and memory footprint. As a rule of thumb, deeper networks with high image resolution will achieve higher accuracy at the cost of large model sizes and low training and inference speeds. Shallow networks with low image resolution will result in models with fast inference speed, fast training speeds and low model sizes at the cost of the model accuracy. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## DNN architectures \n",
"\n",
"When choosing an architecture, we want to make sure it fits our requirements for accuracy, memory footprint, inference speed and training speeds. Some DNNs have hundreds of layers and end up with a large memory footprint and millions of parameters to tune, while others are compact and small enough to fit onto memory limited edge devices. \n",
"\n",
"Lets take a __squeezenet1_1__ model, a __resnet18__ model and __resnet50__ model and compare these using an experiment over diverse set of 6 datasets. (More about the datasets in the appendix below.)\n",
"\n",
"![architecture_comparisons](media/architecture_comparisons.png)\n",
"\n",
"As you can see from the graph, there is a clear trade-off when deciding between the models. \n",
"\n",
"In terms of accuracy, __resnet50__ outperforms the rest, but it also suffers from having the highest memory footprint, and the longest training and inference times. Alternatively, __squeezenet1_1__ performs the worst in terms of accuracy, but has the smallest memory footprint.\n",
"\n",
"Generally speaking, given enough data, the deeper DNN and the higher the image resolution, the higher the accuracy you'll be able to achieve with your model.\n",
"\n",
"---\n",
"\n",
"See the code to generate the graphs \n",
"\n",
"\n",
"### Code snippet to generate graphs in this cell\n",
"```python\n",
"import pandas as pd\n",
"from utils_cv.classification.parameter_sweeper import add_value_labels\n",
"\n",
"%matplotlib inline\n",
"\n",
"df = pd.DataFrame(\n",
" {\n",
" \"accuracy\": [0.9472, 0.9190, 0.8251],\n",
" \"training_duration\": [385.3, 280.5, 272.5],\n",
" \"inference_duration\": [34.2, 27.8, 27.6],\n",
" \"memory\": [99, 45, 4.9],\n",
" \"model\": [\"resnet50\", \"resnet18\", \"squeezenet1_1\"],\n",
" }\n",
").set_index(\"model\")\n",
"\n",
"ax1, ax2, ax3, ax4 = df.plot.bar(\n",
" rot=90, subplots=True, legend=False, figsize=(8, 10)\n",
")\n",
"\n",
"for ax in [ax1, ax2, ax3, ax4]:\n",
" for i in [0, 1, 2]:\n",
" if i == 0:\n",
" ax.get_children()[i].set_color(\"r\")\n",
" if i == 1:\n",
" ax.get_children()[i].set_color(\"g\")\n",
" if i == 2:\n",
" ax.get_children()[i].set_color(\"b\")\n",
"\n",
"ax1.set_title(\"Accuracy (%)\")\n",
"ax2.set_title(\"Training Duration (seconds)\")\n",
"ax3.set_title(\"Inference Time (seconds)\")\n",
"ax4.set_title(\"Memory Footprint (mb)\")\n",
"\n",
"ax1.set_ylabel(\"%\")\n",
"ax2.set_ylabel(\"seconds\")\n",
"ax3.set_ylabel(\"seconds\")\n",
"ax4.set_ylabel(\"mb\")\n",
"\n",
"ax1.set_ylim(top=df[\"accuracy\"].max() * 1.3)\n",
"ax2.set_ylim(top=df[\"training_duration\"].max() * 1.3)\n",
"ax3.set_ylim(top=df[\"inference_duration\"].max() * 1.3)\n",
"ax4.set_ylim(top=df[\"memory\"].max() * 1.3)\n",
"\n",
"add_value_labels(ax1, percentage=True)\n",
"add_value_labels(ax2)\n",
"add_value_labels(ax3)\n",
"add_value_labels(ax4)\n",
"```\n",
"\n",
"
\n",
" \n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Key parameters \n",
"This section examines some of the key parameters when training a deep learning model for image classification. The table below shows default parameters we recommend using.\n",
"## Key Parameters \n",
"This section examines some of the key parameters used in training a deep learning model for image classification. The table below shows default parameters:\n",
"\n",
"| Parameter | Default Value |\n",
"| --- | --- |\n",
"| Learning Rate | 1e-4 |\n",
"| Epochs | 15 |\n",
"| Batch Size | 16 |\n",
"| Image Size | 300 X 300 |\n",
"\n",
"__Learning rate__ \n",
"\n",
"Learning rate or the step size is used when optimizing your model with gradient descent and tends to be one of the most important parameters to set when training your model. If your learning rate is set too low, training will progress very slowly since we're only making tiny updates to the weights in your network. However, if your learning rate is too high, it can cause undesirable divergent behavior in your loss function. Generally speaking, choosing a learning rate of 1e-4 was shown to work pretty well for most datasets. If you want to reduce training time (by training for fewer epochs), you can try setting the learning rate to 5e-3, but if you notice a spike in the training or validation loss, you may want to try reducing your learning rate.\n",
"\n",
"The learning rate section of [appendix below](#appendix-learning-rate) has more detail.\n",
"\n",
"__Epochs__\n",
"\n",
"An _epoch_ is a full gradient descent iteration cycle across the DNN architecture. Unless your are working with small datasets, using around 15 epochs tends to work well in most cases. When it comes to choosing the number of epochs, a common question is - _Won't too many epochs cause overfitting_? It turns out that the accuracy on the test set typically does not get worse, even if training for too many epochs. Unless your are working with small datasets, using around 15 epochs tends to work pretty well in most cases.\n",
"\n",
"\n",
"__Batch Size__\n",
"\n",
"Batch size is the number of training samples you use in order to make one update to the model parameters. A batch size of 16 or 32 works well for most cases. Larger batch sizes help speed training time, but at the expense of an increased DNN memory consumption. Depending on your dataset and the GPU you have, you can start with a batch size of 32, and move down to 16 if your GPU doesn't have enough memory. After a certain batch size, improvements to training speed become marginal, hence we found 16 (or 32) to be a good trade-off between training speed and memory consumption. If you reduce the batch size, you may also have to reduce the learning rate.\n",
"\n",
"__Image size__ \n",
"\n",
"The default image size is __300 X 300__ pixels. Using higher image resolutions can help improve model accuracy but will result in longer training and inference times.\n",
"\n",
"The [appendix below](#appendix-imsize) discussed impact of image resolution in detail.\n"
]
},
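{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a small worked example of how batch size and epochs interact, the sketch below counts the number of weight updates for the `fridgeObjects` dataset (134 images, see the dataset table in the appendix) with the 20% validation split used in this notebook. The helper `updates_per_epoch` is a hypothetical name introduced for illustration; whether the last partial batch is dropped depends on the data loader settings, so both cases are supported.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"\n",
"def updates_per_epoch(n_train, batch_size, drop_last=False):\n",
"    # Number of weight updates (batches) in one pass over n_train samples.\n",
"    if drop_last:\n",
"        return n_train // batch_size\n",
"    return math.ceil(n_train / batch_size)\n",
"\n",
"\n",
"n_images = 134                            # fridgeObjects\n",
"n_train = n_images - int(0.2 * n_images)  # 20% held out for validation\n",
"for bs in (16, 32):\n",
"    per_epoch = updates_per_epoch(n_train, bs)\n",
"    print(f\"batch size {bs}: {per_epoch} updates per epoch, {15 * per_epoch} over 15 epochs\")\n",
"```\n",
"\n",
"Halving the batch size roughly doubles the number of updates per epoch, which is one reason the learning rate may need to be reduced along with it."
]
},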
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Additional parameters \n",
"\n",
"There are many hyperparameters used to tune DNNs, though in our experience the exact value of these parameters does not have a large impact on model performance, training/inference speed, or memory footprint. \n",
"\n",
"| Parameter | Good Default Value |\n",
"| --- | --- |\n",
"| Dropout | 0.5 or (0.5 on the final layer and 0.25 on all previous layers) |\n",
"| Weight Decay | 0.01 |\n",
"| Momentum | 0.9 or (min=0.85 and max=0.95 when using cyclical momentum) |\n",
"\n",
"__Dropout__\n",
"\n",
"Dropout is used to discard activations at random when training your model. It is a way to keep the model from over-fitting on the training data. In fast.ai, dropout is set to 0.5 by default on the final layer, and 0.25 on all other layer. Unless there is clear evidence of over-fitting, this dropout tends to work well.\n",
"\n",
"__Weight decay (L2 regularization)__\n",
"\n",
"Weight decay is a regularization term applied to help minimize the network loss function. We can think of it as a penalty applied to the weights after an update to prevent the weights from growing too large (the model may not converge if the weights get too large). In fast.ai, the default weight decay is 0.1, which we find to be almost always acceptable. \n",
"\n",
"__Momentum__\n",
"\n",
"Momentum is a way to accelerate convergence when training a model. Momentum uses a weighted average of the most recent updates applied to the current update. Fast.ai implements cyclical momentum when calling `fit_one_cycle()`, so the momentum will fluctuate over the course of the training cycle. We control this by setting a min and max value for the momentum. \n",
"\n",
"When using `fit_one_cycle()`, the default values of max=0.95 and min=0.85 are known to work well. If using `fit()`, the default value of 0.9 has been shown to work well. These defaults represent a good trade-off between training speed and the ability of the model to converge to a good solution."
]
},
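{
"cell_type": "markdown",
"metadata": {},
"source": [
"To illustrate how weight decay and momentum enter a single update, here is a minimal textbook sketch on a one-dimensional problem. It is not fast.ai's optimizer code; `sgd_momentum_step` is a hypothetical helper, and the learning rate of 0.1 is chosen only to make the toy problem converge quickly.\n",
"\n",
"```python\n",
"def sgd_momentum_step(w, velocity, grad, lr=0.1, momentum=0.9, weight_decay=0.01):\n",
"    # One textbook SGD update with L2 weight decay and momentum.\n",
"    grad = grad + weight_decay * w         # L2 penalty pulls the weight toward zero\n",
"    velocity = momentum * velocity + grad  # decayed running sum of past gradients\n",
"    return w - lr * velocity, velocity\n",
"\n",
"\n",
"# Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).\n",
"w, v = 0.0, 0.0\n",
"for _ in range(200):\n",
"    w, v = sgd_momentum_step(w, v, grad=2 * (w - 3))\n",
"print(w)\n",
"```\n",
"\n",
"Because of the weight decay term, the weight settles slightly below 3, at the point where `2 * (w - 3) + 0.01 * w` is zero (about 2.985)."
]
},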
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Testing parameters \n",
"The `ParameterSweeper` module can be used to search over the parameter space to locate the \"best\" value for that parameter. See the [exploring hyperparameters notebook](./11_exploring_hyperparameters.ipynb) for more information. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Appendix "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Learning rate \n",
"\n",
"Setting a low learning rate requires training for many epochs to reach convergence. However, each additional epoch directly increases the model training time in a linear fashion. To efficiently build a model, it helps to set the learning rate in the correct range. To demonstrate this, we've tested various learning rates on 6 different datasets, training the full network for 3 or 15 epochs.\n",
"\n",
"![lr_comparisons](media/lr_comparisons.png)\n",
"\n",
"Understanding the diagram \n",
"\n",
" \n",
 The figure on">
"> The figure on the left shows results of different learning rates on different datasets at 15 epochs. We see that a learning rate of 1e-4 results in the best overall accuracy for the datasets we have tested. Notice that there is significant variance between the datasets, and a learning rate of 1e-3 may work better for some of them. \n",
"In the figure on the right, at 15 epochs, the results of 1e-4 are only slightly better than those of 1e-3. However, at only 3 epochs, a learning rate of 1e-3 outperforms the smaller learning rates. This makes sense: with training limited to 3 epochs, a model that updates its weights more quickly should perform better, because the larger learning rate gets closer to convergence in the available steps. This result indicates that higher learning rates (such as 1e-3) may help minimize the training time, and lower learning rates (such as 1e-5) may be better if training time is not constrained. \n",
"\n",
"
\n",
" \n",
"\n",
"In both figures, we can see that a learning rate of 1e-3 and 1e-4 tends to workin general. We observe that training with 3 epochs results in lower accuracy compared to 15 epochs. And in some cases, smaller learning rates may prevent the DNN from converging. \n",
"\n",
"Fast.ai has implemented [one cycle policy with cyclical momentum](https://arxiv.org/abs/1803.09820) which adaptively optimizes the learning rate. This function takes a maximum learning rate value as an argument to help the method avoid the convergence problem. Replace the `fit()` method with `fit_one_cycle()` to use this capability.\n",
"\n",
"---\n",
"\n",
"See the code to generate the graphs \n",
"\n",
"\n",
"### Code snippet to generate graphs in this cell\n",
"\n",
"```python\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline\n",
"\n",
"df_dataset_comp = pd.DataFrame(\n",
" {\n",
" \"fashionTexture\": [0.8749, 0.8481, 0.2491, 0.670318, 0.1643],\n",
" \"flickrLogos32Subset\": [0.9069, 0.9064, 0.2179, 0.7175, 0.1073],\n",
" \"food101Subset\": [0.9294, 0.9127, 0.6891, 0.9090, 0.555827],\n",
" \"fridgeObjects\": [0.9591, 0.9727, 0.272727, 0.6136, 0.181818],\n",
" \"lettuce\": [0.8992, 0.9104, 0.632, 0.8192, 0.5120],\n",
" \"recycle_v3\": [0.9527, 0.9581, 0.766, 0.8591, 0.2876],\n",
" \"learning_rate\": [0.000100, 0.001000, 0.010000, 0.000010, 0.000001],\n",
" }\n",
").set_index(\"learning_rate\")\n",
"\n",
"df_epoch_comp = pd.DataFrame(\n",
" {\n",
" \"3_epochs\": [0.823808, 0.846394, 0.393808, 0.455115, 0.229120],\n",
" \"15_epochs\": [0.920367, 0.918067, 0.471138, 0.764786, 0.301474],\n",
" \"learning_rate\": [0.000100, 0.001000, 0.010000, 0.000010, 0.000001],\n",
" }\n",
").set_index(\"learning_rate\")\n",
"\n",
"plt.figure(1)\n",
"ax1 = plt.subplot(121)\n",
"ax2 = plt.subplot(122)\n",
"\n",
"vals = ax2.get_yticks()\n",
"\n",
"df_dataset_comp.sort_index().plot(kind=\"bar\", rot=0, figsize=(15, 6), ax=ax1)\n",
"vals = ax1.get_yticks()\n",
"ax1.set_yticklabels([\"{:,.2%}\".format(x) for x in vals])\n",
"ax1.set_ylim(0, 1)\n",
"ax1.set_ylabel(\"Accuracy (%)\")\n",
"ax1.set_title(\"Accuracy of Learning Rates by Datasets @ 15 Epochs\")\n",
"ax1.legend(loc=2)\n",
"\n",
"df_epoch_comp.sort_index().plot(kind=\"bar\", rot=0, figsize=(15, 6), ax=ax2)\n",
"ax2.set_yticklabels([\"{:,.2%}\".format(x) for x in vals])\n",
"ax2.set_ylim(0, 1)\n",
"ax2.set_title(\"Accuracy of Learning Rates by Epochs\")\n",
"ax2.legend(loc=2)\n",
"```\n",
"\n",
"
\n",
" "
]
},
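{
"cell_type": "markdown",
"metadata": {},
"source": [
"The shape of a one-cycle schedule can be sketched in a few lines. The version below is a simplified illustration and not fast.ai's implementation of `fit_one_cycle()`: it uses linear interpolation, and the function name `one_cycle`, the warm-up fraction of 0.3 and the starting divisor of 25 are assumptions made for the example. The momentum bounds of 0.95 and 0.85 are the defaults mentioned earlier in this notebook.\n",
"\n",
"```python\n",
"def one_cycle(step, total_steps, lr_max, warmup_frac=0.3, start_div=25.0,\n",
"              mom_max=0.95, mom_min=0.85):\n",
"    # Return (learning_rate, momentum) at `step` for a simplified one-cycle schedule.\n",
"    warmup_steps = max(1, int(total_steps * warmup_frac))\n",
"    if step < warmup_steps:\n",
"        t = step / warmup_steps                  # goes 0 -> 1 during warm-up\n",
"        lr = lr_max / start_div + t * (lr_max - lr_max / start_div)\n",
"        mom = mom_max - t * (mom_max - mom_min)  # momentum falls as lr rises\n",
"    else:\n",
"        t = (step - warmup_steps) / max(1, total_steps - warmup_steps)\n",
"        lr = lr_max * (1 - t)                    # anneal toward zero\n",
"        mom = mom_min + t * (mom_max - mom_min)  # momentum recovers as lr decays\n",
"    return lr, mom\n",
"\n",
"\n",
"schedule = [one_cycle(s, total_steps=100, lr_max=1e-3) for s in range(100)]\n",
"peak_lr, mom_at_peak = max(schedule)  # tuples compare by learning rate first\n",
"print(peak_lr, mom_at_peak)\n",
"```\n",
"\n",
"The learning rate starts low, peaks at `lr_max` and then decays, while the momentum moves in the opposite direction and is lowest when the learning rate is highest."
]
},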
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Image resolution \n",
"\n",
"A model's input image resolution also impacts model accuracy. Usually, convolutional neural networks are able to take advantage of higher resolution images, especially if the object-of-interest is small in the overall image. But how does image size impact other model aspects? \n",
"\n",
"We find that image size doesn't significantly affect the model's memory footprint given the same network architecture, but it has a huge effect on GPU memory. Image size also impacts training and inference speeds.\n",
"\n",
"![imsize_comparisons](media/imsize_comparisons.png)\n",
"\n",
"From the results, we can see that an increase in image resolution from __300 X 300__ to __500 X 500__ will increase the performance marginally at the cost of a longer training duration and slower inference speed.\n",
"\n",
"---\n",
"\n",
"See the code to generate the graphs \n",
"\n",
"\n",
"### Code snippet to generate graphs in this cell\n",
"\n",
"```python\n",
"import pandas as pd\n",
"from utils_cv.classification.parameter_sweeper import add_value_labels\n",
"%matplotlib inline\n",
"\n",
"df = pd.DataFrame(\n",
" {\n",
" \"accuracy\": [0.9472, 0.9394, 0.9190, 0.9164, 0.8366, 0.8251],\n",
" \"training_duration\": [385.3, 218.8, 280.5, 184.9, 272.5, 182.3],\n",
" \"inference_duration\": [34.2, 23.2, 27.8, 17.8, 27.6, 17.3],\n",
" \"model\": [\n",
" \"resnet50 X 499\",\n",
" \"resnet50 X 299\",\n",
" \"resnet18 X 499\",\n",
" \"resnet18 X 299\",\n",
" \"squeezenet1_1 X 499\",\n",
" \"squeezenet1_1 X 299\",\n",
" ],\n",
" }\n",
").set_index(\"model\")\n",
"df\n",
"\n",
"ax1, ax2, ax3 = df.plot.bar(\n",
" rot=90, subplots=True, legend=False, figsize=(12, 12)\n",
")\n",
"\n",
"for i in range(len(df)):\n",
" if i < len(df) / 3:\n",
" ax1.get_children()[i].set_color(\"r\")\n",
" ax2.get_children()[i].set_color(\"r\")\n",
" ax3.get_children()[i].set_color(\"r\")\n",
" if i >= len(df) / 3 and i < 2 * len(df) / 3:\n",
" ax1.get_children()[i].set_color(\"g\")\n",
" ax2.get_children()[i].set_color(\"g\")\n",
" ax3.get_children()[i].set_color(\"g\")\n",
" if i >= 2 * len(df) / 3:\n",
" ax1.get_children()[i].set_color(\"b\")\n",
" ax2.get_children()[i].set_color(\"b\")\n",
" ax3.get_children()[i].set_color(\"b\")\n",
"\n",
"ax1.set_title(\"Accuracy (%)\")\n",
"ax2.set_title(\"Training Duration (seconds)\")\n",
"ax3.set_title(\"Inference Speed (seconds)\")\n",
"\n",
"ax1.set_ylabel(\"%\")\n",
"ax2.set_ylabel(\"seconds\")\n",
"ax3.set_ylabel(\"seconds\")\n",
"\n",
"ax1.set_ylim(top=df[\"accuracy\"].max() * 1.2)\n",
"ax2.set_ylim(top=df[\"training_duration\"].max() * 1.2)\n",
"ax3.set_ylim(top=df[\"inference_duration\"].max() * 1.2)\n",
"\n",
"add_value_labels(ax1, percentage=True)\n",
"add_value_labels(ax2)\n",
"add_value_labels(ax3)\n",
"```\n",
"\n",
"
\n",
" "
]
},
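{
"cell_type": "markdown",
"metadata": {},
"source": [
"To put the trade-off into numbers, the short sketch below compares the two resnet50 rows from the data frame above (499 versus 299 pixels). Only values already listed in this notebook are used; the variable names are ours.\n",
"\n",
"```python\n",
"# Measurements copied from the data frame above (resnet50 at 499 vs. 299 pixels).\n",
"large = {\"accuracy\": 0.9472, \"training_duration\": 385.3, \"inference_duration\": 34.2}\n",
"small = {\"accuracy\": 0.9394, \"training_duration\": 218.8, \"inference_duration\": 23.2}\n",
"\n",
"pixel_ratio = (499 / 299) ** 2\n",
"train_ratio = large[\"training_duration\"] / small[\"training_duration\"]\n",
"infer_ratio = large[\"inference_duration\"] / small[\"inference_duration\"]\n",
"accuracy_gain = large[\"accuracy\"] - small[\"accuracy\"]\n",
"\n",
"print(f\"pixels: x{pixel_ratio:.2f}, training time: x{train_ratio:.2f}, \"\n",
"      f\"inference time: x{infer_ratio:.2f}, accuracy: +{accuracy_gain:.2%}\")\n",
"```\n",
"\n",
"For this architecture, roughly 2.8 times as many pixels cost about 1.8 times the training time and 1.5 times the inference time, for an accuracy gain of less than one percentage point."
]
},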
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## How we found good default parameters \n",
"\n",
"We conducted various experiments to explore the impact of different hyperparameters on a model's _accuracy_, _training duration_, _inference speed_, and _memory footprint_. \n",
"\n",
"### Datasets \n",
"\n",
"For our experiments, we relied on a set of six different classification datasets. When selecting these datasets, we wanted to have a variety of image types with different amounts of data and number of classes. \n",
"\n",
"| Dataset Name | Number of Images | Number of Classes | \n",
"| --- | --- | --- |\n",
"| food101Subset | 5000 | 5 | \n",
"| flickrLogos32Subset | 2740 | 33 | \n",
"| fashionTexture | 1716 | 11 | \n",
"| recycle_v3 | 564 | 11 | \n",
"| lettuce | 380 | 2 |\n",
"| fridgeObjects | 134 | 4 | \n",
"\n",
"### Model Characteristics \n",
"\n",
"In our experiment, we look at these characteristics to evaluate the impact of various parameters. Here is how we calculated each of the following metrics:\n",
"\n",
"- __Accuracy__ metric is averaged over 5 runs for each dataset. \n",
"\n",
"\n",
"- __Training Duration__ metric is the average duration over 5 runs for each dataset.\n",
"\n",
"\n",
"- __Inference Speed__ is the time it takes the model to run 1000 predictions.\n",
"\n",
"\n",
"- __Memory Footprint__ is the size of the model pickle file output from the `learn.export(...)` method. \n",
" "
]
},
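{
"cell_type": "markdown",
"metadata": {},
"source": [
"The measurement procedure described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code used for our experiments: `time_predictions` and `average_over_runs` are hypothetical helpers, and the \"model\" and per-run accuracies are stand-ins.\n",
"\n",
"```python\n",
"import statistics\n",
"import time\n",
"\n",
"\n",
"def time_predictions(predict_fn, sample, n_predictions=1000):\n",
"    # Wall-clock seconds needed to call predict_fn(sample) n_predictions times.\n",
"    start = time.perf_counter()\n",
"    for _ in range(n_predictions):\n",
"        predict_fn(sample)\n",
"    return time.perf_counter() - start\n",
"\n",
"\n",
"def average_over_runs(run_fn, n_runs=5):\n",
"    # Mean of the metric returned by run_fn() across n_runs independent runs.\n",
"    return statistics.mean(run_fn() for _ in range(n_runs))\n",
"\n",
"\n",
"# Stand-ins: a trivial \"model\" and a fixed list of per-run accuracies.\n",
"elapsed = time_predictions(lambda x: x * 2, sample=1.0)\n",
"accuracies = iter([0.90, 0.92, 0.91, 0.93, 0.89])\n",
"mean_accuracy = average_over_runs(lambda: next(accuracies))\n",
"print(f\"{elapsed:.4f}s for 1000 predictions, mean accuracy {mean_accuracy:.2f}\")\n",
"```\n",
"\n",
"With a real learner, `predict_fn` would be `learn.predict` and `run_fn` would train a fresh model and return its validation accuracy."
]
},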
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python (cv)",
"language": "python",
"name": "cv"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 2
}