{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Copyright (c) Microsoft Corporation. All rights reserved.\n", "\n", "Licensed under the MIT License." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Building Models for Accuracy vs. Speed\n", "\n", "The goal of this notebook is to understand how to train a model with different parameters to achieve either a highly accurate model that is slow at inference time, or a model with fast inference but lower accuracy.\n", "\n", "For example, in IoT settings the inferencing device has limited computational capabilities. This means we need to design our models to have a small memory footprint. In contrast, medical scenarios often require the highest possible accuracy because the cost of misclassification could impact the well-being of a patient. In this scenario, the accuracy of the model cannot be compromised.\n", "\n", "We have conducted various experiments on diverse datasets to find parameters that work well across a wide variety of settings, favoring either high accuracy or fast inference. In this notebook, we provide these parameters so that your initial models can be trained without any parameter tuning. For most datasets, these parameters are close to optimal. In the second part of the notebook, we provide guidelines on how to fine-tune these parameters based on how they impact the model.\n", "\n", "We recommend first training your model with the default parameters, evaluating the results, and then fine-tuning parameters to achieve better results as necessary."
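, "\n",
"\n",
"As a rough illustration of what a small memory footprint means in practice, the sketch below estimates the size of a model's weights from its parameter count. It assumes 32-bit floats (4 bytes per parameter) and ignores activations, optimizer state, and file-format overhead. The helpers `count_parameters` and `weights_size_mb` are hypothetical and are not part of `utils_cv` or fast.ai.\n",
"\n",
"```python\n",
"def count_parameters(model):\n",
"    # Works for any object whose .parameters() yields tensors with .numel(),\n",
"    # e.g. a PyTorch nn.Module such as learn.model\n",
"    return sum(p.numel() for p in model.parameters())\n",
"\n",
"\n",
"def weights_size_mb(num_params, bytes_per_param=4):\n",
"    # 4 bytes per parameter assumes float32 weights; 1 MB = 10**6 bytes here\n",
"    return num_params * bytes_per_param / 1e6\n",
"\n",
"\n",
"print(weights_size_mb(1_000_000))  # 4.0\n",
"```\n",
"\n",
"Once a learner has been created, `weights_size_mb(count_parameters(learn.model))` gives a rough size estimate; storing the weights in 16-bit precision (`bytes_per_param=2`) would roughly halve it."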
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Table of Contents:\n", "* [Training a High Accuracy, Fast Inference, or Small Size Classifier](#model)\n", "    * [Choosing between types of models](#choosing)\n", "    * [Pre-processing](#preprocessing)\n", "    * [Training](#training)\n", "    * [Evaluation](#evaluation)\n", "* [Fine-tuning our models](#finetuning)\n", "    * [DNN architectures](#dnn)\n", "    * [Key parameters](#key-parameters)\n", "    * [Additional parameters](#other-parameters)\n", "    * [Testing parameters](#testing-parameters)\n", "* [Appendix](#appendix)\n", "    * [Learning rate](#appendix-learning-rate)\n", "    * [Image size](#appendix-imsize)\n", "    * [How we found good parameters](#appendix-good-parameters)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Training a High Accuracy, Fast Inference, or Small Size Classifier " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's first verify our fast.ai version:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'1.0.57'" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import fastai\n", "fastai.__version__" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ensure that edits to libraries are reloaded automatically and that plots are shown inline in the notebook." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "%reload_ext autoreload\n", "%autoreload 2\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Import all the functions we need."
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "import sys\n", "sys.path.append(\"../../\")\n", "import os\n", "from pathlib import Path\n", "import scrapbook as sb\n", "\n", "from fastai.metrics import accuracy\n", "from fastai.vision import (\n", "    models, ImageList, imagenet_stats, cnn_learner, get_transforms, open_image, partial\n", ")\n", "\n", "from utils_cv.classification.data import Urls, is_data_multilabel\n", "from utils_cv.classification.model import hamming_accuracy, TrainMetricsRecorder\n", "from utils_cv.common.data import unzip_url\n", "from utils_cv.common.gpu import db_num_workers, which_processor" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Fast.ai version = 1.0.57\n", "Torch is using GPU: Tesla V100-PCIE-16GB\n" ] } ], "source": [ "print(f\"Fast.ai version = {fastai.__version__}\")\n", "which_processor()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we've set up our notebook, let's set the hyperparameters based on the selected model type." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Choosing between types of models " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In most scenarios, computer vision practitioners want to create either a high-accuracy model, a fast-inference model, or a small-size model. Set your `MODEL_TYPE` variable to one of the following: `\"high_accuracy\"`, `\"fast_inference\"`, or `\"small_size\"`.\n", "\n", "We will use the `FridgeObjects` dataset from a [previous notebook](01_training_introduction.ipynb) again. You can replace the `DATA_PATH` variable with the path to your own data.\n", "\n", "When choosing the batch size, remember that even mid-range GPUs can run out of memory when training deeper ResNet models at larger image resolutions. If you get an _out of memory_ error, try reducing the batch size by a factor of 2."
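, "\n",
"\n",
"If you would rather automate this, a retry loop along these lines could work. It is a sketch under assumptions: `fit_with_fallback` and `fake_train` are hypothetical helpers, and the loop assumes that an out-of-memory failure surfaces as a `RuntimeError` whose message contains `out of memory`, which is how PyTorch typically reports CUDA OOM errors. In practice, `train_with_batch_size` would rebuild the `DataBunch` with `bs=batch_size`, create the learner, and train it.\n",
"\n",
"```python\n",
"def fit_with_fallback(train_with_batch_size, batch_size, min_batch_size=1):\n",
"    # Try to train; on an out-of-memory error, halve the batch size and retry\n",
"    while batch_size >= min_batch_size:\n",
"        try:\n",
"            return train_with_batch_size(batch_size), batch_size\n",
"        except RuntimeError as err:\n",
"            if 'out of memory' not in str(err):\n",
"                raise  # unrelated error: re-raise instead of hiding it\n",
"            print(f'Out of memory with batch size {batch_size}, halving it')\n",
"            batch_size //= 2\n",
"    raise RuntimeError('Out of memory even at the minimum batch size')\n",
"\n",
"\n",
"# Toy stand-in for a training run that only fits in memory with batches of 4 or fewer\n",
"def fake_train(bs):\n",
"    if bs > 4:\n",
"        raise RuntimeError('CUDA out of memory (simulated)')\n",
"    return 'trained'\n",
"\n",
"\n",
"print(fit_with_fallback(fake_train, batch_size=16))  # ('trained', 4)\n",
"```\n",
"\n",
"Between attempts on a real GPU, calling `torch.cuda.empty_cache()` can release cached memory that PyTorch still holds from the failed run."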
] }, { "cell_type": "code", "execution_count": 5, "metadata": { "tags": [ "parameters" ] }, "outputs": [], "source": [ "# Choose between \"high_accuracy\", \"fast_inference\", or \"small_size\"\n", "MODEL_TYPE = \"fast_inference\"\n", "\n", "# Path to your data\n", "DATA_PATH = unzip_url(Urls.fridge_objects_path, exist_ok=True)\n", "\n", "# Training hyperparameters: epochs for the head and the body, learning rate, batch size\n", "EPOCHS_HEAD = 4\n", "EPOCHS_BODY = 12\n", "LEARNING_RATE = 1e-4\n", "BATCH_SIZE = 16\n", "\n", "# Set the architecture and image size based on the selected model type\n", "assert MODEL_TYPE in [\"high_accuracy\", \"fast_inference\", \"small_size\"]\n", "if MODEL_TYPE == \"high_accuracy\":\n", "    ARCHITECTURE = models.resnet50\n", "    IM_SIZE = 500\n", "elif MODEL_TYPE == \"fast_inference\":\n", "    ARCHITECTURE = models.resnet18\n", "    IM_SIZE = 300\n", "elif MODEL_TYPE == \"small_size\":\n", "    ARCHITECTURE = models.squeezenet1_1\n", "    IM_SIZE = 300" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll automatically determine whether your dataset is a multi-label or a traditional (single-label) classification problem using the `is_data_multilabel` helper function. To detect a multi-label dataset, the helper function checks whether the data path contains a CSV file with a `labels` column whose values are space-delimited. You can inspect the function by calling `is_data_multilabel??`.\n", "\n", "This function assumes that your multi-label dataset is structured in the recommended format shown in the [multilabel notebook](02_multilabel_classification.ipynb)."
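, "\n",
"\n",
"To make the detection rule concrete, here is a simplified sketch based only on the description above. `looks_multilabel` is a hypothetical helper, not the `utils_cv` implementation, whose exact logic may differ. It assumes the CSV files sit directly inside the data folder and treats the dataset as multi-label if any row of a `labels` column lists more than one space-separated label.\n",
"\n",
"```python\n",
"import csv\n",
"import tempfile\n",
"from pathlib import Path\n",
"\n",
"\n",
"def looks_multilabel(data_path):\n",
"    # Scan top-level CSV files for a 'labels' column\n",
"    for csv_file in Path(data_path).glob('*.csv'):\n",
"        with open(csv_file, newline='') as f:\n",
"            reader = csv.DictReader(f)\n",
"            if 'labels' not in (reader.fieldnames or []):\n",
"                continue\n",
"            # Multi-label if any row holds more than one space-delimited label\n",
"            if any(len((row['labels'] or '').split()) > 1 for row in reader):\n",
"                return True\n",
"    return False\n",
"\n",
"\n",
"# Usage on a throwaway folder containing a single two-label row\n",
"with tempfile.TemporaryDirectory() as tmp:\n",
"    with open(Path(tmp) / 'labels.csv', 'w', newline='') as f:\n",
"        csv.writer(f).writerows([['filename', 'labels'], ['1.jpg', 'can milk_bottle']])\n",
"    print(looks_multilabel(tmp))  # True\n",
"```\n",
"\n",
"A folder-per-class dataset without any CSV file would return `False` here and would therefore be treated as single-label."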
] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "multilabel = is_data_multilabel(DATA_PATH)\n", "metric = accuracy if not multilabel else hamming_accuracy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Pre-processing \n", "\n", "On systems with powerful GPUs, JPEG decoding can become a performance bottleneck that slows down training significantly. We recommend creating a downsized copy of the dataset if training takes too long, or if you need multiple training runs to evaluate different parameters.\n", "\n", "The following function automates image downsizing:\n", "\n", "```python\n", "from utils_cv.classification.data import downsize_imagelist\n", "\n", "downsize_imagelist(\n", "    im_list=ImageList.from_folder(Path(DATA_PATH)),\n", "    out_dir=\"downsized_images\",\n", "    max_dim=IM_SIZE,\n", ")\n", "```\n", "\n", "Once it completes, update the `DATA_PATH` variable to point to `out_dir` so that this notebook uses the resized images.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Training \n", "\n", "We'll now apply the same steps we used in the [01_training_introduction](01_training_introduction.ipynb) notebook."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Load the data:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "label_list = (\n", "    (\n", "        ImageList.from_folder(Path(DATA_PATH))\n", "        .split_by_rand_pct(valid_pct=0.2, seed=10)\n", "        .label_from_folder()\n", "    )\n", "    if not multilabel\n", "    else (\n", "        ImageList.from_csv(Path(DATA_PATH), \"labels.csv\", folder=\"images\")\n", "        .split_by_rand_pct(valid_pct=0.2, seed=10)\n", "        .label_from_df(label_delim=\" \")\n", "    )\n", ")" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "data = (\n", "    label_list.transform(tfms=get_transforms(), size=IM_SIZE)\n", "    .databunch(bs=BATCH_SIZE, num_workers=db_num_workers())\n", "    .normalize(imagenet_stats)\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create the learner." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "learn = cnn_learner(\n", "    data,\n", "    ARCHITECTURE,\n", "    metrics=metric,\n", "    callback_fns=[partial(TrainMetricsRecorder, show_graph=True)],\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Train the last layer for a few epochs. Since most of the DNN is frozen, this stage can use a larger learning rate." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| epoch | train_loss | valid_loss | train_accuracy | valid_accuracy | time |\n", "
|---|---|---|---|---|---|
| 0 | 2.419261 | 1.657503 | 0.250000 | 0.307692 | 00:14 |\n", "
| 1 | 1.778618 | 0.761048 | 0.583333 | 0.653846 | 00:09 |\n", "
| 2 | 1.316656 | 0.523479 | 0.781250 | 0.769231 | 00:11 |\n", "
| 3 | 1.052359 | 0.491936 | 0.833333 | 0.807692 | 00:09 |\n", "
| epoch | train_loss | valid_loss | train_accuracy | valid_accuracy | time |\n", "
|---|---|---|---|---|---|
| 0 | 0.367272 | 0.446086 | 0.843750 | 0.807692 | 00:11 |\n", "
| 1 | 0.401298 | 0.381388 | 0.854167 | 0.807692 | 00:11 |\n", "
| 2 | 0.326690 | 0.309591 | 0.906250 | 0.846154 | 00:11 |\n", "
| 3 | 0.280659 | 0.304625 | 0.906250 | 0.884615 | 00:11 |\n", "
| 4 | 0.268848 | 0.249857 | 0.927083 | 0.846154 | 00:11 |\n", "
| 5 | 0.229108 | 0.192554 | 0.968750 | 0.923077 | 00:11 |\n", "
| 6 | 0.208633 | 0.224482 | 0.979167 | 0.923077 | 00:10 |\n", "
| 7 | 0.191411 | 0.206568 | 0.968750 | 0.923077 | 00:10 |\n", "
| 8 | 0.169821 | 0.233692 | 0.979167 | 0.923077 | 00:09 |\n", "
| 9 | 0.157189 | 0.247892 | 0.989583 | 0.884615 | 00:05 |\n", "
| 10 | 0.146495 | 0.253730 | 0.979167 | 0.923077 | 00:04 |\n", "
| 11 | 0.131543 | 0.267920 | 0.989583 | 0.884615 | 00:04 |\n", "
\n", "\n", "### Code snippet to generate graphs in this cell\n", "```python\n", "import pandas as pd\n", "from utils_cv.classification.parameter_sweeper import add_value_labels\n", "\n", "%matplotlib inline\n", "\n", "df = pd.DataFrame(\n", "    {\n", "        \"accuracy\": [0.9472, 0.9190, 0.8251],\n", "        \"training_duration\": [385.3, 280.5, 272.5],\n", "        \"inference_duration\": [34.2, 27.8, 27.6],\n", "        \"memory\": [99, 45, 4.9],\n", "        \"model\": [\"resnet50\", \"resnet18\", \"squeezenet1_1\"],\n", "    }\n", ").set_index(\"model\")\n", "\n", "# One subplot per metric, with one bar per model\n", "axes = df.plot.bar(rot=90, subplots=True, legend=False, figsize=(8, 10))\n", "ax1, ax2, ax3, ax4 = axes\n", "\n", "# Color bars by model: resnet50 red, resnet18 green, squeezenet1_1 blue\n", "for ax in axes:\n", "    for bar, color in zip(ax.patches, [\"r\", \"g\", \"b\"]):\n", "        bar.set_color(color)\n", "\n", "ax1.set_title(\"Accuracy (%)\")\n", "ax2.set_title(\"Training Duration (seconds)\")\n", "ax3.set_title(\"Inference Time (seconds)\")\n", "ax4.set_title(\"Memory Footprint (MB)\")\n", "\n", "ax1.set_ylabel(\"%\")\n", "ax2.set_ylabel(\"seconds\")\n", "ax3.set_ylabel(\"seconds\")\n", "ax4.set_ylabel(\"MB\")\n", "\n", "# Leave headroom above the tallest bar for the value labels\n", "ax1.set_ylim(top=df[\"accuracy\"].max() * 1.3)\n", "ax2.set_ylim(top=df[\"training_duration\"].max() * 1.3)\n", "ax3.set_ylim(top=df[\"inference_duration\"].max() * 1.3)\n", "ax4.set_ylim(top=df[\"memory\"].max() * 1.3)\n", "\n", "add_value_labels(ax1, percentage=True)\n", "add_value_labels(ax2)\n", "add_value_labels(ax3)\n", "add_value_labels(ax4)\n", "```\n", "\n", "
\n", "\n", " \n", "> The figure on the left shows the accuracy obtained with different learning rates on different datasets after 15 epochs. A learning rate of 1e-4 results in the best overall accuracy for the datasets we tested. Note, however, that there is significant variance between datasets, and a learning rate of 1e-3 may work better for some of them. \n", "In the figure on the right, the average results of 1e-4 at 15 epochs are only slightly better than those of 1e-3. However, at only 3 epochs, a learning rate of 1e-3 outperforms the smaller learning rates. This makes sense: when training is limited to 3 epochs, a model that takes larger weight-update steps gets closer to convergence and should therefore perform better. This result indicates that higher learning rates (such as 1e-3) may help minimize training time, while lower learning rates (such as 1e-5) may be better if training time is not constrained. \n", "\n", "
\n", "\n", "\n", "### Code snippet to generate graphs in this cell\n", "\n", "```python\n", "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "from matplotlib.ticker import PercentFormatter\n", "%matplotlib inline\n", "\n", "df_dataset_comp = pd.DataFrame(\n", "    {\n", "        \"fashionTexture\": [0.8749, 0.8481, 0.2491, 0.670318, 0.1643],\n", "        \"flickrLogos32Subset\": [0.9069, 0.9064, 0.2179, 0.7175, 0.1073],\n", "        \"food101Subset\": [0.9294, 0.9127, 0.6891, 0.9090, 0.555827],\n", "        \"fridgeObjects\": [0.9591, 0.9727, 0.272727, 0.6136, 0.181818],\n", "        \"lettuce\": [0.8992, 0.9104, 0.632, 0.8192, 0.5120],\n", "        \"recycle_v3\": [0.9527, 0.9581, 0.766, 0.8591, 0.2876],\n", "        \"learning_rate\": [0.000100, 0.001000, 0.010000, 0.000010, 0.000001],\n", "    }\n", ").set_index(\"learning_rate\")\n", "\n", "df_epoch_comp = pd.DataFrame(\n", "    {\n", "        \"3_epochs\": [0.823808, 0.846394, 0.393808, 0.455115, 0.229120],\n", "        \"15_epochs\": [0.920367, 0.918067, 0.471138, 0.764786, 0.301474],\n", "        \"learning_rate\": [0.000100, 0.001000, 0.010000, 0.000010, 0.000001],\n", "    }\n", ").set_index(\"learning_rate\")\n", "\n", "fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))\n", "\n", "# Left: accuracy per dataset for each learning rate, after 15 epochs\n", "df_dataset_comp.sort_index().plot(kind=\"bar\", rot=0, ax=ax1)\n", "ax1.set_ylim(0, 1)\n", "# Accuracies are fractions in [0, 1], so format the tick labels as percentages\n", "ax1.yaxis.set_major_formatter(PercentFormatter(xmax=1))\n", "ax1.set_ylabel(\"Accuracy (%)\")\n", "ax1.set_title(\"Accuracy of Learning Rates by Datasets @ 15 Epochs\")\n", "ax1.legend(loc=2)\n", "\n", "# Right: accuracy averaged over the datasets, after 3 and 15 epochs\n", "df_epoch_comp.sort_index().plot(kind=\"bar\", rot=0, ax=ax2)\n", "ax2.set_ylim(0, 1)\n", "ax2.yaxis.set_major_formatter(PercentFormatter(xmax=1))\n", "ax2.set_title(\"Accuracy of Learning Rates by Epochs\")\n", "ax2.legend(loc=2)\n", "```\n", "\n", "
\n", "\n", "\n", "### Code snippet to generate graphs in this cell\n", "\n", "```python\n", "import pandas as pd\n", "from utils_cv.classification.parameter_sweeper import add_value_labels\n", "%matplotlib inline\n", "\n", "df = pd.DataFrame(\n", "    {\n", "        \"accuracy\": [0.9472, 0.9394, 0.9190, 0.9164, 0.8366, 0.8251],\n", "        \"training_duration\": [385.3, 218.8, 280.5, 184.9, 272.5, 182.3],\n", "        \"inference_duration\": [34.2, 23.2, 27.8, 17.8, 27.6, 17.3],\n", "        \"model\": [\n", "            \"resnet50 X 499\",\n", "            \"resnet50 X 299\",\n", "            \"resnet18 X 499\",\n", "            \"resnet18 X 299\",\n", "            \"squeezenet1_1 X 499\",\n", "            \"squeezenet1_1 X 299\",\n", "        ],\n", "    }\n", ").set_index(\"model\")\n", "\n", "axes = df.plot.bar(rot=90, subplots=True, legend=False, figsize=(12, 12))\n", "ax1, ax2, ax3 = axes\n", "\n", "# Color bars by architecture (two image sizes each): resnet50 red, resnet18 green, squeezenet1_1 blue\n", "colors = [\"r\", \"r\", \"g\", \"g\", \"b\", \"b\"]\n", "for ax in axes:\n", "    for bar, color in zip(ax.patches, colors):\n", "        bar.set_color(color)\n", "\n", "ax1.set_title(\"Accuracy (%)\")\n", "ax2.set_title(\"Training Duration (seconds)\")\n", "ax3.set_title(\"Inference Time (seconds)\")\n", "\n", "ax1.set_ylabel(\"%\")\n", "ax2.set_ylabel(\"seconds\")\n", "ax3.set_ylabel(\"seconds\")\n", "\n", "ax1.set_ylim(top=df[\"accuracy\"].max() * 1.2)\n", "ax2.set_ylim(top=df[\"training_duration\"].max() * 1.2)\n", "ax3.set_ylim(top=df[\"inference_duration\"].max() * 1.2)\n", "\n", "add_value_labels(ax1, percentage=True)\n", "add_value_labels(ax2)\n", "add_value_labels(ax3)\n", "```\n", "\n", "
\n", "