{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Schedulers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In `timm`, essentially we have a total of four different schedulers: \n", "\n", "1. [SGDR: Stochastic Gradient Descent with Warm Restarts](https://arxiv.org/abs/1608.03983)\n", "2. [Stochastic Gradient Descent with Hyperbolic-Tangent Decay on Classification](https://arxiv.org/abs/1806.01593)\n", "3. [StepLR](https://github.com/rwightman/pytorch-image-models/blob/master/timm/scheduler/step_lr.py#L13)\n", "4. [PlateauLRScheduler](https://github.com/rwightman/pytorch-image-models/blob/master/timm/scheduler/plateau_lr.py#L12)\n", "\n", "In this tutorial we are going to look at each one of them in detail and also look at how we can train our models using these schedulers using the `timm` training script or use them as standalone schedulers for custom PyTorch training scripts." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Available Schedulers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this section we will look at the various available schedulers in `timm`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### SGDR" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, let's look at the `SGDR` scheduler also referred to as the `cosine` scheduler in `timm`. \n", "\n", "The `SGDR` scheduler, or the `Stochastic Gradient Descent with Warm Restarts` scheduler schedules the learning rate using a cosine schedule but with a tweak. It resets the learning rate to the initial value after some number of epochs. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\"SGDR\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> NOTE: Unlike the builtin PyTorch schedulers, this is intended to be consistently called at the END of each epoch, before incrementing the epoch count, to calculate next epoch's value & at the END of each optimizer update, after incrementing the update count, to calculate next update's value." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### StepLR" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `StepLR` is a basic step LR schedule with warmup, noise. \n", "\n", "> NOTE: PyTorch's implementation does not support warmup or noise. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The schedule for `StepLR` annealing looks something like: " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\"StepLR\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After a certain number `decay_epochs`, the learning rate is updated to be `lr * decay_rate`. In the above `StepLR` schedule, `decay_epochs` is set to 30 and `decay_rate` is set to 0.5 with an initial `lr` of 1e-4." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Stochastic Gradient Descent with Hyperbolic-Tangent Decay on Classification " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is also referred to as the `tanh` annealing. `tanh` stands for hyperbolic tangent decay. The annealing using this scheduler looks something like: " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\"Tanh\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is similar to the `SGDR` in the sense that the learning rate is set to the initial `lr` after a certain number of epochs but the annealing is done using the `tanh` function. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### PlateauLRScheduler" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This scheduler is very similar to PyTorch's [ReduceLROnPlateau](https://pytorch.org/docs/stable/_modules/torch/optim/lr_scheduler.html#ReduceLROnPlateau) scheduler. The basic idea is to track an eval metric and based on the evaluation metric's value, the `lr` is reduced using `StepLR` if the eval metric is stagnant for a certain number of epochs. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using the various schedulers in the `timm` training script" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is very easy to train our models using the `timm`'s training script. Essentially, we simply pass in a parameter using the `--sched` flag to specify which scheduler to use and the various hyperparameters alongside. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- For `SGDR`, we pass in `--sched cosine`. \n", "- For `PlatueLRScheduler` we pass in `--sched plateau`. \n", "- For `TanhLRScheduler`, we pass in `--sched tanh`.\n", "- For `StepLR`, we pass in `--sched step`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Thus the call to the training script looks something like: \n", "\n", "```python \n", "python train.py --sched cosine --epochs 200 --min-lr 1e-5 --lr-cycle-mul 2 --lr-cycle-limit 2 \n", "```" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 4 }