{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# How to create groups of layers, each with a different Learning Rate? | fastai v2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Objective of the notebook" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The objective of this notebook is to explain:\n", "\n", "- how to create parameters groups for a model with fastai v2\n", "- how to train each parameters group with a different Learning Rate, and the impact of each method on the model performance\n", "- how to check the Learning Rates actually used by the Optimizer during training\n", "\n", "\n", "Author: [Pierre Guillou](https://www.linkedin.com/in/pierreguillou/) on May 25, 2020." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Overview of the results" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Parameters groups" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you want to change the fastai v2 default configuration (a single parameters group for a plain ```Learner```, as verified later in this notebook), you need to define a ```splitter``` function and pass it to the ```Learner```." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Example**\n", "\n", "```\n", "def splitter(m):\n", "    # one parameters group per top-level child module of the model\n", "    groups = [group for group in m.children()]\n", "    groups = L(groups)\n", "    return groups.map(params)\n", "\n", "learn = Learner(dls, my_model, splitter=splitter, metrics=error_rate)\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "([source](https://dev.fast.ai/learner#Learner)) ```splitter``` is a function that takes ```self.model``` and returns a list of parameter groups (or just one parameter group if there are no different parameter groups). 
The default is [trainable_params](https://dev.fast.ai/torch_core#trainable_params), which returns all trainable parameters of the model.\n", "```\n", "def trainable_params(m):\n", "    \"Return all trainable parameters of `m`\"\n", "    return [p for p in m.parameters() if p.requires_grad]\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### One Learning Rate (max) per parameters group" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are 4 possibilities.\n", "\n", "For example, with 3 parameters groups, you can do the following (for an unfrozen ```Learner``` trained with ```learn.fit_one_cycle()```):\n", "\n", "```\n", "1. if lr_max = 1e-3 -> [0.001,0.001,0.001]\n", "2. if lr_max = slice(1e-3) -> [0.0001,0.0001,0.001]\n", "3. if lr_max = slice(1e-5,1e-3) -> array([1.e-05, 1.e-04, 1.e-03]) # LRs evenly geometrically spaced\n", "4. if lr_max = [1e-5, 1e-4, 1e-3] -> array([1.e-05, 1.e-04, 1.e-03]) # LRs taken as is from the list (any spacing)\n", "```\n", "\n", "1. All parameters groups use **the same Learning Rate** (and the same optimizer and schedule, e.g. Adam with the 1cycle policy).\n", "2. The (max) Learning Rate of the last parameters group is set to lr_max, and the one of all previous parameters groups to **lr_max/10**.\n", "3. The very first layers are trained at a Learning Rate of 1e-5, the very last at 1e-3, and the Learning Rates of the other parameters groups are **evenly geometrically spaced** between these two values.\n", "4. Each parameters group is trained at the Learning Rate given at the same position in the list (here 1e-5 for the very first layers and 1e-3 for the very last): **you can pass any Learning Rate values as a list, linearly spaced or not**, with one value per parameters group." 
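, "\n", "\n", "As a minimal sketch, the four forms of lr_max listed above can be expanded into one Learning Rate per parameters group as follows. The helper name `expand_lr` is hypothetical and is not part of fastai: it only reproduces, in plain Python, the behavior described in this overview, under the assumption that a slice with two values means geometric spacing.

```python
def expand_lr(lr_max, n_groups):
    'Return one (max) Learning Rate per parameters group (illustrative helper, not a fastai function).'
    if isinstance(lr_max, slice):
        if lr_max.start is not None:
            # slice(start, stop): geometric spacing from start to stop
            if n_groups == 1:
                return [float(lr_max.stop)]
            ratio = (lr_max.stop / lr_max.start) ** (1 / (n_groups - 1))
            return [lr_max.start * ratio ** i for i in range(n_groups)]
        # slice(stop): stop for the last group, stop/10 for all the others
        return [lr_max.stop / 10] * (n_groups - 1) + [float(lr_max.stop)]
    if isinstance(lr_max, (list, tuple)):
        # explicit list: used as is, one value per group
        assert len(lr_max) == n_groups, 'pass exactly one Learning Rate per group'
        return [float(lr) for lr in lr_max]
    # single float: the same value for every group
    return [float(lr_max)] * n_groups

# the 4 possibilities with 3 parameters groups
for lr in (1e-3, slice(1e-3), slice(1e-5, 1e-3), [1e-5, 1e-4, 1e-3]):
    print(lr, '->', expand_lr(lr, 3))

# with 10 groups, geometric and linear spacing between 1e-5 and 1e-3 differ
geometric = expand_lr(slice(1e-5, 1e-3), 10)
linear = [1e-5 + i * (1e-3 - 1e-5) / 9 for i in range(10)]
print(geometric)
print(linear)
```

With 3 groups, the slice and the list give the same values. With more groups, the intermediate values of a geometric spacing and of a linear spacing are different, so the two forms are interchangeable only if the list is itself geometrically spaced.
"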
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### WARNING" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Points 3 and 4 are not equivalent when there are more than 3 parameters groups!**\n", "\n", "- point 3: the Learning Rates are calculated geometrically.\n", "- point 4: you can pass an array with the Learning Rate values you want.\n", "\n", "Check the following graph to understand what passing a slice with 2 values as the list of Learning Rates means (point 3)." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from IPython.display import Image\n", "Image(\"images/lrs.png\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Initialization" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "#hide\n", "from utils import *" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "cuda device: 0\n", "cuda device name: Tesla V100-PCIE-32GB\n" ] } ], "source": [ "print(f'cuda device: {torch.cuda.current_device()}')\n", "print(f'cuda device name: {torch.cuda.get_device_name(0)}')" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "import warnings\n", "warnings.filterwarnings('ignore')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## fastai documentation\n", "\n", "- class [Learner](https://github.com/fastai/fastai2/blob/master/fastai2/learner.py#L81)\n", " - [defaults.lr](https://github.com/fastai/fastai2/blob/master/fastai2/learner.py#L22) = 1e-3\n", " - [create_opt()](https://github.com/fastai/fastai2/blob/master/fastai2/learner.py#L140)\n", " - [fit_one_cycle](https://github.com/fastai/fastai2/blob/master/fastai2/callback/schedule.py#L103)\n", " - 
[combined_cos()](https://dev.fast.ai/callback.schedule#combined_cos)\n", "- class [Optimizer](https://github.com/fastai/fastai2/blob/master/fastai2/optimizer.py#L64)\n", " - [hypers](https://github.com/fastai/fastai2/blob/master/fastai2/optimizer.py#L72)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### combined_cos()" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor(0.9899) 4.574851048952042e-06\n", "tensor(1.) 9.999999999940612e-08\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "lr_max=1e-2\n", "div=25.\n", "div_final=1e5\n", "pct_start=0.25\n", "\n", "p = torch.linspace(0.,1,100)\n", "f = combined_cos(pct_start, lr_max/div, lr_max, lr_max/div_final)\n", "plt.plot(p, [f(o) for o in p]);\n", "\n", "# last values of cosine annealing for the 1cycle policy for lr_max = 1e-2\n", "print(p[-2],f(p[-2]))\n", "print(p[-1],f(p[-1]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "from fastai2.vision.all import *\n", "path = untar_data(URLs.PETS)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "#hide\n", "Path.BASE_PATH = path" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dataloaders" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "pets = DataBlock(blocks = (ImageBlock, CategoryBlock),\n", " get_items=get_image_files, \n", " splitter=RandomSplitter(seed=42),\n", " get_y=using_attr(RegexLabeller(r'(.+)_\\d+.jpg$'), 'name'),\n", " item_tfms=Resize(460),\n", " batch_tfms=aug_transforms(size=224, min_scale=0.75))\n", "dls = pets.dataloaders(path/\"images\")" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "dls.show_batch(nrows=1, ncols=3)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(#37) ['Abyssinian','Bengal','Birman','Bombay','British_Shorthair','Egyptian_Mau','Maine_Coon','Persian','Ragdoll','Russian_Blue'...]" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dls.vocab" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Change pretrained resnet18 with an output layer of 1000 classes to 37." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Linear(in_features=512, out_features=1000, bias=True)" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# model resnet18\n", "m = resnet18(pretrained=True)\n", "\n", "# last layer of m\n", "# list(m.children())[-1]\n", "m.fc" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "37" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# number of classes of my dls\n", "num_classes = len(dls.vocab)\n", "num_classes" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Linear(in_features=512, out_features=37, bias=True)" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# HEAD of the model: remplace last linear layer of 1000 classes by one with 37 classes\n", "# source: https://discuss.pytorch.org/t/resnet-last-layer-modification/33530/2\n", "my_model = m\n", "my_model.fc = nn.Linear(512, num_classes)\n", "\n", "# last layer of m\n", "list(my_model.children())[-1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Learner" ] }, { 
"cell_type": "markdown", "metadata": {}, "source": [ "### Splitter" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "([source](https://dev.fast.ai/learner#Learner)) ```splitter``` is a function that takes ```self.model``` and returns a list of parameter groups (or just one parameter group if there are no different parameter groups). The default is [trainable_params](https://dev.fast.ai/torch_core#trainable_params), which returns all trainable parameters of the model.\n", "```\n", "def trainable_params(m):\n", " \"Return all trainable parameters of `m`\"\n", " return [p for p in m.parameters() if p.requires_grad]\n", "```" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "10" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# number of layers groups of my_model\n", "len(list(my_model.children()))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Therefore, if we don't pass a splitter function to our ```Learner```, only one Learning Rate will be applied to all layers of my_model. \n", "\n", "As there are 10 layers groups in my_model, let's create a ```splitter``` function that distributes the parameters of my_model to 10 parameters groups." 
] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "def splitter(m):\n", "    # one parameters group per top-level child module of the model\n", "    groups = [group for group in m.children()]\n", "    groups = L(groups)\n", "    return groups.map(params)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "10" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# number of parameters groups of my_model\n", "len(splitter(my_model))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Learner without splitter" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Without a splitter, there is 1 parameters group by default and therefore 1 Learning Rate (1e-3 by default)." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "learn = Learner(dls, my_model, metrics=error_rate)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "number of parameters groups: 1\n", "0 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 0.001, 'mom': 0.9, 'eps': 1e-05}\n" ] } ], "source": [ "# Check the number of parameters groups...\n", "learn.create_opt()\n", "print(f'number of parameters groups: {len(learn.opt.param_groups)}')\n", "\n", "# ... and the list of Learning Rates (before fit_one_cycle() updates them)\n", "for i,h in enumerate(learn.opt.hypers):\n", "    print(i,h)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Learner with splitter" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With our ```splitter```, there are 10 parameters groups and, automatically, the same Learning Rate (1e-3 by default) for each group." 
] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "learn = Learner(dls, my_model, splitter=splitter, metrics=error_rate)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "number of parameters groups: 10\n", "0 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 0.001, 'mom': 0.9, 'eps': 1e-05}\n", "1 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 0.001, 'mom': 0.9, 'eps': 1e-05}\n", "2 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 0.001, 'mom': 0.9, 'eps': 1e-05}\n", "3 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 0.001, 'mom': 0.9, 'eps': 1e-05}\n", "4 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 0.001, 'mom': 0.9, 'eps': 1e-05}\n", "5 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 0.001, 'mom': 0.9, 'eps': 1e-05}\n", "6 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 0.001, 'mom': 0.9, 'eps': 1e-05}\n", "7 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 0.001, 'mom': 0.9, 'eps': 1e-05}\n", "8 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 0.001, 'mom': 0.9, 'eps': 1e-05}\n", "9 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 0.001, 'mom': 0.9, 'eps': 1e-05}\n" ] } ], "source": [ "# Check the number of parameters groups...\n", "learn.create_opt()\n", "print(f'number of parameters groups: {len(learn.opt.param_groups)}')\n", "\n", "# ... 
and the list of Learning Rates (before fit_one_cycle() updates them)\n", "for i,h in enumerate(learn.opt.hypers):\n", "    print(i,h)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Training with splitter" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "learn = Learner(dls, my_model, splitter=splitter, metrics=error_rate)\n", "learn.freeze()" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "ResNet (Input shape: ['64 x 3 x 224 x 224'])\n", "================================================================\n", "Layer (type) Output Shape Param # Trainable \n", "================================================================\n", "Conv2d 64 x 64 x 112 x 112 9,408 False \n", "________________________________________________________________\n", "BatchNorm2d 64 x 64 x 112 x 112 128 True \n", "________________________________________________________________\n", "ReLU 64 x 64 x 112 x 112 0 False \n", "________________________________________________________________\n", "MaxPool2d 64 x 64 x 56 x 56 0 False \n", "________________________________________________________________\n", "Conv2d 64 x 64 x 56 x 56 36,864 False \n", "________________________________________________________________\n", "BatchNorm2d 64 x 64 x 56 x 56 128 True \n", "________________________________________________________________\n", "ReLU 64 x 64 x 56 x 56 0 False \n", "________________________________________________________________\n", "Conv2d 64 x 64 x 56 x 56 36,864 False \n", "________________________________________________________________\n", "BatchNorm2d 64 x 64 x 56 x 56 128 True \n", "________________________________________________________________\n", "Conv2d 64 x 64 x 56 x 56 36,864 False \n", "________________________________________________________________\n", "BatchNorm2d 64 x 64 x 56 x 56 128 True \n", 
"________________________________________________________________\n", "ReLU 64 x 64 x 56 x 56 0 False \n", "________________________________________________________________\n", "Conv2d 64 x 64 x 56 x 56 36,864 False \n", "________________________________________________________________\n", "BatchNorm2d 64 x 64 x 56 x 56 128 True \n", "________________________________________________________________\n", "Conv2d 64 x 128 x 28 x 28 73,728 False \n", "________________________________________________________________\n", "BatchNorm2d 64 x 128 x 28 x 28 256 True \n", "________________________________________________________________\n", "ReLU 64 x 128 x 28 x 28 0 False \n", "________________________________________________________________\n", "Conv2d 64 x 128 x 28 x 28 147,456 False \n", "________________________________________________________________\n", "BatchNorm2d 64 x 128 x 28 x 28 256 True \n", "________________________________________________________________\n", "Conv2d 64 x 128 x 28 x 28 8,192 False \n", "________________________________________________________________\n", "BatchNorm2d 64 x 128 x 28 x 28 256 True \n", "________________________________________________________________\n", "Conv2d 64 x 128 x 28 x 28 147,456 False \n", "________________________________________________________________\n", "BatchNorm2d 64 x 128 x 28 x 28 256 True \n", "________________________________________________________________\n", "ReLU 64 x 128 x 28 x 28 0 False \n", "________________________________________________________________\n", "Conv2d 64 x 128 x 28 x 28 147,456 False \n", "________________________________________________________________\n", "BatchNorm2d 64 x 128 x 28 x 28 256 True \n", "________________________________________________________________\n", "Conv2d 64 x 256 x 14 x 14 294,912 False \n", "________________________________________________________________\n", "BatchNorm2d 64 x 256 x 14 x 14 512 True \n", 
"________________________________________________________________\n", "ReLU 64 x 256 x 14 x 14 0 False \n", "________________________________________________________________\n", "Conv2d 64 x 256 x 14 x 14 589,824 False \n", "________________________________________________________________\n", "BatchNorm2d 64 x 256 x 14 x 14 512 True \n", "________________________________________________________________\n", "Conv2d 64 x 256 x 14 x 14 32,768 False \n", "________________________________________________________________\n", "BatchNorm2d 64 x 256 x 14 x 14 512 True \n", "________________________________________________________________\n", "Conv2d 64 x 256 x 14 x 14 589,824 False \n", "________________________________________________________________\n", "BatchNorm2d 64 x 256 x 14 x 14 512 True \n", "________________________________________________________________\n", "ReLU 64 x 256 x 14 x 14 0 False \n", "________________________________________________________________\n", "Conv2d 64 x 256 x 14 x 14 589,824 False \n", "________________________________________________________________\n", "BatchNorm2d 64 x 256 x 14 x 14 512 True \n", "________________________________________________________________\n", "Conv2d 64 x 512 x 7 x 7 1,179,648 False \n", "________________________________________________________________\n", "BatchNorm2d 64 x 512 x 7 x 7 1,024 True \n", "________________________________________________________________\n", "ReLU 64 x 512 x 7 x 7 0 False \n", "________________________________________________________________\n", "Conv2d 64 x 512 x 7 x 7 2,359,296 False \n", "________________________________________________________________\n", "BatchNorm2d 64 x 512 x 7 x 7 1,024 True \n", "________________________________________________________________\n", "Conv2d 64 x 512 x 7 x 7 131,072 False \n", "________________________________________________________________\n", "BatchNorm2d 64 x 512 x 7 x 7 1,024 True \n", 
"________________________________________________________________\n", "Conv2d 64 x 512 x 7 x 7 2,359,296 False \n", "________________________________________________________________\n", "BatchNorm2d 64 x 512 x 7 x 7 1,024 True \n", "________________________________________________________________\n", "ReLU 64 x 512 x 7 x 7 0 False \n", "________________________________________________________________\n", "Conv2d 64 x 512 x 7 x 7 2,359,296 False \n", "________________________________________________________________\n", "BatchNorm2d 64 x 512 x 7 x 7 1,024 True \n", "________________________________________________________________\n", "AdaptiveAvgPool2d 64 x 512 x 1 x 1 0 False \n", "________________________________________________________________\n", "Linear 64 x 37 18,981 True \n", "________________________________________________________________\n", "\n", "Total params: 11,195,493\n", "Total trainable params: 28,581\n", "Total non-trainable params: 11,166,912\n", "\n", "Optimizer used: \n", "Loss function: FlattenedLoss of CrossEntropyLoss()\n", "\n", "Model frozen up to parameter group number 9\n", "\n", "Callbacks:\n", " - TrainEvalCallback\n", " - Recorder\n", " - ProgressCallback" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "learn.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**List of Learning Rates before its atualization by the Optimizer of the function fit_one_cycle()**" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "number of parameters groups: 10\n", "0 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 0.001, 'mom': 0.9, 'eps': 1e-05}\n", "1 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 0.001, 'mom': 0.9, 'eps': 1e-05}\n", "2 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 0.001, 'mom': 0.9, 'eps': 1e-05}\n", "3 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 0.001, 'mom': 0.9, 'eps': 1e-05}\n", "4 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 0.001, 
'mom': 0.9, 'eps': 1e-05}\n", "5 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 0.001, 'mom': 0.9, 'eps': 1e-05}\n", "6 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 0.001, 'mom': 0.9, 'eps': 1e-05}\n", "7 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 0.001, 'mom': 0.9, 'eps': 1e-05}\n", "8 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 0.001, 'mom': 0.9, 'eps': 1e-05}\n", "9 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 0.001, 'mom': 0.9, 'eps': 1e-05}\n" ] } ], "source": [ "# Check the number of parameters groups...\n", "learn.create_opt()\n", "print(f'number of parameters groups: {len(learn.opt.param_groups)}')\n", "\n", "# ... and the list of Learning Rates (before fit_one_cycle() updates them)\n", "for i,h in enumerate(learn.opt.hypers):\n", "    print(i,h)" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "SuggestedLRs(lr_min=0.00831763744354248, lr_steep=0.010964781977236271)" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "learn.lr_find()" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
| epoch | train_loss | valid_loss | error_rate | time |
|---|---|---|---|---|
| 0 | 0.704656 | 0.333663 | 0.115020 | 00:13 |
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "learn.fit_one_cycle(1, lr_max=1e-2)" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "learn.save('my_resnet18_finetuned')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**List of Learning Rates: last values of cosine annealing for the 1cycle policy with lr_max = 1e-2**" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "number of parameters groups: 10\n", "0 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 5.281577479700553e-06, 'mom': 0.94994818370704, 'eps': 1e-05}\n", "1 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 5.281577479700553e-06, 'mom': 0.94994818370704, 'eps': 1e-05}\n", "2 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 5.281577479700553e-06, 'mom': 0.94994818370704, 'eps': 1e-05}\n", "3 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 5.281577479700553e-06, 'mom': 0.94994818370704, 'eps': 1e-05}\n", "4 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 5.281577479700553e-06, 'mom': 0.94994818370704, 'eps': 1e-05}\n", "5 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 5.281577479700553e-06, 'mom': 0.94994818370704, 'eps': 1e-05}\n", "6 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 5.281577479700553e-06, 'mom': 0.94994818370704, 'eps': 1e-05}\n", "7 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 5.281577479700553e-06, 'mom': 0.94994818370704, 'eps': 1e-05}\n", "8 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 5.281577479700553e-06, 'mom': 0.94994818370704, 'eps': 1e-05}\n", "9 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 5.281577479700553e-06, 'mom': 0.94994818370704, 'eps': 1e-05}\n" ] } ], "source": [ "# Check the number of parameters groups...\n", "print(f'number of parameters groups: {len(learn.opt.param_groups)}')\n", "\n", "# ... 
and the list of Learning Rates (last values of cosine annealing for the 1cycle policy with lr_max = 1e-2)\n", "for i,h in enumerate(learn.opt.hypers):\n", "    print(i,h)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Conclusion**\n", "\n", "We can verify that the Learning Rates of all parameters groups are identical. However, as the ```Learner``` was frozen, only the weights of the last parameters group (and of the BatchNorm layers, marked as trainable in the summary above) have been updated." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Test 1 | lr_max = 1e-3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Hypothesis**: all parameters groups have the same (max) Learning Rate (lr_max = 1e-3)." ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "learn.load('my_resnet18_finetuned')\n", "learn.unfreeze()" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
| epoch | train_loss | valid_loss | error_rate | time |
|---|---|---|---|---|
| 0 | 0.861377 | 3.378591 | 0.577808 | 00:16 |
| 1 | 0.615114 | 0.678442 | 0.223275 | 00:16 |
| 2 | 0.307324 | 0.321883 | 0.110284 | 00:16 |
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "learn.fit_one_cycle(3, lr_max=1e-3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**List of Learning Rates: last values of cosine annealing for the 1cycle policy with lr_max = 1e-3**" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "number of parameters groups: 10\n", "0 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 6.758145103822125e-08, 'mom': 0.9499942417973141, 'eps': 1e-05}\n", "1 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 6.758145103822125e-08, 'mom': 0.9499942417973141, 'eps': 1e-05}\n", "2 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 6.758145103822125e-08, 'mom': 0.9499942417973141, 'eps': 1e-05}\n", "3 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 6.758145103822125e-08, 'mom': 0.9499942417973141, 'eps': 1e-05}\n", "4 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 6.758145103822125e-08, 'mom': 0.9499942417973141, 'eps': 1e-05}\n", "5 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 6.758145103822125e-08, 'mom': 0.9499942417973141, 'eps': 1e-05}\n", "6 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 6.758145103822125e-08, 'mom': 0.9499942417973141, 'eps': 1e-05}\n", "7 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 6.758145103822125e-08, 'mom': 0.9499942417973141, 'eps': 1e-05}\n", "8 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 6.758145103822125e-08, 'mom': 0.9499942417973141, 'eps': 1e-05}\n", "9 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 6.758145103822125e-08, 'mom': 0.9499942417973141, 'eps': 1e-05}\n" ] } ], "source": [ "# Check the number of parameters groups...\n", "print(f'number of parameters groups: {len(learn.opt.param_groups)}')\n", "\n", "# ... 
and the list of Learning Rates (last values of cosine annealing for the 1cycle policy with lr_max = 1e-3)\n", "for i,h in enumerate(learn.opt.hypers):\n", "    print(i,h)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Conclusion**\n", "\n", "We can verify that the Learning Rates of all parameters groups are identical." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Test 2 | lr_max = slice(1e-3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Hypothesis**\n", "\n", "- The last parameters group has a (max) Learning Rate of lr_max = 1e-3.\n", "- All the previous ones have a (max) Learning Rate of **lr_max/10** = 1e-4." ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [], "source": [ "learn.load('my_resnet18_finetuned')\n", "learn.unfreeze()" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
| epoch | train_loss | valid_loss | error_rate | time |
|---|---|---|---|---|
| 0 | 0.346102 | 0.320928 | 0.106901 | 00:16 |
| 1 | 0.242630 | 0.249166 | 0.086604 | 00:16 |
| 2 | 0.151128 | 0.233383 | 0.087957 | 00:16 |
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "learn.fit_one_cycle(3, lr_max=slice(1e-3))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**List of Learning Rates: last values of cosine annealing for the 1cycle policy with lr_max = slice(1e-3)**" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "number of parameters groups: 10\n", "0 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 6.758145103824835e-09, 'mom': 0.9499942417973141, 'eps': 1e-05}\n", "1 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 6.758145103824835e-09, 'mom': 0.9499942417973141, 'eps': 1e-05}\n", "2 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 6.758145103824835e-09, 'mom': 0.9499942417973141, 'eps': 1e-05}\n", "3 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 6.758145103824835e-09, 'mom': 0.9499942417973141, 'eps': 1e-05}\n", "4 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 6.758145103824835e-09, 'mom': 0.9499942417973141, 'eps': 1e-05}\n", "5 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 6.758145103824835e-09, 'mom': 0.9499942417973141, 'eps': 1e-05}\n", "6 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 6.758145103824835e-09, 'mom': 0.9499942417973141, 'eps': 1e-05}\n", "7 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 6.758145103824835e-09, 'mom': 0.9499942417973141, 'eps': 1e-05}\n", "8 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 6.758145103824835e-09, 'mom': 0.9499942417973141, 'eps': 1e-05}\n", "9 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 6.758145103822125e-08, 'mom': 0.9499942417973141, 'eps': 1e-05}\n" ] } ], "source": [ "# Check the number of parameters groups...\n", "print(f'number of parameters groups: {len(learn.opt.param_groups)}')\n", "\n", "# ... 
and the list of Learning Rates (last values of cosine annealing for the 1cycle policy with lr_max = slice(1e-3))\n", "for i,h in enumerate(learn.opt.hypers):\n", "    print(i,h)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Conclusion**\n", "\n", "- We can verify that the Learning Rates of the first 9 parameters groups are 10 times smaller than the one of the last parameters group.\n", "- With a smaller Learning Rate for all parameters groups but the last one, the final error rate is lower than in Test 1, where all parameters groups had the same Learning Rate (0.088 vs. 0.110)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Test 3 | lr_max = slice(1e-5,1e-3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Hypothesis**\n", "\n", "- The first parameters group is trained with lr_max = 1e-5 and the last one with lr_max = 1e-3.\n", "- Between these 2 parameters groups, the (max) Learning Rates of the others are evenly geometrically spaced between 1e-5 and 1e-3." ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "learn.load('my_resnet18_finetuned')\n", "learn.unfreeze()" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
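<table><thead><tr><th>epoch</th><th>train_loss</th><th>valid_loss</th><th>error_rate</th><th>time</th></tr></thead><tbody>
<tr><td>0</td><td>0.435387</td><td>0.542696</td><td>0.178620</td><td>00:17</td></tr>
<tr><td>1</td><td>0.306988</td><td>0.319851</td><td>0.106225</td><td>00:17</td></tr>
<tr><td>2</td><td>0.151053</td><td>0.234533</td><td>0.075778</td><td>00:16</td></tr></tbody></table>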
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "learn.fit_one_cycle(3, lr_max=slice(1e-3/100,1e-3))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**List of Learning Rates: last values of cosine annealing for the 1cycle policy with lr_max = slice(1e-5,1e-3)**" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "number of parameters groups: 10\n", "0 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 6.758145103818059e-10, 'mom': 0.9499942417973141, 'eps': 1e-05}\n", "1 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 1.1273265478161222e-09, 'mom': 0.9499942417973141, 'eps': 1e-05}\n", "2 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 1.880494020012536e-09, 'mom': 0.9499942417973141, 'eps': 1e-05}\n", "3 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 3.1368530849825837e-09, 'mom': 0.9499942417973141, 'eps': 1e-05}\n", "4 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 5.232586316170941e-09, 'mom': 0.9499942417973141, 'eps': 1e-05}\n", "5 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 8.7284800449677e-09, 'mom': 0.9499942417973141, 'eps': 1e-05}\n", "6 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 1.4559982251924255e-08, 'mom': 0.9499942417973141, 'eps': 1e-05}\n", "7 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 2.428751421610113e-08, 'mom': 0.9499942417973141, 'eps': 1e-05}\n", "8 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 4.051401551101539e-08, 'mom': 0.9499942417973141, 'eps': 1e-05}\n", "9 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 6.758145103822125e-08, 'mom': 0.9499942417973141, 'eps': 1e-05}\n" ] } ], "source": [ "# Check the number of parameters groups...\n", "print(f'number of parameters groups: {len(learn.opt.param_groups)}')\n", "\n", "# ... 
and the list of Learning Rates (last values of cosine annealing for the 1cycle policy with lr_max = slice(1e-5,1e-3))\n", "for i,h in enumerate(learn.opt.hypers):\n", "    print(i,h)" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Collect the final Learning Rate of each parameters group\n", "lr_list = list()\n", "for h in learn.opt.hypers:\n", "    lr_list.append(h['lr'])\n", "\n", "# Linear interpolation between the first and last values, for comparison\n", "lr_first = lr_list[0]\n", "lr_last = lr_list[-1]\n", "inter = (lr_last - lr_first) / (len(learn.opt.param_groups) - 1)\n", "lr_list_calculated = [lr_first+i*inter for i in range(len(learn.opt.param_groups))]\n", "\n", "# Plot both curves: a gap between them means the spacing is not linear\n", "fig, ax = plt.subplots()\n", "p = np.linspace(0,9,10)\n", "ax.plot(p, lr_list, label='last lr values')\n", "ax.plot(p, lr_list_calculated, label='calculated last lr values')\n", "leg = ax.legend();" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Conclusion**\n", "\n", "- We can verify that the Learning Rates of the parameters groups are **evenly geometrically spaced** (which is not the same as linearly spaced) from the smallest value (first group) to the largest one (last group).\n", "- With a Learning Rate that increases from the first parameters group to the last (each group has a different one), the error rate is lower than in the previous tests 1 and 2." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Test 4 | array of 10 learning rates | lr_max = [lr1,lr2,lr3,lr4,lr5,lr6,lr7,lr8,lr9,lr10]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Each parameters group gets its own Learning Rate, from the smallest to the highest value, passed as a list."
] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [], "source": [ "learn.load('my_resnet18_finetuned')\n", "learn.unfreeze()" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(10,\n", " [1e-05,\n", " 0.00012,\n", " 0.00023,\n", " 0.00034,\n", " 0.00045000000000000004,\n", " 0.0005600000000000001,\n", " 0.00067,\n", " 0.0007800000000000001,\n", " 0.0008900000000000001,\n", " 0.001])" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "lr_lastlayer = 1e-3\n", "lr_firstlayer = lr_lastlayer / 100\n", "inter = (lr_lastlayer - lr_firstlayer) / (len(learn.opt.param_groups) - 1) # 9 intervals\n", "lr_max = [lr_firstlayer + i*inter for i in range(len(learn.opt.param_groups) - 1)]\n", "lr_max.append(lr_lastlayer)\n", "len(lr_max), lr_max" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
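<table><thead><tr><th>epoch</th><th>train_loss</th><th>valid_loss</th><th>error_rate</th><th>time</th></tr></thead><tbody>
<tr><td>0</td><td>0.688318</td><td>1.270768</td><td>0.347091</td><td>00:16</td></tr>
<tr><td>1</td><td>0.501638</td><td>0.422770</td><td>0.140731</td><td>00:16</td></tr>
<tr><td>2</td><td>0.239655</td><td>0.272979</td><td>0.088633</td><td>00:16</td></tr></tbody></table>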
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "learn.fit_one_cycle(3, lr_max=lr_max)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**List of Learning Rates: last values of cosine annealing for the 1cycle policy with lr_max = [lr1,lr2,lr3,lr4,lr5,lr6,lr7,lr8,lr9,lr10]**" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "number of parameters groups: 10\n", "0 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 6.758145103818059e-10, 'mom': 0.9499942417973141, 'eps': 1e-05}\n", "1 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 8.10977412458167e-09, 'mom': 0.9499942417973141, 'eps': 1e-05}\n", "2 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 1.554373373877137e-08, 'mom': 0.9499942417973141, 'eps': 1e-05}\n", "3 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 2.297769335300173e-08, 'mom': 0.9499942417973141, 'eps': 1e-05}\n", "4 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 3.041165296717788e-08, 'mom': 0.9499942417973141, 'eps': 1e-05}\n", "5 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 3.7845612581299815e-08, 'mom': 0.9499942417973141, 'eps': 1e-05}\n", "6 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 4.527957219563859e-08, 'mom': 0.9499942417973141, 'eps': 1e-05}\n", "7 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 5.271353180976053e-08, 'mom': 0.9499942417973141, 'eps': 1e-05}\n", "8 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 6.014749142399089e-08, 'mom': 0.9499942417973141, 'eps': 1e-05}\n", "9 {'wd': 0.01, 'sqr_mom': 0.99, 'lr': 6.758145103822125e-08, 'mom': 0.9499942417973141, 'eps': 1e-05}\n" ] } ], "source": [ "# Check the number of parameters groups...\n", "print(f'number of parameters groups: {len(learn.opt.param_groups)}')\n", "\n", "# ... 
and the list of Learning Rates (last values of cosine annealing for the 1cycle policy with lr_max = [lr1,lr2,lr3,lr4,lr5,lr6,lr7,lr8,lr9,lr10])\n", "for i,h in enumerate(learn.opt.hypers):\n", "    print(i,h)" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Collect the final Learning Rate of each parameters group\n", "lr_list = list()\n", "for h in learn.opt.hypers:\n", "    lr_list.append(h['lr'])\n", "\n", "# Linear interpolation between the first and last values, for comparison\n", "lr_first = lr_list[0]\n", "lr_last = lr_list[-1]\n", "inter = (lr_last - lr_first) / (len(learn.opt.param_groups) - 1)\n", "lr_list_calculated = [lr_first+i*inter for i in range(len(learn.opt.param_groups))]\n", "\n", "# Plot both curves: they overlap when the spacing is linear\n", "fig, ax = plt.subplots()\n", "p = np.linspace(0,9,10)\n", "ax.plot(p, lr_list, 'o--', label='last lr values')\n", "ax.plot(p, lr_list_calculated, label='calculated last lr values')\n", "leg = ax.legend();" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Conclusion**\n", "\n", "- We can verify that the Learning Rates of the parameters groups are linearly spaced from the smallest value (first group) to the largest (last group), exactly as we passed them.\n", "- However, the error rate is higher than that of test 3." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Test 5 | array of 9 learning rates | lr_max = [lr1,lr2,lr3,lr4,lr5,lr6,lr7,lr8,lr9]" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [], "source": [ "learn.load('my_resnet18_finetuned')\n", "learn.unfreeze()" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(9,\n", " [1e-05,\n", "  0.00013375,\n", "  0.0002575,\n", "  0.00038125,\n", "  0.000505,\n", "  0.00062875,\n", "  0.0007525,\n", "  0.0008762500000000001,\n", "  0.001])" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "lr_lastlayer = 1e-3\n", "lr_firstlayer = lr_lastlayer / 100\n", "inter = (lr_lastlayer - lr_firstlayer) / ( len(learn.opt.param_groups) - 2 ) # 8 intervals\n", "lr_max = [lr_firstlayer + i*inter for i in range(len(learn.opt.param_groups) - 2)]\n", "lr_max.append(lr_lastlayer)\n", "len(lr_max), lr_max" ] }, { "cell_type": "code", "execution_count": 45, "metadata": 
{}, "outputs": [ { "ename": "AssertionError", "evalue": "Trying to set 9 values for lr but there are 10 parameter groups.", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mAssertionError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mlearn\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mfit_one_cycle\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m3\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mlr_max\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mlr_max\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;32m~/.conda/envs/fastai2/lib/python3.7/site-packages/fastcore/utils.py\u001b[0m in \u001b[0;36m_f\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m 429\u001b[0m \u001b[0minit_args\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mupdate\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mlog\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 430\u001b[0m \u001b[0msetattr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0minst\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'init_args'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0minit_args\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 431\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0minst\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mto_return\u001b[0m \u001b[0;32melse\u001b[0m \u001b[0mf\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 432\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0m_f\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 433\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", 
"\u001b[0;32m~/.conda/envs/fastai2/lib/python3.7/site-packages/fastai2/callback/schedule.py\u001b[0m in \u001b[0;36mfit_one_cycle\u001b[0;34m(self, n_epoch, lr_max, div, div_final, pct_start, wd, moms, cbs, reset_opt)\u001b[0m\n\u001b[1;32m 107\u001b[0m \u001b[0;34m\"Fit `self.model` for `n_epoch` using the 1cycle policy.\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 108\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mopt\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcreate_opt\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 109\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mopt\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mset_hyper\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'lr'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mlr\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mlr_max\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mNone\u001b[0m \u001b[0;32melse\u001b[0m \u001b[0mlr_max\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 110\u001b[0m \u001b[0mlr_max\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0marray\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mh\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'lr'\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mh\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mopt\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mhypers\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 111\u001b[0m scheds = {'lr': combined_cos(pct_start, lr_max/div, lr_max, lr_max/div_final),\n", 
"\u001b[0;32m~/.conda/envs/fastai2/lib/python3.7/site-packages/fastai2/optimizer.py\u001b[0m in \u001b[0;36mset_hyper\u001b[0;34m(self, k, v)\u001b[0m\n\u001b[1;32m 42\u001b[0m \u001b[0mv\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mL\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mv\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0muse_list\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mNone\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 43\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mv\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m==\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mv\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mv\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mparam_lists\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 44\u001b[0;31m \u001b[0;32massert\u001b[0m \u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mv\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mhypers\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34mf\"Trying to set {len(v)} values for {k} but there are {len(self.param_lists)} parameter groups.\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 45\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_set_hyper\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mk\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mv\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 46\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mAssertionError\u001b[0m: Trying to set 9 values for lr but there are 10 parameter groups." 
] } ], "source": [ "learn.fit_one_cycle(3, lr_max=lr_max)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**AssertionError explanation**: we cannot pass a list of 9 Learning Rates because there are 10 parameters groups. As the `set_hyper` code in the traceback shows, a single value is repeated for every parameters group; otherwise the list must contain exactly one value per parameters group." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## END" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.7" } }, "nbformat": 4, "nbformat_minor": 4 }