Concept

Optimizer Warmup

To address the dilemma of choosing an initial learning rate that is either too small (causing slow progress) or too large (causing divergence), a simple strategy called optimizer warmup is used. During the warmup period, the learning rate increases gradually, typically linearly, from a small value to its maximum. After the warmup period, the learning rate decays (cools down) until the end of the optimization process.
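The schedule described above can be sketched as a plain function of the step index. This is a minimal illustration, not a reference implementation: the function name `warmup_cosine_lr` and its parameters are hypothetical, and the cosine shape of the cool-down phase is an assumption, since the text only says that the rate decays after warmup.

```python
import math


def warmup_cosine_lr(step, total_steps, warmup_steps, max_lr,
                     start_lr=0.0, final_lr=0.0):
    """Return the learning rate for a 0-indexed training step.

    Hypothetical helper: linear warmup from start_lr to max_lr over
    warmup_steps, then cosine decay (an assumed choice) to final_lr.
    """
    if step < warmup_steps:
        # Warmup phase: interpolate linearly from start_lr toward max_lr.
        return start_lr + (max_lr - start_lr) * step / warmup_steps
    # Cool-down phase: fraction of the post-warmup budget already used.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)  # hold final_lr if training runs longer
    return final_lr + 0.5 * (max_lr - final_lr) * (1 + math.cos(math.pi * progress))


# Example: 100 steps in total, of which the first 10 are warmup.
schedule = [warmup_cosine_lr(s, total_steps=100, warmup_steps=10, max_lr=0.1)
            for s in range(101)]
```

In a training loop, the returned value would be assigned to the optimizer's learning rate before each update. The warmup length and peak rate are hyperparameters that need to be tuned for each problem.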

Updated 2026-05-18

Tags

D2L

Dive into Deep Learning @ D2L