Intuition Behind Optimizer Warmup

The intuition behind an optimizer warmup phase is that randomly initialized parameters, especially in complex or very deep neural networks, can make early optimization unstable: a large learning rate applied at this stage may cause the parameters to diverge. A warmup period mitigates this by starting with a small learning rate and increasing it gradually, which limits divergence in the parts of the network that take the longest to make initial progress. Once the parameters have stabilized, the learning rate can safely be raised to its target value so that training does not proceed too slowly.
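
To make the idea concrete, here is a minimal sketch of a linear warmup schedule. The function name `warmup_lr` and the values `base_lr=0.1` and `warmup_steps=5` are illustrative assumptions and do not come from the source. Linear ramping is only one common choice, and real training setups usually combine warmup with a decay schedule afterwards.

```python
def warmup_lr(step, base_lr=0.1, warmup_steps=5):
    """Hypothetical linear warmup: ramp the learning rate up to base_lr, then hold it.

    step is a zero-based update counter; base_lr and warmup_steps are
    illustrative values, not recommendations.
    """
    if warmup_steps <= 0:
        # No warmup requested: use the target learning rate immediately.
        return base_lr
    if step < warmup_steps:
        # Early updates use a fraction of base_lr, which keeps parameter changes small.
        return base_lr * (step + 1) / warmup_steps
    # Warmup is over: train at the full target learning rate.
    return base_lr


# Learning rates for the first eight updates.
schedule = [warmup_lr(s) for s in range(8)]
for s, lr in enumerate(schedule):
    print(f"step {s}: lr = {lr:.3f}")
```

With these assumed values, the learning rate grows in equal increments over the first five updates and stays at `base_lr` from then on. Early steps therefore move the randomly initialized parameters only slightly, and later steps run at full speed.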

Updated 2026-05-15

Tags

D2L

Dive into Deep Learning @ D2L