Effect of Warmup on Parameter Divergence
Applying a warmup phase during optimization has been found to limit parameter divergence in very deep neural networks. Because the weights are randomly initialized, the parts of the network that take the longest to make progress are especially prone to diverging early in training. Ramping the learning rate up gradually from a small value to its target over the warmup period mitigates this instability and leads to better initial convergence.
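As a rough illustration of the gradual increase described above, the following is a minimal sketch of a linear warmup schedule. The function name `warmup_lr` and the values chosen for `base_lr`, `warmup_steps`, and `warmup_begin_lr` are assumptions made for this example, not settings taken from the source; a linear ramp is only one possible shape for the warmup.

```python
def warmup_lr(step, base_lr=0.1, warmup_steps=5, warmup_begin_lr=0.0):
    """Return the learning rate to use at a given optimization step.

    Hypothetical helper: ramps linearly from warmup_begin_lr to base_lr
    over the first warmup_steps steps, then holds base_lr constant.
    """
    if step < warmup_steps:
        # Fraction of the warmup period completed so far, in [0, 1).
        progress = step / warmup_steps
        return warmup_begin_lr + (base_lr - warmup_begin_lr) * progress
    # Warmup is over: use the target learning rate.
    return base_lr


# Learning rates for the first 8 steps (assumed settings, for illustration).
schedule = [warmup_lr(t) for t in range(8)]
for t, lr in enumerate(schedule):
    print(f"step {t}: lr = {lr:.3f}")
```

In practice the constant phase after warmup is usually replaced by a decaying schedule (cosine, step, or similar); it is kept flat here so that only the warmup behavior is visible.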
Updated 2026-05-18
Tags
D2L
Dive into Deep Learning @ D2L