Effect of Warmup on Parameter Divergence

Applying a warmup phase at the start of optimization has been shown to limit parameter divergence in very deep neural networks. Because the weights are randomly initialized, the parts of the network that take the longest to make progress are especially prone to diverging early in training. Gradually increasing the learning rate over the warmup period, rather than starting at its full value, mitigates this instability and leads to better initial convergence.
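As a rough illustration of what "gradually increasing the learning rate" can look like, here is a minimal sketch of a linear warmup schedule. The function name `warmup_lr` and the parameters `base_lr`, `warmup_steps`, and `warmup_begin_lr` are hypothetical names chosen for this example, and the linear ramp is only one possible choice of warmup shape.

```python
def warmup_lr(step, base_lr=0.1, warmup_steps=5, warmup_begin_lr=0.0):
    """Return the learning rate for a given step under linear warmup.

    All names and default values are illustrative assumptions,
    not taken from any particular library.
    """
    if step < warmup_steps:
        # Ramp linearly from warmup_begin_lr toward base_lr.
        increase = (base_lr - warmup_begin_lr) * step / warmup_steps
        return warmup_begin_lr + increase
    # After warmup, hold the base learning rate.
    return base_lr


# Learning rates for the first few steps of training.
schedule = [warmup_lr(step) for step in range(8)]
print(schedule)
```

In practice the constant phase after warmup would usually be replaced by a decay schedule; it is held fixed here only to keep the sketch short.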

Updated 2026-05-18

Tags

D2L

Dive into Deep Learning @ D2L