Concept
Dilemma of Initial Learning Rate
When training advanced neural network architectures, careful parameter initialization alone is sometimes insufficient to guarantee stable optimization. This creates a dilemma in choosing the initial learning rate: a sufficiently small value prevents early divergence but makes progress extremely slow, whereas a large value can cause the optimization to diverge early in training.
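The trade-off can be illustrated with a minimal sketch on a toy problem. The example below is an assumption for illustration only, not a neural network and not taken from the source: it runs plain gradient descent on the one-dimensional quadratic f(x) = 0.5 * curvature * x**2, where the update is stable only when the learning rate is below 2 / curvature. The function name `gradient_descent` and all parameter values are hypothetical.

```python
def gradient_descent(lr, steps=20, x0=1.0, curvature=10.0):
    """Run gradient descent on f(x) = 0.5 * curvature * x**2.

    Returns the distance |x| from the minimum (at x = 0) after `steps` updates.
    All defaults are illustrative assumptions.
    """
    x = x0
    for _ in range(steps):
        grad = curvature * x  # derivative of f at the current x
        x -= lr * grad        # standard gradient descent update
    return abs(x)


# With curvature = 10, updates are stable only for lr < 2 / 10 = 0.2.
tiny = gradient_descent(lr=1e-4)      # stable, but barely moves from x0 = 1.0
large = gradient_descent(lr=0.25)     # above the stability limit: |x| grows each step
moderate = gradient_descent(lr=0.05)  # stable and fast on this toy problem

print(f"tiny lr:     |x| = {tiny:.4f}")
print(f"large lr:    |x| = {large:.1f}")
print(f"moderate lr: |x| = {moderate:.2e}")
```

On this toy problem a good fixed learning rate exists. In a real network the stability limit is not known in advance and may change during training, which is what makes the choice of initial learning rate difficult.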
Updated 2026-05-18
Tags
D2L
Dive into Deep Learning @ D2L