Theory
Overfitting Reduction via Learning Rate Scheduling
Training with a learning rate scheduler has been observed to cause less overfitting than training with a constant learning rate. Why this happens is not fully resolved theoretically. One argument holds that smaller step sizes keep the model parameters closer to zero, where they typically start, and therefore simpler, much as early stopping does. However, this argument does not fully explain the phenomenon: training with a scheduler does not stop early. It keeps running and only reduces the learning rate gradually.
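As a toy illustration of the step-size argument (not of overfitting itself), the sketch below runs plain gradient descent on a one-dimensional quadratic, starting from zero, with a constant learning rate and with two decaying schedules. The square-root and cosine schedules, the quadratic loss, and all names in the code (`constant_lr`, `square_root_lr`, `cosine_lr`, `run_gd`) are illustrative assumptions, not a specific library's API.

```python
import math


# Hypothetical schedule functions: each maps an update index to a learning rate.
def constant_lr(step: int, base_lr: float = 0.1) -> float:
    """Constant learning rate: the same step size at every update."""
    return base_lr


def square_root_lr(step: int, base_lr: float = 0.1) -> float:
    """Square-root decay: base_lr / sqrt(step + 1)."""
    return base_lr * (step + 1) ** -0.5


def cosine_lr(step: int, total_steps: int = 100,
              base_lr: float = 0.1, final_lr: float = 0.0) -> float:
    """Cosine annealing from base_lr down to final_lr over total_steps."""
    t = min(step, total_steps)
    return final_lr + (base_lr - final_lr) * (1 + math.cos(math.pi * t / total_steps)) / 2


def run_gd(schedule, steps: int = 20, target: float = 5.0) -> float:
    """Gradient descent on the toy loss 0.5 * (w - target)**2, starting at w = 0.

    Every run performs the same number of updates; only the step sizes differ.
    """
    w = 0.0  # start at zero, standing in for a near-zero initialization
    for step in range(steps):
        grad = w - target  # gradient of 0.5 * (w - target)**2
        w -= schedule(step) * grad
    return w


if __name__ == "__main__":
    steps = 20
    schedules = {
        "constant": constant_lr,
        "square root": square_root_lr,
        "cosine": lambda s: cosine_lr(s, total_steps=steps),
    }
    for name, schedule in schedules.items():
        print(f"{name:>12}: w after {steps} steps = {run_gd(schedule, steps):.3f}")
```

Because no decayed step is larger than the constant one, the decayed runs end closer to zero after the same number of updates, even though none of them stops early. The sketch only illustrates the "closer to zero" part of the argument; it says nothing about whether the resulting model generalizes better.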
Updated 2026-05-16
Tags
D2L
Dive into Deep Learning @ D2L