Theory

Overfitting Reduction via Learning Rate Scheduling

Using a learning rate scheduler during training tends to produce less overfitting than a constant learning rate. The theoretical reason is not fully resolved. One argument holds that a smaller step size yields model parameters that are closer to zero and therefore simpler. This argument does not completely explain the phenomenon, however, because training does not stop early; the learning rate is simply reduced gently.
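To make the contrast concrete, here is a minimal sketch of a constant schedule next to a gently decaying one. Cosine decay is used purely as an illustrative choice; the function names, `base_lr`, `final_lr`, and `total_steps` are assumptions made for this example and do not come from the text above.

```python
import math

def constant_lr(step, base_lr=0.1):
    """Constant schedule: the same step size at every update."""
    return base_lr

def cosine_lr(step, total_steps, base_lr=0.1, final_lr=0.001):
    """Cosine decay from base_lr down to final_lr over total_steps updates.

    All default values here are illustrative assumptions.
    """
    step = min(step, total_steps)  # hold final_lr once the schedule ends
    cosine = 0.5 * (1 + math.cos(math.pi * step / total_steps))
    return final_lr + (base_lr - final_lr) * cosine

total_steps = 20
constant = [constant_lr(t) for t in range(total_steps + 1)]
schedule = [cosine_lr(t, total_steps) for t in range(total_steps + 1)]

# The decayed schedule shrinks the step size but never reaches zero,
# so parameter updates continue until the last step.
for t in (0, 5, 10, 15, 20):
    print(f"step {t:2d}: constant={constant[t]:.4f}  cosine={schedule[t]:.4f}")
```

The decayed schedule stays strictly positive at every step, which matches the point made above: training continues to the end with smaller updates rather than being cut short.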


Updated 2026-05-16


Tags

D2L

Dive into Deep Learning @ D2L