Learn Before
Concept
Insufficiency of Weight Decay for Preventing Interpolation
In deep learning, typical strengths of weight decay (that is, ℓ2 regularization) are usually insufficient to prevent highly parameterized networks from interpolating the training data, meaning that the network fits every training example exactly. Classically, such regularizers were understood to restrict a model's capacity to fit arbitrary labels. In deep architectures, any benefit they provide as regularizers often makes sense only in combination with an early stopping criterion.
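The effect can be illustrated with a minimal sketch. This is not an experiment from the source: it assumes an overparameterized linear model (many more features than examples) as a simple stand-in for a deep network, uses purely random labels, and solves the ℓ2-penalized least-squares problem in closed form. The names `ridge_fit` and `train_mse`, and the two weight decay values, are illustrative choices.

```python
import numpy as np


def ridge_fit(X, y, weight_decay):
    """Minimize 0.5 * ||Xw - y||^2 + 0.5 * weight_decay * ||w||^2 in closed form."""
    n = X.shape[0]
    # Dual form w = X^T (X X^T + lambda I)^{-1} y, which is cheap when n << p.
    alpha = np.linalg.solve(X @ X.T + weight_decay * np.eye(n), y)
    return X.T @ alpha


def train_mse(X, y, w):
    """Mean squared error on the training set."""
    return float(np.mean((X @ w - y) ** 2))


rng = np.random.default_rng(0)
n_examples, n_features = 50, 2000  # assumption: far more parameters than examples
X = rng.standard_normal((n_examples, n_features))
y = rng.choice([-1.0, 1.0], size=n_examples)  # random labels, no signal to learn

# A small penalty (illustrative value) versus a very large one.
mse_typical = train_mse(X, y, ridge_fit(X, y, weight_decay=1e-3))
mse_strong = train_mse(X, y, ridge_fit(X, y, weight_decay=1e5))

print(f"training MSE, weight_decay=1e-3: {mse_typical:.2e}")
print(f"training MSE, weight_decay=1e5:  {mse_strong:.2e}")
```

Under these assumptions, the small penalty leaves the training error at essentially zero even though the labels are noise, while only a penalty several orders of magnitude larger keeps the model from fitting them. A linear model is a simplification, so this should be read as intuition for the claim rather than evidence about any particular deep network.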
Updated 2026-05-07
Tags
D2L
Dive into Deep Learning @ D2L