Concept

Insufficiency of Weight Decay for Preventing Interpolation

In deep learning, typical strengths of weight decay (such as $\ell_2$ regularization) are usually insufficient to prevent highly parameterized networks from fully interpolating the training data. While these regularizers were classically thought to restrict models from fitting arbitrary labels, their effectiveness in deep architectures is often contingent on being paired with an early stopping criterion.
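The effect can be illustrated with a small numerical sketch. It does not train a deep network; as a simplifying assumption it uses an overparameterized random-feature model (fixed random ReLU features with a ridge-regularized linear readout) as a stand-in, because the $\ell_2$-penalized solution is then available in closed form. The sizes `n`, `d`, `p` and the penalty values `1e-3` and `1e3` are illustrative choices, not values taken from the source.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions): far more features than training examples.
n, d, p = 30, 50, 1000
X = rng.standard_normal((n, d))
y = rng.choice([-1.0, 1.0], size=n)  # arbitrary labels, independent of X

# Fixed random ReLU features; only the linear readout w is fitted.
W = rng.standard_normal((d, p))
Phi = np.maximum(X @ W, 0.0) / np.sqrt(p)


def ridge_fit(Phi, y, lam):
    """Minimize ||Phi w - y||^2 + lam * ||w||^2 via the dual closed form."""
    K = Phi @ Phi.T
    alpha = np.linalg.solve(K + lam * np.eye(len(y)), y)
    return Phi.T @ alpha


def train_mse(Phi, y, lam):
    """Mean squared error on the training set for penalty strength lam."""
    w = ridge_fit(Phi, y, lam)
    return float(np.mean((Phi @ w - y) ** 2))


mse_typical = train_mse(Phi, y, lam=1e-3)  # a commonly used order of magnitude
mse_heavy = train_mse(Phi, y, lam=1e3)     # deliberately extreme, for contrast

print(f"train MSE, lam=1e-3: {mse_typical:.2e}")
print(f"train MSE, lam=1e+3: {mse_heavy:.2e}")
```

In this sketch the small penalty leaves the training error on random labels close to zero, so the model still interpolates. Only the extreme penalty stops it from fitting, and it does so by shrinking the weights toward zero, which would also prevent fitting real signal.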


Updated 2026-05-07


Tags

D2L

Dive into Deep Learning @ D2L