Concept

Generalization Paradox in Deep Learning

Guarantees from classical learning theory (such as bounds based on the Vapnik-Chervonenkis (VC) dimension or Rademacher complexity) can be conservative even for classical models, but they appear powerless to explain why deep neural networks generalize. For classification problems, deep models are typically expressive enough to fit arbitrary labels perfectly, even on datasets with millions of examples. In the classical picture, such extreme model capacity, even when combined with familiar remedies like $\ell_2$ regularization, should lead to severe overfitting. Paradoxically, despite fitting the training data with zero training error, these highly expressive models often generalize remarkably well to unseen data, contradicting traditional complexity-based generalization bounds.
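The claim that a sufficiently expressive model can perfectly fit arbitrary labels can be illustrated with a minimal sketch. The example below is an assumption-laden toy, not the experiment from any particular paper: it uses a fixed random-feature "hidden layer" with more features than training examples, so a least-squares readout can interpolate even completely random labels, driving training error to zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 200  # 50 examples, 200 random features: overparameterized

# toy inputs and completely random (arbitrary) binary labels
X = rng.standard_normal((n, 5))
labels = rng.integers(0, 2, size=n)

# fixed random hidden layer + trainable linear readout (random-feature model)
W = rng.standard_normal((5, d))
H = np.tanh(X @ W)  # hidden activations, shape (n, d)

# least-squares fit of the readout; with d > n and full-rank H,
# the minimum-norm solution interpolates the targets exactly
w, *_ = np.linalg.lstsq(H, 2.0 * labels - 1.0, rcond=None)

train_preds = (H @ w > 0).astype(int)
train_error = np.mean(train_preds != labels)
print(train_error)  # 0.0: perfect fit of random labels
```

Because the labels carry no signal, zero training error here says nothing about test performance; the paradox is that real deep networks in the same interpolating regime nevertheless generalize well on genuine labels.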


Updated 2026-05-06


Tags: D2L (Dive into Deep Learning)
