Generalization Paradox in Deep Learning
While guarantees from classical learning theory (such as bounds based on the Vapnik-Chervonenkis (VC) dimension or Rademacher complexity) can be conservative even for classical models, they appear powerless to explain why deep neural networks generalize. For classification problems, deep models are typically expressive enough to fit arbitrary labels perfectly, even on datasets with millions of examples. In the classical picture, such extreme model capacity, even with familiar regularization techniques, should lead to severe overfitting. Paradoxically, despite driving the training error to zero, these highly expressive models often generalize remarkably well to unseen data, contradicting traditional complexity-based generalization bounds.
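To make the claim about fitting arbitrary labels concrete, below is a minimal sketch (in PyTorch, which D2L uses) of the classic label-randomization experiment: an over-parameterized MLP is trained on labels that carry no signal and still reaches near-perfect training accuracy. The dataset size, network width, and optimizer settings are illustrative assumptions, not values from the text.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Synthetic data with purely random labels (illustrative sizes).
n, d, num_classes = 1000, 20, 2
X = torch.randn(n, d)                    # random inputs
y = torch.randint(0, num_classes, (n,))  # labels with no relation to X

# An over-parameterized MLP (width chosen for illustration).
net = nn.Sequential(
    nn.Linear(d, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, num_classes),
)
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Full-batch training until the random labels are memorized.
for epoch in range(500):
    optimizer.zero_grad()
    loss = loss_fn(net(X), y)
    loss.backward()
    optimizer.step()

train_acc = (net(X).argmax(dim=1) == y).float().mean().item()
print(f"training accuracy on random labels: {train_acc:.3f}")  # typically ~1.0
```

Because the labels are random, any accuracy above chance on held-out data is impossible here; the point is only that the model's capacity suffices to memorize the training set, which is what makes its good generalization on real labels so surprising under classical bounds.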