Concept

Double Descent Phenomenon

In deep learning, the relationship between a model's complexity (such as network width or depth) and its test error can be non-monotonic: as complexity grows, test error first falls, then rises near the point where the model can just barely fit the training data, and then falls again. This counterintuitive behavior is known as the double-descent pattern. Strikingly, even after a model perfectly interpolates the training data and achieves zero training error, practitioners can often reduce the generalization error further by making the model even more expressive, for example by adding layers or nodes, or by training for more epochs. In this pattern, greater model complexity initially hurts generalization near the interpolation threshold but subsequently helps mitigate overfitting.

Updated 2026-05-06

Tags

Dive into Deep Learning (D2L)
