Learn Before
Concept
Double Descent Phenomenon
In deep learning, the relationship between a model's complexity (such as network width or depth) and its generalization error can be non-monotonic: as complexity grows, test error first falls, then rises as the model approaches the point of exactly fitting (interpolating) the training data, and then falls a second time as capacity increases further. This counterintuitive behavior is known as the double-descent pattern. Strikingly, for many tasks where models perfectly fit the training data and achieve zero training error, practitioners can often reduce the generalization error further by making the model even more expressive, such as by adding layers or nodes, or by training for more epochs. In this pattern, greater model complexity hurts generalization near the interpolation threshold yet helps beyond it, mitigating rather than worsening overfitting.
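The pattern can be reproduced without a deep network at all. Below is a minimal sketch, assuming a random-ReLU-feature regression model fit by minimum-norm least squares; the data, feature map, and all sizes are illustrative choices, not taken from the D2L text. Sweeping the number of random features past the number of training points typically shows test error rising near the interpolation threshold and falling again beyond it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression problem (sizes and noise level are illustrative).
n_train, n_test = 20, 200
x_train = rng.uniform(-1, 1, n_train)
x_test = rng.uniform(-1, 1, n_test)
f = lambda x: np.sin(2 * np.pi * x)
y_train = f(x_train) + 0.1 * rng.standard_normal(n_train)
y_test = f(x_test)

def relu_features(x, W, b):
    """Random ReLU feature map: phi(x) = max(0, x * W + b)."""
    return np.maximum(0.0, np.outer(x, W) + b)

# Sweep the number of random features (the "model complexity" axis).
for n_feat in [2, 5, 10, 15, 20, 25, 40, 100, 500]:
    W = rng.standard_normal(n_feat)
    b = rng.standard_normal(n_feat)
    Phi_train = relu_features(x_train, W, b)
    Phi_test = relu_features(x_test, W, b)
    # Minimum-norm least-squares fit; once n_feat exceeds n_train,
    # this solution interpolates the training data (zero training error).
    theta = np.linalg.pinv(Phi_train) @ y_train
    train_mse = np.mean((Phi_train @ theta - y_train) ** 2)
    test_mse = np.mean((Phi_test @ theta - y_test) ** 2)
    print(f"features={n_feat:4d}  train MSE={train_mse:.4f}  test MSE={test_mse:.4f}")
```

With these sizes, the test MSE typically peaks near features ≈ 20 (the number of training points, i.e., the interpolation threshold) and then drops again for much wider feature maps, tracing the two descents.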
Updated 2026-05-06
Tags
D2L
Dive into Deep Learning @ D2L