Concept
Adam Convergence Failure
A known limitation of the Adam optimization algorithm is that it can fail to converge, even in convex optimization settings. This divergence can occur when the second moment estimate, denoted $s_t$, blows up. Specifically, when the squared gradient $g_t^2$ has high variance or when updates are sparse, the state variable $s_t$ may forget its past values too quickly, which destabilizes learning. These convergence issues can be mitigated either by increasing the minibatch size during training or by switching to an optimizer that provides an improved estimate of $s_t$, such as Yogi.
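To make the "forgetting" behavior concrete, the following is a minimal sketch comparing the second moment updates of Adam and Yogi on a scalar gradient stream. It assumes the standard update rules, $s_t = \beta_2 s_{t-1} + (1 - \beta_2) g_t^2$ for Adam and $s_t = s_{t-1} + (1 - \beta_2)\, g_t^2 \,\mathrm{sign}(g_t^2 - s_{t-1})$ for Yogi. The function names, the value $\beta_2 = 0.999$, and the synthetic gradient stream are illustrative assumptions, and bias correction is omitted for brevity.

```python
import numpy as np

def adam_second_moment(s, g, beta2=0.999):
    # Exponential moving average: old state is scaled down every step,
    # regardless of how small the new squared gradient is.
    return beta2 * s + (1 - beta2) * g**2

def yogi_second_moment(s, g, beta2=0.999):
    # Additive update: the change in s is bounded by (1 - beta2) * g**2,
    # so tiny gradients can only move the state by a tiny amount.
    g2 = g**2
    return s + (1 - beta2) * g2 * np.sign(g2 - s)

# Hypothetical sparse-like stream: one large gradient, then many tiny ones.
grads = np.array([10.0] + [0.01] * 5000)

s_adam, s_yogi = 0.0, 0.0
for g in grads:
    s_adam = adam_second_moment(s_adam, g)
    s_yogi = yogi_second_moment(s_yogi, g)

print(f"Adam s_t: {s_adam:.6f}")
print(f"Yogi s_t: {s_yogi:.6f}")
```

In this sketch, Adam's estimate decays geometrically toward the scale of the recent tiny gradients, while Yogi's estimate shrinks only by a small additive amount per step and so retains most of the information from the earlier large gradient.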
Updated 2026-05-16
Tags
D2L
Dive into Deep Learning @ D2L