Concept

Adam Convergence Failure

A known limitation of the Adam optimization algorithm is that it can fail to converge, even in convex optimization settings. The failure typically arises when the second moment estimate, denoted $\mathbf{s}_t$, blows up. Specifically, when the squared gradient $\mathbf{g}_t^2$ has high variance or when updates are sparse, the state variable $\mathbf{s}_t$ may forget its past values too quickly, which destabilizes learning. These convergence issues can be mitigated either by using larger minibatches during training or by switching to an optimizer that provides an improved estimate of $\mathbf{s}_t$, such as Yogi.
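The forgetting behavior can be illustrated with a minimal scalar sketch. It assumes the standard Adam recursion $\mathbf{s}_t = \mathbf{s}_{t-1} + (1 - \beta_2)(\mathbf{g}_t^2 - \mathbf{s}_{t-1})$ and the Yogi recursion $\mathbf{s}_t = \mathbf{s}_{t-1} + (1 - \beta_2)\,\mathbf{g}_t^2 \odot \mathop{\mathrm{sgn}}(\mathbf{g}_t^2 - \mathbf{s}_{t-1})$. The function names, the starting value, and the choice $\beta_2 = 0.99$ are illustrative assumptions rather than part of any library API.

```python
def adam_second_moment(s, g, beta2):
    """Adam: exponential moving average of squared gradients.

    Equivalent to beta2 * s + (1 - beta2) * g**2, so the step toward g**2
    is proportional to the gap (g**2 - s).
    """
    return s + (1.0 - beta2) * (g * g - s)


def yogi_second_moment(s, g, beta2):
    """Yogi: the step magnitude depends only on g**2, not on the gap.

    The sign term only decides whether s moves up or down.
    """
    g2 = g * g
    sign = (g2 > s) - (g2 < s)  # +1, 0, or -1
    return s + (1.0 - beta2) * g2 * sign


def simulate(update, s0=1.0, g=0.01, steps=1000, beta2=0.99):
    """Feed a long run of small gradients after a large accumulated s0.

    All default values are hypothetical and chosen for illustration only.
    """
    s = s0
    for _ in range(steps):
        s = update(s, g, beta2)
    return s


if __name__ == "__main__":
    print("Adam s_t after small gradients:", simulate(adam_second_moment))
    print("Yogi s_t after small gradients:", simulate(yogi_second_moment))
```

In this setup Adam's estimate collapses toward the recent value of $\mathbf{g}_t^2$, whereas Yogi's estimate stays close to where it started, because each Yogi step is bounded by $(1 - \beta_2)\,\mathbf{g}_t^2$ regardless of how far $\mathbf{s}_{t-1}$ is from $\mathbf{g}_t^2$.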

Updated 2026-05-16

Tags

D2L

Dive into Deep Learning @ D2L