1Cademy - Adam State Variables

Learn Before

Adam (Deep Learning Optimization Algorithm)

Formula

Adam State Variables

A key component of the Adam optimization algorithm is its use of exponentially weighted moving averages, also called leaky averaging, to estimate both the momentum (first moment) and the second moment of the gradient. At each time step $t$, it maintains two state variables: $\mathbf{v}_t \leftarrow \beta_1 \mathbf{v}_{t-1} + (1 - \beta_1) \mathbf{g}_t$ and $\mathbf{s}_t \leftarrow \beta_2 \mathbf{s}_{t-1} + (1 - \beta_2) \mathbf{g}_t^2$, where $\mathbf{g}_t$ is the gradient at step $t$ and the square is taken elementwise. The terms $\beta_1$ and $\beta_2$ are nonnegative weighting parameters. Common default choices are $\beta_1 = 0.9$ and $\beta_2 = 0.999$. Because a leaky average with weight $\beta$ effectively averages over roughly the last $1/(1 - \beta)$ steps (about 10 steps for $\beta_1$ and about 1000 for $\beta_2$), the variance estimate $\mathbf{s}_t$ adapts much more slowly than the momentum term $\mathbf{v}_t$.
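The two update rules can be sketched in plain Python as follows. This is a minimal illustration, not a full Adam implementation: it covers only the state-variable updates (no bias correction and no parameter step), treats vectors as Python lists, and assumes both states start at zero. The function name `adam_state_update` is hypothetical.

```python
from typing import List, Tuple


def adam_state_update(
    v_prev: List[float],
    s_prev: List[float],
    g: List[float],
    beta1: float = 0.9,
    beta2: float = 0.999,
) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: one step of Adam's two leaky averages.

    Only updates the state variables; bias correction and the
    parameter update are intentionally left out of this sketch.
    """
    # v_t = beta1 * v_{t-1} + (1 - beta1) * g_t   (momentum / first moment)
    v = [beta1 * vp + (1 - beta1) * gi for vp, gi in zip(v_prev, g)]
    # s_t = beta2 * s_{t-1} + (1 - beta2) * g_t^2  (elementwise second moment)
    s = [beta2 * sp + (1 - beta2) * gi * gi for sp, gi in zip(s_prev, g)]
    return v, s


# Usage: assume zero initial state and feed a constant gradient of 1.0 for 10 steps.
v, s = [0.0], [0.0]
for _ in range(10):
    v, s = adam_state_update(v, s, [1.0])
print(v, s)
```

With a constant gradient of 1, each state after $t$ steps equals $1 - \beta^t$, so after 10 steps $\mathbf{v}_t$ has already reached about 0.651 while $\mathbf{s}_t$ is still only about 0.00996. This makes the difference in adaptation speed between the two defaults concrete.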

Updated 2026-05-16

References

Dive into Deep Learning

Learn After

Adam Bias Correction
Adam Convergence Failure