Learn Before
Formula
Adam State Variables
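A key component of the Adam optimization algorithm is its use of exponentially weighted moving averages, also known as leaky averaging, to estimate both the momentum and the second moment of the gradient. At each time step $t$, it maintains two state variables: $\mathbf{v}_t \leftarrow \beta_1 \mathbf{v}_{t-1} + (1 - \beta_1) \mathbf{g}_t$ and $\mathbf{s}_t \leftarrow \beta_2 \mathbf{s}_{t-1} + (1 - \beta_2) \mathbf{g}_t^2$, where $\mathbf{g}_t$ is the gradient at step $t$ and the square is taken elementwise. The terms $\beta_1$ and $\beta_2$ are nonnegative weighting parameters. Common default choices are $\beta_1 = 0.9$ and $\beta_2 = 0.999$, so the variance estimate $\mathbf{s}_t$ adapts much more slowly than the momentum term $\mathbf{v}_t$.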
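The two leaky averages can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the function name `update_adam_state`, the list-based representation of the state, and the constant gradient of 1.0 are all assumptions made for the example, and only the state updates are shown (no bias correction or parameter update).

```python
def update_adam_state(v, s, grad, beta1=0.9, beta2=0.999):
    """Return updated (v, s) after one step of leaky averaging.

    v, s, and grad are equal-length lists of floats (hypothetical
    representation chosen to keep the sketch dependency-free).
    """
    # Momentum estimate: leaky average of the gradient itself.
    v_new = [beta1 * vi + (1 - beta1) * gi for vi, gi in zip(v, grad)]
    # Second-moment estimate: leaky average of the elementwise squared gradient.
    s_new = [beta2 * si + (1 - beta2) * gi * gi for si, gi in zip(s, grad)]
    return v_new, s_new


# Both states start at zero; feed a constant gradient of 1.0 for 10 steps.
v, s = [0.0], [0.0]
for _ in range(10):
    v, s = update_adam_state(v, s, [1.0])

print(v[0], s[0])
```

With these defaults and a constant gradient, `v` has covered roughly 65% of the distance to the gradient value after 10 steps, while `s` has covered only about 1%, which illustrates how much more slowly the second-moment estimate adapts.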
Updated 2026-05-16
Tags
D2L
Dive into Deep Learning @ D2L