Concept

Adam Optimizer Update Rule

After computing the bias-corrected state variables $\hat{\mathbf{v}}_t$ and $\hat{\mathbf{s}}_t$, the Adam optimization algorithm calculates its final parameter update. First, it rescales the gradient to obtain $\mathbf{g}_t' = \frac{\eta \hat{\mathbf{v}}_t}{\sqrt{\hat{\mathbf{s}}_t} + \epsilon}$. This resembles RMSProp, with two differences: the rescaling uses the debiased momentum $\hat{\mathbf{v}}_t$ rather than the raw gradient, and $\epsilon$ (typically $10^{-6}$, chosen for numerical stability) is added outside the square root rather than inside it. The explicit learning rate $\eta$, which controls the step length, is already folded into $\mathbf{g}_t'$, so the model parameters are then updated with the simple rule $\mathbf{x}_t \leftarrow \mathbf{x}_{t-1} - \mathbf{g}_t'$.
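The update can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `adam_step`, the moment updates that produce $\hat{\mathbf{v}}_t$ and $\hat{\mathbf{s}}_t$, and the default values of `eta`, `beta1`, and `beta2` are assumptions taken from the standard formulation of Adam rather than from the text above. Only the rescaling and the final update line come directly from this section.

```python
import numpy as np

def adam_step(x, grad, v, s, t, eta=0.01, beta1=0.9, beta2=0.999, eps=1e-6):
    """Apply one Adam step. `t` is the 1-based step counter.

    The moment updates and bias corrections below are assumed from the
    standard Adam formulation; this section only describes the last two lines.
    """
    # Assumed: exponentially weighted estimates of the gradient and its square.
    v = beta1 * v + (1 - beta1) * grad
    s = beta2 * s + (1 - beta2) * grad ** 2

    # Assumed: bias correction, giving v_hat and s_hat.
    v_hat = v / (1 - beta1 ** t)
    s_hat = s / (1 - beta2 ** t)

    # From the text: rescale with eps added outside the square root.
    g_prime = eta * v_hat / (np.sqrt(s_hat) + eps)

    # From the text: x_t <- x_{t-1} - g_t'
    return x - g_prime, v, s


# Hypothetical usage: one step on f(x) = sum(x**2), whose gradient is 2 * x.
x = np.array([1.0, -2.0])
v = np.zeros_like(x)
s = np.zeros_like(x)
x, v, s = adam_step(x, 2 * x, v, s, t=1)
```

On the first step the bias-corrected ratio $\hat{\mathbf{v}}_t / \sqrt{\hat{\mathbf{s}}_t}$ has magnitude close to 1 in every coordinate, so each parameter moves by roughly $\eta$ regardless of the gradient's scale. This is one way to see why $\eta$ controls the step length.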

Updated 2026-05-16

Tags

Data Science

D2L

Dive into Deep Learning @ D2L