1Cademy - Adam Optimizer Update Rule

After computing the bias-corrected state variables $\hat{\mathbf{v}}_t$ and $\hat{\mathbf{s}}_t$, the Adam optimization algorithm calculates its parameter update in two steps. First, it rescales the gradient to obtain $\mathbf{g}_t' = \frac{\eta \hat{\mathbf{v}}_t}{\sqrt{\hat{\mathbf{s}}_t} + \epsilon}$, where the square root and the division are applied elementwise. This resembles RMSProp, with two differences: the numerator is the debiased momentum $\hat{\mathbf{v}}_t$ rather than the raw gradient, and $\epsilon$ (typically $10^{-6}$, chosen for numerical stability) is added outside the square root rather than inside it. Second, the model parameters are updated via the simple rule $\mathbf{x}_t \leftarrow \mathbf{x}_{t-1} - \mathbf{g}_t'$. The explicit learning rate $\eta$, which controls the step length, is already folded into $\mathbf{g}_t'$, so it does not appear again in this final rule.
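
The two steps can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function name `adam_update`, the default `eta=0.01`, and the sample values are assumptions made for the example, and the bias-corrected states `v_hat` and `s_hat` are taken as already computed.

```python
import numpy as np

def adam_update(x_prev, v_hat, s_hat, eta=0.01, eps=1e-6):
    """Apply one Adam parameter update from bias-corrected states.

    x_prev: parameters x_{t-1}
    v_hat:  debiased momentum (first-moment estimate)
    s_hat:  debiased second-moment estimate (non-negative)
    eta:    learning rate (hypothetical default for this sketch)
    eps:    stability constant, added outside the square root
    """
    # Rescaled gradient g_t'; all operations are elementwise.
    g_prime = eta * v_hat / (np.sqrt(s_hat) + eps)
    # eta is already inside g_prime, so the update is a plain subtraction.
    return x_prev - g_prime

# Illustrative values only.
x_prev = np.array([1.0, -2.0])
v_hat = np.array([0.5, -0.1])
s_hat = np.array([0.25, 0.01])
x_new = adam_update(x_prev, v_hat, s_hat)
print(x_new)
```

Because each coordinate of `v_hat` is divided by roughly its own magnitude scale `sqrt(s_hat)`, both coordinates in this example move by close to `eta`, even though their momentum values differ by a factor of five.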