Concept
Adam Optimizer Update Rule
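After computing the bias-corrected state variables $\hat{\mathbf{v}}_t$ (momentum) and $\hat{\mathbf{s}}_t$ (second moment), the Adam optimization algorithm calculates its final parameter update. First, it rescales the gradient to obtain $\mathbf{g}_t' = \frac{\eta \hat{\mathbf{v}}_t}{\sqrt{\hat{\mathbf{s}}_t} + \epsilon}$. While similar to RMSProp, this rescaling uses the debiased momentum $\hat{\mathbf{v}}_t$ rather than the raw gradient, and the parameter $\epsilon$ (typically $10^{-6}$, for numerical stability) is added outside the square root, that is, $\frac{1}{\sqrt{\hat{\mathbf{s}}_t} + \epsilon}$ instead of $\frac{1}{\sqrt{\hat{\mathbf{s}}_t + \epsilon}}$. Finally, the model parameters are updated using the explicit learning rate $\eta$, which controls the step length, via the simple rule $\mathbf{x}_t \leftarrow \mathbf{x}_{t-1} - \mathbf{g}_t'$.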
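The final step can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function name `adam_update`, the flat-list representation of parameters, and the default values `lr=0.01` and `eps=1e-6` are assumptions made for the example. The inputs `v_hat` and `s_hat` stand for the bias-corrected momentum and second-moment estimates, which are assumed to have been computed already.

```python
import math

def adam_update(params, v_hat, s_hat, lr=0.01, eps=1e-6):
    """Apply the final Adam step to a flat list of parameters.

    params: current parameter values (list of floats)
    v_hat:  bias-corrected momentum estimates, one per parameter
    s_hat:  bias-corrected second-moment estimates, one per parameter
    lr:     learning rate (assumed default for this sketch)
    eps:    stability constant, added outside the square root
    """
    new_params = []
    for x, v, s in zip(params, v_hat, s_hat):
        # Rescaled gradient: learning rate times debiased momentum,
        # divided by (sqrt of debiased second moment + eps).
        g_prime = lr * v / (math.sqrt(s) + eps)
        # Simple update rule: subtract the rescaled gradient.
        new_params.append(x - g_prime)
    return new_params

# Usage with hand-picked values (hypothetical, for illustration only).
updated = adam_update(params=[1.0, -2.0], v_hat=[0.5, -4.0], s_hat=[0.25, 16.0])
print(updated)
```

In this example each second-moment estimate equals the square of the corresponding momentum value, so every parameter moves by almost exactly `lr` in the direction opposite to the sign of its momentum. Because `eps` sits outside the square root, the denominator stays nonzero even when `s_hat` is zero.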
Updated 2026-05-16
Tags
Data Science
D2L
Dive into Deep Learning @ D2L