Learn Before
Formula
AdaGrad Update Rule
The AdaGrad algorithm updates the parameters of a model by maintaining a state variable that accumulates the element-wise squares of past gradients. At each step , the gradient is computed. The state variable is updated as , initialized with . The parameter vector is then updated coordinate-wise according to the rule: , where is the initial learning rate and is a small additive constant used to prevent division by zero. This formulation ensures that each coordinate has its own adaptive learning rate based on its historical gradient variance.
0
2
Updated 2026-05-15
Contributors are:
Who are from:
Tags
Deep Learning (in Machine learning)
Data Science
D2L
Dive into Deep Learning @ D2L
Computing Sciences