Learn Before
Concept
Computational Cost of AdaGrad
Although the AdaGrad algorithm requires maintaining an auxiliary state variable to allow for an individual learning rate per coordinate, this additional operation does not significantly increase its computational cost relative to standard stochastic gradient descent (SGD). The storage and element-wise arithmetic required to update are relatively inexpensive, simply because the primary computational expense in optimizing deep learning models remains the forward pass to evaluate the objective function and the backward pass to compute its derivative.
0
1
Updated 2026-05-15
Tags
D2L
Dive into Deep Learning @ D2L