Concept

RMSProp (Deep Learning Optimization Algorithm)

  • Stands for Root Mean Square Propagation
  • RMSProp is an optimization algorithm closely related to AdaGrad, as both employ the square of the gradient to scale the update coefficients on a per-coordinate basis. However, RMSProp overcomes AdaGrad's tendency for radically diminishing learning rates by using a leaky (exponentially weighted) average of squared gradients rather than a cumulative sum.
  • RMSProp also shares the leaky averaging mechanism with the momentum method, but applies it differently: whereas momentum uses leaky averaging to smooth the gradient direction, RMSProp uses the technique to adjust the coefficient-wise preconditioner that rescales the learning rate independently for each parameter.
  • Because the leaky average does not keep growing, RMSProp does not shrink the learning rate on its own (unlike AdaGrad, whose learning rate decays implicitly through accumulation), so in practice the practitioner must schedule the learning rate explicitly.
  • The decay coefficient γ governs how long the gradient history is retained when adjusting the per-coordinate scale: a larger γ produces a longer memory (the leaky average effectively spans on the order of 1/(1 − γ) past steps), while a smaller γ makes the algorithm more responsive to recent gradients.
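
The points above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the function names (rmsprop_step, adagrad_step, step_sizes, minimize_toy), the default hyperparameters, the placement of the small constant eps inside the square root, and the toy quadratic loss are all choices made for this example.

```python
import numpy as np


def rmsprop_step(params, grads, state, lr=0.01, gamma=0.9, eps=1e-6):
    """Return updated (params, state) after one RMSProp step.

    state holds the leaky average of squared gradients, one entry per parameter.
    eps is placed inside the square root here; that placement is an assumption.
    """
    # Leaky average: older squared gradients are down-weighted by gamma each step.
    state = gamma * state + (1.0 - gamma) * grads ** 2
    # Per-coordinate preconditioning: each coordinate is divided by its own RMS.
    params = params - lr * grads / np.sqrt(state + eps)
    return params, state


def adagrad_step(params, grads, state, lr=0.01, eps=1e-6):
    """AdaGrad for comparison: state is a cumulative sum, so it never shrinks."""
    state = state + grads ** 2
    params = params - lr * grads / np.sqrt(state + eps)
    return params, state


def step_sizes(step_fn, num_steps=50, **kwargs):
    """Size of each update when the gradient is held constant at 1.0."""
    params, state = np.zeros(1), np.zeros(1)
    grads = np.ones(1)
    sizes = []
    for _ in range(num_steps):
        new_params, state = step_fn(params, grads, state, **kwargs)
        sizes.append(float(np.abs(new_params - params)[0]))
        params = new_params
    return sizes


def toy_loss(x):
    """Illustrative quadratic with very different curvature per coordinate."""
    return 0.1 * x[0] ** 2 + 2.0 * x[1] ** 2


def toy_grad(x):
    """Gradient of toy_loss."""
    return np.array([0.2 * x[0], 4.0 * x[1]])


def minimize_toy(num_steps=20, lr=0.4, gamma=0.9):
    """Run RMSProp on toy_loss from a fixed starting point."""
    x = np.array([-5.0, -2.0])
    state = np.zeros_like(x)
    for _ in range(num_steps):
        x, state = rmsprop_step(x, toy_grad(x), state, lr=lr, gamma=gamma)
    return x


if __name__ == "__main__":
    print("AdaGrad last step size:", step_sizes(adagrad_step, lr=0.01)[-1])
    print("RMSProp last step size:", step_sizes(rmsprop_step, lr=0.01)[-1])
    print("Toy loss after RMSProp:", toy_loss(minimize_toy()))
```

With a constant gradient, the AdaGrad step in this sketch shrinks in proportion to 1/sqrt(t), while the RMSProp step levels off near lr. That is the behaviour the first bullet describes, and it is also why the learning rate has to be scheduled separately if decay is wanted.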

Updated 2026-05-15

Tags

Data Science

D2L

Dive into Deep Learning @ D2L