RMSprop (Deep Learning Optimization Algorithm) Mathematical Implementations
The RMSProp update rule maintains a state variable that tracks the exponentially weighted average of squared gradients and uses it to scale the learning rate separately for each parameter:
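$$G^{t} = \beta G^{t-1} + (1 - \beta)\left(\nabla J(W^{t-1})\right)^2$$
$$W^{t} = W^{t-1} - \frac{\alpha}{\sqrt{G^{t} + \epsilon}} \nabla J(W^{t-1})$$
The square, square root, and division are applied elementwise, so each parameter gets its own effective step size. The same update, with its own state variable, applies to the bias parameters.
- $$G^{t}$$: the running average of squared gradients, with the same shape as $$W^{t}$$ (typically initialized to zero)
- $$\beta$$: the decay factor controlling how quickly the running average forgets old observations (typically around $$0.9$$)
- $$W^{t}$$: the model parameters after step $$t$$
- $$\alpha$$: the global learning rate (typically around $$0.01$$)
- $$\epsilon$$: a small constant to prevent division by zero (typically around $$10^{-6}$$ or $$10^{-8}$$)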
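The update rule can be sketched in plain Python to make the per-coordinate scaling concrete. This is a minimal illustration under stated assumptions, not the D2L implementation: the names `rmsprop_step`, `f`, and `grad_f` are hypothetical, parameters are flat lists of floats instead of matrices, the gradient is evaluated at the parameters before the update, and the objective $$f(x) = 0.1x_1^2 + 2x_2^2$$ is an ill-conditioned quadratic chosen only for illustration.

```python
import math


def rmsprop_step(w, g, state, lr=0.01, beta=0.9, eps=1e-6):
    """Apply one RMSProp update elementwise (hypothetical helper).

    w:     current parameters W^{t-1}, a flat list of floats
    g:     gradient of J evaluated at W^{t-1}, same length as w
    state: running average G^{t-1} of squared gradients, same length as w
    Returns the pair (W^t, G^t).
    """
    # G^t = beta * G^{t-1} + (1 - beta) * g^2, computed per coordinate
    new_state = [beta * s + (1 - beta) * gi * gi for s, gi in zip(state, g)]
    # W^t = W^{t-1} - lr / sqrt(G^t + eps) * g, again per coordinate
    new_w = [wi - lr / math.sqrt(s + eps) * gi
             for wi, s, gi in zip(w, new_state, g)]
    return new_w, new_state


# Illustrative objective (an assumption, not from the source):
# f(x) = 0.1 * x1^2 + 2 * x2^2, much steeper along x2 than along x1.
def f(x):
    return 0.1 * x[0] ** 2 + 2 * x[1] ** 2


def grad_f(x):
    return [0.2 * x[0], 4 * x[1]]


x = [-5.0, -2.0]   # starting point
G = [0.0, 0.0]     # state initialized to zero
for _ in range(500):
    x, G = rmsprop_step(x, grad_f(x), G, lr=0.01, beta=0.9, eps=1e-6)

print(f"x1={x[0]:.4f}, x2={x[1]:.4f}, f(x)={f(x):.6f}")
```

Because the gradient is divided by the root of its own running average, the steep $$x_2$$ direction and the shallow $$x_1$$ direction take steps of roughly comparable size. In this sketch a bias vector would be handled by a separate `rmsprop_step` call with its own state list.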
Updated 2026-05-15
Tags
Data Science
D2L
Dive into Deep Learning @ D2L
Related
RMSprop (Deep Learning Optimization Algorithm) Python implementation
Adam vs. SGD vs. RMSProp vs. SWA vs. AdaTune
RMSprop (Deep Learning Optimization Algorithm) Pseudocode
Adam (Deep Learning Optimization Algorithm)
RMSProp Optimization Trajectory in 2D
RMSProp Optimizer From-Scratch Implementation
Effective Observation Window of RMSProp
Adadelta