Effective Observation Window of RMSProp
When the weighting term is set to $\gamma$, the state variable $\mathbf{s}_t$ in RMSProp effectively aggregates information over roughly the past $1/(1-\gamma)$ observations of the squared gradient; for example, $\gamma = 0.9$ corresponds to about 10 observations. This applies the general exponentially weighted average principle, in which a decay factor $\gamma$ yields an effective window of $1/(1-\gamma)$, to RMSProp's squared-gradient state variable. The quantity $1/(1-\gamma)$ defines the effective observation window: a larger $\gamma$ produces a longer memory (smoother average), while a smaller $\gamma$ makes the algorithm more responsive to recent gradients.
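The window can be checked numerically. The following is a minimal sketch, assuming the standard RMSProp state update $\mathbf{s}_t = \gamma \mathbf{s}_{t-1} + (1-\gamma)\mathbf{g}_t^2$ applied to scalar gradients. The function names `effective_window`, `rmsprop_state`, and `weight_in_window` are hypothetical helpers introduced for illustration, not part of any library.

```python
def effective_window(gamma):
    """Approximate number of past squared gradients that s_t averages over."""
    return 1.0 / (1.0 - gamma)


def rmsprop_state(grads, gamma, s0=0.0):
    """Apply s_t = gamma * s_{t-1} + (1 - gamma) * g_t**2 over scalar gradients.

    Assumes the state starts at s0 (zero by default) and returns the final state.
    """
    s = s0
    for g in grads:
        s = gamma * s + (1.0 - gamma) * g * g
    return s


def weight_in_window(gamma):
    """Total weight the average places on the most recent 1/(1-gamma) observations.

    The squared gradient from k steps ago carries weight (1 - gamma) * gamma**k.
    """
    n = round(effective_window(gamma))
    return sum((1.0 - gamma) * gamma**k for k in range(n))


if __name__ == "__main__":
    # Compare a short, a medium, and a long memory.
    for gamma in (0.5, 0.9, 0.99):
        print(gamma, effective_window(gamma), round(weight_in_window(gamma), 3))
```

With $\gamma = 0.9$, the ten most recent squared gradients carry $1 - 0.9^{10} \approx 0.65$ of the total weight, which is why $1/(1-\gamma)$ is described as an effective window rather than a hard cutoff: older observations still contribute, but with geometrically shrinking weight.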
Tags
D2L
Dive into Deep Learning @ D2L
Related
RMSprop (Deep Learning Optimization Algorithm) Python implementation
Adam vs. SGD vs. RMSProp vs. SWA vs. AdaTune
RMSprop (Deep Learning Optimization Algorithm) Pseudocode
Adam (Deep Learning Optimization Algorithm)
RMSProp Optimization Trajectory in 2D
RMSProp Optimizer From-Scratch Implementation
Effective Observation Window of RMSProp
RMSprop (Deep Learning Optimization Algorithm) Mathematical Implementations
Adadelta