Concept

Effective Observation Window of RMSProp

When the weighting term $\gamma$ is set to $0.9$, the state variable $\mathbf{s}$ in RMSProp effectively aggregates information over the past $\frac{1}{1 - \gamma} = \frac{1}{1 - 0.9} = 10$ observations of the squared gradient. This applies the general principle of exponentially weighted averages, where a decay factor $\beta$ yields an effective window of $\frac{1}{1 - \beta}$, to RMSProp's squared-gradient state variable. The quantity $\frac{1}{1 - \gamma}$ defines the effective observation window: a larger $\gamma$ produces a longer memory (smoother average), while a smaller $\gamma$ makes the algorithm more responsive to recent gradients.
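
The window can be checked numerically with a minimal sketch. It assumes the usual RMSProp state update $\mathbf{s}_t = \gamma \mathbf{s}_{t-1} + (1 - \gamma) \mathbf{g}_t^2$ with $\mathbf{s}_0 = 0$, which this card does not state explicitly. The function names `effective_window`, `rmsprop_state`, and `recent_weight_mass` are hypothetical and chosen only for illustration. Under that update, the squared gradient from $k$ steps ago carries weight $(1 - \gamma)\gamma^k$, so the most recent $n$ observations carry a total weight of $1 - \gamma^n$.

```python
def effective_window(gamma: float) -> float:
    """Return 1 / (1 - gamma), the effective observation window."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    return 1.0 / (1.0 - gamma)


def rmsprop_state(squared_grads, gamma: float) -> float:
    """Run the assumed update s = gamma * s + (1 - gamma) * g^2 from s = 0.

    `squared_grads` is a sequence of scalar squared gradients, oldest first.
    """
    s = 0.0
    for g2 in squared_grads:
        s = gamma * s + (1.0 - gamma) * g2
    return s


def recent_weight_mass(gamma: float, n: int) -> float:
    """Total weight the state places on the n most recent squared gradients.

    The weights (1 - gamma) * gamma**k for k = 0..n-1 sum to 1 - gamma**n.
    """
    return 1.0 - gamma ** n


if __name__ == "__main__":
    for gamma in (0.5, 0.9, 0.99):
        window = effective_window(gamma)
        mass = recent_weight_mass(gamma, round(window))
        print(f"gamma={gamma}: window={window:.1f}, weight inside window={mass:.3f}")
```

For $\gamma = 0.9$ the window is 10 steps, and those 10 steps hold roughly 65% of the total weight ($1 - 0.9^{10} \approx 0.651$). The window is therefore a characteristic time scale rather than a hard cutoff: older squared gradients still contribute, with geometrically shrinking weight.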

Updated 2026-05-15

Tags

D2L

Dive into Deep Learning @ D2L