Formula

RMSProp (Deep Learning Optimization Algorithm) Mathematical Implementation

The RMSProp update rule maintains a state variable $$G^{t}$$ that tracks the exponentially weighted average of squared gradients, and uses it to adaptively scale the learning rate:

$$G^{t} = \beta G^{t-1} + (1 - \beta)(\nabla J(W^{t}))^2$$

$$W^{t} = W^{t-1} - \frac{\alpha}{\sqrt{G^{t} + \epsilon}} \nabla J(W^{t})$$

The square, square root, and division are applied element-wise. The same principle applies to the bias parameters.

- $$G^{t}$$: the running average of squared gradients, a helper matrix with the same shape as $$W^{t}$$
- $$\beta$$: the decay factor controlling how quickly the running average forgets old observations (typically around $$0.9$$)
- $$W^{t}$$: the model parameters
- $$\alpha$$: the initial learning rate (typically around $$0.01$$)
- $$\epsilon$$: a small constant to prevent division by zero (typically around $$10^{-6}$$ or $$10^{-8}$$)
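The update rule can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function name `rmsprop_step`, the zero initialization of `G`, and the toy objective $$J(W) = \frac{1}{2}\lVert W \rVert^2$$ are assumptions made for the example and do not come from the formula itself.

```python
import numpy as np

def rmsprop_step(W, G, grad, alpha=0.01, beta=0.9, eps=1e-6):
    """Apply one RMSProp update and return the new (W, G).

    Hypothetical helper for illustration; defaults follow the typical
    values listed above.
    """
    # Exponentially weighted average of the element-wise squared gradient.
    G = beta * G + (1.0 - beta) * grad ** 2
    # Scale the learning rate per parameter; eps sits inside the square root,
    # matching the formula above.
    W = W - alpha / np.sqrt(G + eps) * grad
    return W, G

# Toy problem (an assumption for the demo): J(W) = 0.5 * ||W||^2, so grad J(W) = W.
W = np.array([1.0, -2.0])
G = np.zeros_like(W)  # assumed initial state: no gradient history yet

for _ in range(1000):
    grad = W  # gradient of the toy objective at the current parameters
    W, G = rmsprop_step(W, G, grad)

print(W)  # both entries end up close to the minimizer at 0
```

Because the step is normalized by $$\sqrt{G^{t} + \epsilon}$$, each parameter moves by roughly $$\alpha$$ per iteration regardless of the gradient's magnitude, which is why the iterates settle into a small neighborhood of the minimum instead of converging to it exactly when $$\alpha$$ is held constant.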

Updated 2026-05-15

Tags

Data Science

D2L

Dive into Deep Learning @ D2L