1Cademy - RMSProp Optimizer From-Scratch Implementation

Learn Before

RMSprop (Deep Learning Optimization Algorithm)

Code

RMSProp Optimizer From-Scratch Implementation

A from-scratch implementation of the RMSProp optimizer for deep networks maintains one auxiliary state variable per parameter tensor, initialized to zeros with the same shape as the parameter. At each update step, the state is updated as a leaky average of squared gradients: $\mathbf{s} \leftarrow \gamma \mathbf{s} + (1 - \gamma) \mathbf{g}^2$, where $\gamma$ is the decay factor, $\mathbf{g}$ is the current gradient, and the square is taken elementwise. The parameter is then updated as $\mathbf{x} \leftarrow \mathbf{x} - \frac{\eta}{\sqrt{\mathbf{s} + \epsilon}} \odot \mathbf{g}$, where $\eta$ is the learning rate and $\epsilon = 10^{-6}$ is a numerical stability constant added inside the square root. Finally, the parameter gradients are zeroed out so that they do not accumulate into the next step.
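To make the update concrete, here is a minimal sketch of a single RMSProp step on two coordinates whose gradients differ by a factor of 20. The values of `lr`, `gamma`, the gradients, and the starting parameters are illustrative assumptions, not taken from the source.

```python
import torch

# Illustrative values (assumptions for this sketch).
lr, gamma, eps = 0.01, 0.9, 1e-6
g = torch.tensor([0.1, -2.0])   # gradients of very different scale
s = torch.zeros_like(g)         # state starts at zero
x = torch.tensor([1.0, 1.0])    # parameters

# Leaky average of squared gradients.
s = gamma * s + (1 - gamma) * g ** 2
# Per-coordinate step: gradient rescaled by 1 / sqrt(s + eps).
step = lr * g / torch.sqrt(s + eps)
x = x - step

print(s)     # squared-gradient state after one step
print(step)  # both entries have magnitude close to lr / sqrt(1 - gamma)
```

Because the state starts at zero, the first step has magnitude close to `lr / sqrt(1 - gamma)` in every coordinate, whatever the scale of the gradient, as long as the squared gradient is much larger than `eps`. This is the per-coordinate normalization that the state provides.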

In PyTorch, this can be implemented as follows:

```python
import torch

def init_rmsprop_states(feature_dim):
    # One zero-initialized state tensor per parameter, matching its shape.
    s_w = torch.zeros((feature_dim, 1))
    s_b = torch.zeros(1)
    return (s_w, s_b)

def rmsprop(params, states, hyperparams):
    gamma, eps = hyperparams['gamma'], 1e-6
    for p, s in zip(params, states):
        with torch.no_grad():
            # Leaky average of squared gradients.
            s[:] = gamma * s + (1 - gamma) * torch.square(p.grad)
            # Scale each coordinate's step by 1 / sqrt(s + eps).
            p[:] -= hyperparams['lr'] * p.grad / torch.sqrt(s + eps)
        p.grad.data.zero_()
```
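As a quick sanity check, the sketch below runs the optimizer on a small synthetic linear-regression problem. It restates the two functions so that it runs on its own; the data, the `true_w` and `true_b` values, the learning rate, and the number of steps are all illustrative assumptions rather than settings from the source.

```python
import torch

def init_rmsprop_states(feature_dim):
    return (torch.zeros((feature_dim, 1)), torch.zeros(1))

def rmsprop(params, states, hyperparams):
    gamma, eps = hyperparams['gamma'], 1e-6
    for p, s in zip(params, states):
        with torch.no_grad():
            s[:] = gamma * s + (1 - gamma) * torch.square(p.grad)
            p[:] -= hyperparams['lr'] * p.grad / torch.sqrt(s + eps)
        p.grad.data.zero_()

torch.manual_seed(0)
# Synthetic data: y = X @ true_w + true_b (hypothetical values).
true_w, true_b = torch.tensor([[2.0], [-3.4]]), 4.2
X = torch.randn(256, 2)
y = X @ true_w + true_b

# Parameters start at zero and are tracked by autograd.
w = torch.zeros((2, 1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)
states = init_rmsprop_states(feature_dim=2)
hyperparams = {'lr': 0.05, 'gamma': 0.9}

losses = []
for _ in range(300):
    # Full-batch squared loss for a linear model.
    loss = ((X @ w + b - y) ** 2).mean() / 2
    loss.backward()
    rmsprop([w, b], states, hyperparams)
    losses.append(loss.item())

print(f'first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}')
```

With a fixed learning rate, the iterates tend to keep moving in steps of roughly `lr` near the minimum, so the final loss usually settles at a small value rather than exactly zero.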

Updated 2026-05-15

References

Dive into Deep Learning

Learn After

RMSProp Training on Airfoil Dataset
