Code

RMSProp Optimizer From-Scratch Implementation

A from-scratch implementation of the RMSProp optimizer for deep networks maintains one auxiliary state variable per parameter tensor, initialized to zeros with the same shape as the parameter. At each update step, the state is updated as a leaky average of the element-wise squared gradients: $\mathbf{s} \leftarrow \gamma \mathbf{s} + (1 - \gamma) \mathbf{g}^2$, where $\gamma$ is the decay factor and $\mathbf{g}$ is the current gradient. The parameter is then decremented by $\eta \, \mathbf{g} / \sqrt{\mathbf{s} + \epsilon}$, where $\eta$ is the learning rate, the division and square root are element-wise, and $\epsilon = 10^{-6}$ is a numerical stability constant added to the state before the square root is taken. Finally, the parameter gradients are zeroed out so that they do not accumulate into the next backward pass.

In PyTorch, this can be implemented as follows:

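```python
import torch

def init_rmsprop_states(feature_dim):
    # One zero-initialized state tensor per parameter (weights, bias)
    s_w = torch.zeros((feature_dim, 1))
    s_b = torch.zeros(1)
    return (s_w, s_b)

def rmsprop(params, states, hyperparams):
    gamma, eps = hyperparams['gamma'], 1e-6
    for p, s in zip(params, states):
        with torch.no_grad():
            # Leaky average of squared gradients
            s[:] = gamma * s + (1 - gamma) * torch.square(p.grad)
            # Scale the step by 1 / sqrt(s + eps)
            p[:] -= hyperparams['lr'] * p.grad / torch.sqrt(s + eps)
        # Reset the gradient for the next backward pass
        p.grad.data.zero_()
```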
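The update rule can also be traced by hand on a single scalar, which makes the effect of the normalization easier to see. The sketch below uses only the Python standard library. The helper names `rmsprop_step`, `first_step`, and `settled_step` are hypothetical, and the values `lr=0.01` and `gamma=0.9` are assumptions chosen for illustration, not settings taken from the text above.

```python
import math

def rmsprop_step(x, s, grad, lr=0.01, gamma=0.9, eps=1e-6):
    """One scalar RMSProp update; returns the new x, the new state, and the step taken."""
    s = gamma * s + (1 - gamma) * grad ** 2   # leaky average of squared gradients
    step = lr * grad / math.sqrt(s + eps)     # gradient scaled by 1 / sqrt(s + eps)
    return x - step, s, step

def first_step(grad):
    """Step size of the very first update, starting from a zero state."""
    _, _, step = rmsprop_step(x=1.0, s=0.0, grad=grad)
    return step

def settled_step(grad, n=100):
    """Step size after n updates with the same (constant) gradient."""
    x, s, step = 1.0, 0.0, 0.0
    for _ in range(n):
        x, s, step = rmsprop_step(x, s, grad)
    return step

if __name__ == "__main__":
    for grad in (0.5, 50.0):
        print(f"grad={grad:5.1f}  first={first_step(grad):.4f}  settled={settled_step(grad):.4f}")
```

Under these assumptions, the first step is close to `lr / sqrt(1 - gamma)` (about 0.0316) for both the small and the large gradient, and with a constant gradient the step settles near `lr` as the state approaches the squared gradient. In other words, the step size depends mainly on the learning rate rather than on the raw gradient scale.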


Updated 2026-05-15


Tags

D2L

Dive into Deep Learning @ D2L