RMSProp Optimizer From-Scratch Implementation
A from-scratch implementation of the RMSProp optimizer for deep networks maintains one auxiliary state variable per parameter tensor, initialized to zeros with the same shape as the parameter. During each update step, the state is updated as a leaky average of squared gradients: $\mathbf{s}_t \leftarrow \gamma \mathbf{s}_{t-1} + (1 - \gamma) \mathbf{g}_t^2$, where $\gamma$ is the decay factor and $\mathbf{g}_t$ is the current gradient, squared elementwise. The parameter is then decremented by the learning rate times the gradient divided by $\sqrt{\mathbf{s}_t + \epsilon}$, where the numerical stability constant ($\epsilon = 10^{-6}$) is added inside the square root. Finally, the parameter gradients are zeroed out.
In PyTorch, this can be implemented as follows:
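```python
import torch

def init_rmsprop_states(feature_dim):
    # One zero-initialized state tensor per parameter, matching its shape.
    s_w = torch.zeros((feature_dim, 1))
    s_b = torch.zeros(1)
    return (s_w, s_b)

def rmsprop(params, states, hyperparams):
    gamma, eps = hyperparams['gamma'], 1e-6
    for p, s in zip(params, states):
        with torch.no_grad():
            # Leaky average of squared gradients.
            s[:] = gamma * s + (1 - gamma) * torch.square(p.grad)
            # Scale the step by the root of the running average.
            p[:] -= hyperparams['lr'] * p.grad / torch.sqrt(s + eps)
        # Reset the gradient before the next backward pass.
        p.grad.data.zero_()
```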
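To make the arithmetic of the update concrete, the following is a minimal scalar sketch in plain Python rather than PyTorch. The helper name `rmsprop_step`, the toy objective $f(x) = x^2$, and the values `lr=0.01` and `gamma=0.9` are assumptions chosen for illustration; only `eps=1e-6` and the update rule itself come from the text above.

```python
import math

def rmsprop_step(x, s, grad, lr=0.01, gamma=0.9, eps=1e-6):
    """One scalar RMSProp update (hypothetical helper); returns the new (x, s)."""
    s = gamma * s + (1 - gamma) * grad ** 2   # leaky average of squared gradients
    x = x - lr * grad / math.sqrt(s + eps)    # eps sits inside the square root
    return x, s

# First step from s = 0: the step size is roughly lr / sqrt(1 - gamma),
# nearly independent of how large the gradient is.
first_steps = []
for grad in (2.0, 200.0):
    x_new, _ = rmsprop_step(x=0.0, s=0.0, grad=grad)
    first_steps.append(abs(x_new))

# Minimize the toy objective f(x) = x**2 (gradient 2 * x), starting from x = 5.
x, s = 5.0, 0.0
for _ in range(200):
    x, s = rmsprop_step(x, s, grad=2 * x)
```

Because the state starts at zero and no bias correction is applied, the first few steps are larger than `lr`. Once the running average catches up with the squared gradient, each step has magnitude close to `lr`, which is why a gradient 100 times larger produces almost the same first step in `first_steps`.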
Tags
D2L
Dive into Deep Learning @ D2L
Related
RMSprop (Deep Learning Optimization Algorithm) Python implementation
Adam vs. SGD vs. RMSProp vs. SWA vs. AdaTune
RMSprop (Deep Learning Optimization Algorithm) Pseudocode
Adam (Deep Learning Optimization Algorithm)
RMSProp Optimization Trajectory in 2D
Effective Observation Window of RMSProp
RMSprop (Deep Learning Optimization Algorithm) Mathematical Implementations
Adadelta