Concise RMSProp Implementation

As a widely adopted optimization algorithm, RMSProp is available as a built-in optimizer in all major deep learning frameworks, so practitioners can use it without coding the state-variable updates by hand. In PyTorch, the optimizer is instantiated via torch.optim.RMSprop, with the decay parameter γ passed as alpha and the learning rate as lr. In MXNet's Gluon API, the algorithm is specified by the string 'rmsprop', with the decay parameter assigned to gamma1 and the learning rate to learning_rate. In TensorFlow, the optimizer is created using tf.keras.optimizers.RMSprop, where the decay parameter is named rho and the learning rate is learning_rate. Despite the differing parameter names, the core algorithm is the same: each implementation maintains the exponentially weighted average of squared gradients internally and rescales the learning rate for each coordinate automatically. When trained on the Airfoil Self-Noise dataset with a learning rate of 0.01 and γ = 0.9, all three implementations converge to a training loss of approximately 0.245, matching the performance of the from-scratch implementation.
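To make the shared update concrete, here is a minimal pure-Python sketch of one RMSProp step for a single scalar parameter, written in the form PyTorch documents for torch.optim.RMSprop, where ε is added after the square root. The dictionary only restates the parameter names listed above; the helper name `rmsprop_step`, the default `eps=1e-8`, and the toy objective are illustrative assumptions, and the frameworks differ in details such as where ε enters the update and its default value.

```python
import math

# Framework-specific names for the decay rate gamma, as listed in the text.
DECAY_PARAM_NAME = {
    "PyTorch (torch.optim.RMSprop)": "alpha",
    "MXNet Gluon ('rmsprop')": "gamma1",
    "TensorFlow (tf.keras.optimizers.RMSprop)": "rho",
}


def rmsprop_step(x, grad, s, lr=0.01, gamma=0.9, eps=1e-8):
    """Hypothetical helper: one RMSProp update for a scalar parameter.

    s is the exponentially weighted average of squared gradients.
    Returns the updated (x, s).
    """
    s = gamma * s + (1 - gamma) * grad ** 2    # leaky average of g^2
    x = x - lr * grad / (math.sqrt(s) + eps)   # step rescaled by sqrt(s)
    return x, s


# Toy objective f(x) = 0.1 * x**2 with gradient 0.2 * x (illustrative only).
x, s = 5.0, 0.0
for _ in range(200):
    x, s = rmsprop_step(x, 0.2 * x, s)
print(round(x, 4))
```

Because the step is divided by the root of the running average of g², its size stays close to lr once s has warmed up, regardless of how large or small the raw gradient is.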

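PyTorch

```python
import torch
from d2l import torch as d2l

# Pass the optimizer class; alpha is the decay rate gamma. data_iter is the
# Airfoil minibatch iterator created for the from-scratch implementation.
trainer = torch.optim.RMSprop
d2l.train_concise_ch11(trainer, {'lr': 0.01, 'alpha': 0.9}, data_iter)
```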
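The d2l helper receives the optimizer class itself together with a dictionary of keyword arguments. The sketch below shows what that presumably amounts to in plain PyTorch, assuming the helper builds the optimizer as `trainer(params, **hyperparams)`. It fits a linear model on synthetic data that stands in for the Airfoil features; the data, model size, batch size, and epoch count are illustrative assumptions, not the book's setup.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Synthetic regression data (hypothetical stand-in for the Airfoil features).
X = torch.randn(200, 5)
true_w = torch.tensor([[1.0], [-2.0], [0.5], [3.0], [0.0]])
y = X @ true_w + 0.01 * torch.randn(200, 1)

net = nn.Linear(5, 1)
loss_fn = nn.MSELoss()

trainer = torch.optim.RMSprop                # the class, as in the text
hyperparams = {'lr': 0.01, 'alpha': 0.9}     # alpha plays the role of gamma
optimizer = trainer(net.parameters(), **hyperparams)

initial_loss = loss_fn(net(X), y).item()
for epoch in range(5):
    for i in range(0, 200, 10):              # minibatches of size 10
        Xb, yb = X[i:i + 10], y[i:i + 10]
        optimizer.zero_grad()
        loss = loss_fn(net(Xb), yb)
        loss.backward()
        optimizer.step()                     # RMSProp update on all params
final_loss = loss_fn(net(X), y).item()
print(f'loss: {initial_loss:.3f} -> {final_loss:.3f}')
```

Passing the class rather than an instance lets the training helper create the optimizer only after it has built the network, so the same call pattern works for any torch.optim optimizer.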

Updated 2026-05-15

Dive into Deep Learning @ D2L