Concise RMSProp Implementation
As a widely adopted optimization algorithm, RMSProp is available as a built-in optimizer in all major deep learning frameworks, so practitioners can use it without hand-coding the state variable updates. In PyTorch, the optimizer is instantiated via torch.optim.RMSprop, with the decay parameter passed as alpha and the learning rate as lr. In MXNet's Gluon API, the algorithm is selected with the string 'rmsprop', with the decay parameter assigned to gamma1 and the learning rate to learning_rate. In TensorFlow, the optimizer is created using tf.keras.optimizers.RMSprop, where the decay parameter is named rho and the learning rate is learning_rate. Despite the differing parameter names, the underlying algorithm is the same in each framework: the optimizer maintains the exponentially weighted average of squared gradients internally and applies the adaptive learning rate scaling automatically. When trained on the Airfoil Self-Noise dataset with a learning rate of 0.01 and a decay parameter of 0.9 (the values passed in the PyTorch call in this section), all three implementations converge to a training loss that matches the from-scratch implementation's performance.
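Because the same two settings go by different names in each framework, a small lookup table can make the correspondence explicit. The sketch below is illustrative only: the helper rmsprop_kwargs and the table PARAM_NAMES are hypothetical names introduced here, and only the keyword names themselves (lr, alpha, gamma1, rho, learning_rate) come from the text above.

```python
# Hypothetical helper (not part of any framework): map generic RMSProp
# settings onto the keyword names each framework uses, as listed above.
PARAM_NAMES = {
    "pytorch": {"lr": "lr", "decay": "alpha"},
    "mxnet": {"lr": "learning_rate", "decay": "gamma1"},
    "tensorflow": {"lr": "learning_rate", "decay": "rho"},
}


def rmsprop_kwargs(framework, lr, decay):
    """Return the framework-specific keyword arguments for RMSProp."""
    names = PARAM_NAMES[framework]
    return {names["lr"]: lr, names["decay"]: decay}


# Print the keyword arguments each framework would receive.
for fw in PARAM_NAMES:
    print(fw, rmsprop_kwargs(fw, lr=0.01, decay=0.9))
```

For PyTorch, the resulting dictionary has the same form as the hyperparameter dictionary passed to the training helper in this section.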
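```python
# PyTorch
import torch
from d2l import torch as d2l

# data_iter is assumed to be the Airfoil Self-Noise iterator defined earlier
trainer = torch.optim.RMSprop
d2l.train_concise_ch11(trainer, {'lr': 0.01, 'alpha': 0.9}, data_iter)
```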
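To make concrete what the built-in optimizers handle internally, here is a minimal plain-Python sketch of a single RMSProp update on a scalar parameter. It is not any framework's actual code: the function name rmsprop_step, the toy objective f(x) = x^2, and the placement of eps inside the square root are assumptions made for illustration.

```python
import math


def rmsprop_step(x, s, grad, lr=0.01, gamma=0.9, eps=1e-6):
    """One RMSProp update for a scalar parameter (illustrative sketch).

    s is the exponentially weighted average of squared gradients;
    gamma plays the role of alpha / gamma1 / rho in the frameworks.
    """
    s = gamma * s + (1 - gamma) * grad ** 2
    # Assumption: eps is added inside the square root.
    x = x - lr * grad / math.sqrt(s + eps)
    return x, s


# Toy problem (assumed for illustration): minimize f(x) = x^2, gradient 2x.
x, s = 5.0, 0.0
for _ in range(2000):
    x, s = rmsprop_step(x, s, grad=2 * x)
print(abs(x))  # distance from the minimizer at 0 after training
```

The state s is exactly the bookkeeping that the built-in optimizers hide: the caller supplies only the learning rate and the decay parameter.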
Tags
D2L
Dive into Deep Learning @ D2L