Example

RMSProp Training on Airfoil Dataset

When a linear regression model is trained from scratch on the Airfoil Self-Noise dataset using the RMSProp optimizer with an initial learning rate of 0.01, a decay parameter γ = 0.9, and a batch size of 10, the training loss converges to approximately 0.245. This shows that RMSProp trains the model effectively when the learning rate and decay factor are configured appropriately. A typical configuration pairs a modest learning rate with a high decay factor. AdaGrad, by contrast, often needs a larger initial learning rate, because it accumulates the full sum of squared gradients, which keeps growing and shrinks the effective step size aggressively. RMSProp replaces that sum with a leaky (exponentially weighted) average, so its effective step size does not decay toward zero.
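The mechanics behind this result fit in a few lines. At each step RMSProp updates a leaky average of squared gradients, s ← γ·s + (1 − γ)·g², and then moves each parameter by η·g / √(s + ε). The sketch below applies that rule to minibatch linear regression with the settings from this example (η = 0.01, γ = 0.9, batch size 10). It is a minimal illustration under stated assumptions, not the D2L training code. It does not load the Airfoil file. Instead it uses a hypothetical synthetic dataset (1,500 examples with 5 standard-normal features, roughly the shape of the standardized Airfoil inputs). The helper names (`make_data`, `train_rmsprop`, and so on), the two epochs, ε = 1e-6, and the zero initialization are all illustrative choices.

```python
import math
import random


def make_data(n=1500, d=5, noise_std=0.1, seed=0):
    """Hypothetical synthetic stand-in for the standardized Airfoil data.

    Features are i.i.d. standard normal (mimicking standardized inputs); targets
    come from a random linear model plus Gaussian noise.
    """
    rng = random.Random(seed)
    true_w = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    true_b = 0.5
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        X.append(x)
        y.append(sum(wj * xj for wj, xj in zip(true_w, x)) + true_b
                 + rng.gauss(0.0, noise_std))
    return X, y


def predict(w, b, x):
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def squared_loss(w, b, X, y):
    """Mean of 0.5 * (prediction - target)^2 over the dataset."""
    return sum(0.5 * (predict(w, b, x) - t) ** 2 for x, t in zip(X, y)) / len(y)


def train_rmsprop(X, y, lr=0.01, gamma=0.9, batch_size=10, epochs=2,
                  eps=1e-6, seed=0):
    """Minibatch RMSProp for linear regression (illustrative sketch)."""
    rng = random.Random(seed)
    d = len(X[0])
    w, b = [0.0] * d, 0.0        # zero init (assumption) keeps the sketch deterministic
    s_w, s_b = [0.0] * d, 0.0    # leaky averages of squared gradients
    order = list(range(len(y)))
    for _ in range(epochs):
        rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            # Gradient of the mean 0.5 * squared error over the minibatch.
            g_w, g_b = [0.0] * d, 0.0
            for i in batch:
                err = predict(w, b, X[i]) - y[i]
                for j in range(d):
                    g_w[j] += err * X[i][j] / len(batch)
                g_b += err / len(batch)
            # RMSProp: update the leaky average, then take a per-coordinate rescaled step.
            for j in range(d):
                s_w[j] = gamma * s_w[j] + (1 - gamma) * g_w[j] ** 2
                w[j] -= lr * g_w[j] / math.sqrt(s_w[j] + eps)
            s_b = gamma * s_b + (1 - gamma) * g_b ** 2
            b -= lr * g_b / math.sqrt(s_b + eps)
    return w, b


X, y = make_data()
print("loss before:", squared_loss([0.0] * 5, 0.0, X, y))
w, b = train_rmsprop(X, y)
print("loss after: ", squared_loss(w, b, X, y))
```

Because the data are synthetic, the printed losses are not expected to reproduce the 0.245 reported for Airfoil. The sketch only shows the loss dropping under the same optimizer settings. Replacing the leaky average with a running sum (s ← s + g²) would turn this into AdaGrad, and the steps would shrink steadily as s grows.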


Updated 2026-05-15


Tags

D2L

Dive into Deep Learning @ D2L