RMSProp Optimization Trajectory in 2D
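To visualize RMSProp's convergence behavior, the algorithm is applied to the two-dimensional quadratic function $f(\mathbf{x}) = 0.1 x_1^2 + 2 x_2^2$ with a learning rate of $\eta = 0.4$ and decay parameter $\gamma = 0.9$. The coordinate-wise implementation computes the gradients $g_1 = 0.2 x_1$ and $g_2 = 4 x_2$, updates the leaky averages of squared gradients as $s_i \leftarrow \gamma s_i + (1 - \gamma) g_i^2$, and adjusts each coordinate by $x_i \leftarrow x_i - \frac{\eta}{\sqrt{s_i + \epsilon}} g_i$, where $\epsilon = 10^{-6}$ keeps the division numerically stable. By the end of the training epochs, the variables converge near the origin ($x_1 \approx 0$, $x_2 \approx 0$). Unlike AdaGrad, which stalls in later iterations because its effective learning rate decreases too quickly, RMSProp maintains effective progress throughout training because $\eta$ is controlled independently from the state-variable rescaling.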
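```python
import math

def rmsprop_2d(x1, x2, s1, s2):
    # Gradients of f(x1, x2) = 0.1 * x1**2 + 2 * x2**2
    g1, g2, eps = 0.2 * x1, 4 * x2, 1e-6
    # Leaky averages of the squared gradients
    s1 = gamma * s1 + (1 - gamma) * g1 ** 2
    s2 = gamma * s2 + (1 - gamma) * g2 ** 2
    # Per-coordinate step, rescaled by the root of the running average
    x1 -= eta / math.sqrt(s1 + eps) * g1
    x2 -= eta / math.sqrt(s2 + eps) * g2
    return x1, x2, s1, s2

# Learning rate and decay parameter, read as globals by rmsprop_2d
eta, gamma = 0.4, 0.9
```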
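The update rule only shows its behavior once it is iterated, so the following is a minimal sketch of a driver loop that also runs an AdaGrad step for comparison. Several things here are assumptions for illustration and not part of the original note: the starting point `(-5, -2)`, the 20 iterations, the zero-initialized state, the helper names `f_2d`, `rmsprop_step`, `adagrad_step`, and `run`, and the choice to give AdaGrad the same learning rate of 0.4.

```python
import math

def f_2d(x1, x2):
    """Quadratic objective whose gradients are 0.2 * x1 and 4 * x2."""
    return 0.1 * x1 ** 2 + 2 * x2 ** 2

def rmsprop_step(x1, x2, s1, s2, eta=0.4, gamma=0.9, eps=1e-6):
    """One RMSProp update: the state is a leaky average of squared gradients."""
    g1, g2 = 0.2 * x1, 4 * x2
    s1 = gamma * s1 + (1 - gamma) * g1 ** 2
    s2 = gamma * s2 + (1 - gamma) * g2 ** 2
    x1 -= eta / math.sqrt(s1 + eps) * g1
    x2 -= eta / math.sqrt(s2 + eps) * g2
    return x1, x2, s1, s2

def adagrad_step(x1, x2, s1, s2, eta=0.4, eps=1e-6):
    """One AdaGrad update: the state is an unbounded sum of squared gradients."""
    g1, g2 = 0.2 * x1, 4 * x2
    s1 += g1 ** 2
    s2 += g2 ** 2
    x1 -= eta / math.sqrt(s1 + eps) * g1
    x2 -= eta / math.sqrt(s2 + eps) * g2
    return x1, x2, s1, s2

def run(step, x1=-5.0, x2=-2.0, steps=20):
    """Iterate `step` from an assumed start point and record every iterate."""
    s1 = s2 = 0.0  # assumed initial state
    trace = [(x1, x2)]
    for _ in range(steps):
        x1, x2, s1, s2 = step(x1, x2, s1, s2)
        trace.append((x1, x2))
    return trace

rms_trace = run(rmsprop_step)
ada_trace = run(adagrad_step)

for name, trace in (("RMSProp", rms_trace), ("AdaGrad", ada_trace)):
    x1, x2 = trace[-1]
    print(f"{name}: x1={x1:.6f}, x2={x2:.6f}, f={f_2d(x1, x2):.6f}")
```

Under these assumptions, RMSProp ends much closer to the origin than AdaGrad along the flat `x1` direction, because AdaGrad's accumulated sum keeps growing while RMSProp's leaky average forgets old gradients.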
Tags
D2L
Dive into Deep Learning @ D2L
Related
RMSprop (Deep Learning Optimization Algorithm) Python implementation
Adam vs. SGD vs. RMSProp vs. SWA vs. AdaTune
RMSprop (Deep Learning Optimization Algorithm) Pseudocode
Adam (Deep Learning Optimization Algorithm)
RMSProp Optimizer From-Scratch Implementation
Effective Observation Window of RMSProp
RMSprop (Deep Learning Optimization Algorithm) Mathematical Implementations
Adadelta