1Cademy - RMSProp Optimization Trajectory in 2D

To visualize RMSProp's convergence behavior, the algorithm is applied to the two-dimensional quadratic function $f(\mathbf{x}) = 0.1 x_1^2 + 2 x_2^2$ with learning rate $\eta = 0.4$ and decay parameter $\gamma = 0.9$. The coordinate-wise implementation computes the gradients $g_1 = 0.2 x_1$ and $g_2 = 4 x_2$, updates the leaky averages of the squared gradients as $s_i \leftarrow \gamma s_i + (1 - \gamma) g_i^2$, and adjusts each coordinate by $x_i \leftarrow x_i - \frac{\eta}{\sqrt{s_i + \epsilon}} g_i$, where $\epsilon = 10^{-6}$ guards against division by zero. After 20 iterations, the variables are close to the minimum at the origin ($x_1 \approx -0.0106$, $x_2 \approx 0$). AdaGrad, by contrast, moves only slowly in later iterations: its accumulated sum of squared gradients keeps growing, so the effective learning rate decreases too quickly. RMSProp avoids this because its leaky average forgets old gradients, so the rescaling stays bounded and the step size is governed by $\eta$, which is set independently of the state variable.
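As a worked illustration of a single update, suppose the iteration starts at $(x_1, x_2) = (-5, -2)$ with $s_1 = s_2 = 0$. This starting point is an assumption made for the example and is not stated above; $\epsilon$ is ignored because it is negligible here.

$$
\begin{aligned}
g_1 &= 0.2 \cdot (-5) = -1, & s_1 &\leftarrow 0.1 \cdot (-1)^2 = 0.1, & x_1 &\leftarrow -5 + \frac{0.4}{\sqrt{0.1}} \approx -3.735, \\
g_2 &= 4 \cdot (-2) = -8, & s_2 &\leftarrow 0.1 \cdot (-8)^2 = 6.4, & x_2 &\leftarrow -2 + \frac{0.4 \cdot 8}{\sqrt{6.4}} \approx -0.735.
\end{aligned}
$$

Both coordinates move by the same amount, $\eta / \sqrt{1 - \gamma} \approx 1.265$, even though $g_2$ is eight times larger than $g_1$. With $s_i = 0$ before the first update, the rescaling cancels the gradient magnitude exactly, which is the per-coordinate normalization the update rule is designed to provide.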

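import math

eta, gamma = 0.4, 0.9  # learning rate and decay parameter

def rmsprop_2d(x1, x2, s1, s2):
    # Gradients of f(x) = 0.1 * x1**2 + 2 * x2**2; eps avoids division by zero
    g1, g2, eps = 0.2 * x1, 4 * x2, 1e-6
    # Leaky averages of the squared gradients
    s1 = gamma * s1 + (1 - gamma) * g1 ** 2
    s2 = gamma * s2 + (1 - gamma) * g2 ** 2
    # Each coordinate's step is rescaled by its own running gradient magnitude
    x1 -= eta / math.sqrt(s1 + eps) * g1
    x2 -= eta / math.sqrt(s2 + eps) * g2
    return x1, x2, s1, s2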
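The step function above only performs one update, so the following sketch shows one way the 20-iteration trajectory could be produced and compared with AdaGrad. It is a minimal illustration, not the original driver: the starting point $(-5, -2)$, the zero initial state, and the helper names `grad`, `rmsprop_step`, `adagrad_step`, and `run` are all assumptions introduced here. The AdaGrad step uses the standard rule of accumulating squared gradients without decay.

```python
import math

ETA, GAMMA, EPS = 0.4, 0.9, 1e-6  # eta and gamma from the text; eps as in the step function


def grad(x1, x2):
    # Gradient of f(x) = 0.1 * x1**2 + 2 * x2**2
    return 0.2 * x1, 4 * x2


def rmsprop_step(x1, x2, s1, s2):
    # Leaky average of squared gradients, then a per-coordinate rescaled step
    g1, g2 = grad(x1, x2)
    s1 = GAMMA * s1 + (1 - GAMMA) * g1 ** 2
    s2 = GAMMA * s2 + (1 - GAMMA) * g2 ** 2
    return (x1 - ETA / math.sqrt(s1 + EPS) * g1,
            x2 - ETA / math.sqrt(s2 + EPS) * g2, s1, s2)


def adagrad_step(x1, x2, s1, s2):
    # AdaGrad accumulates squared gradients without decay, so s only grows
    g1, g2 = grad(x1, x2)
    s1 += g1 ** 2
    s2 += g2 ** 2
    return (x1 - ETA / math.sqrt(s1 + EPS) * g1,
            x2 - ETA / math.sqrt(s2 + EPS) * g2, s1, s2)


def run(step, start=(-5.0, -2.0), steps=20):
    # start is an assumed initial point; it is not stated in the text
    x1, x2 = start
    s1 = s2 = 0.0
    path = [(x1, x2)]
    for _ in range(steps):
        x1, x2, s1, s2 = step(x1, x2, s1, s2)
        path.append((x1, x2))
    return path


rms_path = run(rmsprop_step)
ada_path = run(adagrad_step)
print("RMSProp final:", rms_path[-1])
print("AdaGrad final:", ada_path[-1])
```

Under these assumptions, the RMSProp run ends near $x_1 \approx -0.0106$, matching the value quoted above, while the AdaGrad run is still far from the origin in $x_1$ after the same number of steps. The returned `path` lists can be passed to any plotting tool to draw the two trajectories over the contours of $f$.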

Updated 2026-06-26

References

Dive into Deep Learning
