1Cademy - AdaGrad Optimization Trajectory in 2D


To observe how AdaGrad behaves on a convex quadratic problem, we can apply it to the two-dimensional function $f(\mathbf{x}) = 0.1 x_1^2 + 2 x_2^2$, whose gradient is $(0.2 x_1, 4 x_2)$. In Python, a single coordinate-wise update can be written as:

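import math

def adagrad_2d(x1, x2, s1, s2, eta):
    """One AdaGrad step on f(x) = 0.1 * x1**2 + 2 * x2**2."""
    eps = 1e-6  # small constant that guards against division by zero
    g1, g2 = 0.2 * x1, 4 * x2  # gradient of f at (x1, x2)
    # Accumulate squared gradients per coordinate (the state s_t)
    s1 += g1 ** 2
    s2 += g2 ** 2
    # Scale each coordinate's step by its own rate eta / sqrt(s + eps)
    x1 -= eta / math.sqrt(s1 + eps) * g1
    x2 -= eta / math.sqrt(s2 + eps) * g2
    return x1, x2, s1, s2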
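For reference, `adagrad_2d` performs the following per-coordinate AdaGrad step, where squaring, square roots, and division act elementwise and $\epsilon = 10^{-6}$ corresponds to `eps`:

$$
\begin{aligned}
\mathbf{g}_t &= \nabla f(\mathbf{x}_{t-1}) = \begin{pmatrix} 0.2\, x_{1,t-1} \\ 4\, x_{2,t-1} \end{pmatrix}, \\
\mathbf{s}_t &= \mathbf{s}_{t-1} + \mathbf{g}_t^2, \\
\mathbf{x}_t &= \mathbf{x}_{t-1} - \frac{\eta}{\sqrt{\mathbf{s}_t + \epsilon}} \odot \mathbf{g}_t.
\end{aligned}
$$

As a worked first step, assume (for illustration only, since no starting point is given above) $\mathbf{x}_0 = (-5, -2)$ and $\mathbf{s}_0 = \mathbf{0}$. Then $\mathbf{g}_1 = (-1, -8)$, $\mathbf{s}_1 = (1, 64)$, and $\mathbf{x}_1 \approx (-5 + \eta,\ -2 + \eta)$. Because $\mathbf{s}_1 = \mathbf{g}_1^2$, the first step moves each coordinate by almost exactly $\eta$, regardless of how steep $f$ is along that coordinate.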

With $\eta = 0.4$, the trajectory is initially smooth. However, because the state variable $\mathbf{s}_t$ keeps accumulating squared gradients, the effective per-coordinate learning rate $\eta / \sqrt{\mathbf{s}_t + \epsilon}$ decays continuously, so the independent variables move very little in later iterations. Increasing the learning rate to $\eta = 2$ yields much better convergence. This shows that AdaGrad's learning-rate decay can be quite aggressive, even on this noise-free problem, and that $\eta$ may require careful tuning.
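The comparison between the two learning rates can be reproduced with a short driver. The sketch below is illustrative: the helper `run_adagrad`, the starting point $(-5, -2)$, and the budget of 20 steps are assumptions, not part of the original description. It repeats the update function so the block runs on its own, and it also records the effective learning rate $\eta / \sqrt{s_1 + \epsilon}$ of the $x_1$ coordinate to make the decay visible.

```python
import math


def adagrad_2d(x1, x2, s1, s2, eta):
    """One AdaGrad step on f(x) = 0.1 * x1**2 + 2 * x2**2."""
    eps = 1e-6
    g1, g2 = 0.2 * x1, 4 * x2          # gradient of f
    s1 += g1 ** 2                      # accumulate squared gradients
    s2 += g2 ** 2
    x1 -= eta / math.sqrt(s1 + eps) * g1
    x2 -= eta / math.sqrt(s2 + eps) * g2
    return x1, x2, s1, s2


def run_adagrad(eta, steps=20, start=(-5.0, -2.0)):
    """Hypothetical driver: apply `steps` AdaGrad updates from `start`.

    Returns the visited points and, for each step, the effective
    learning rate eta / sqrt(s1 + eps) of the x1 coordinate.
    """
    eps = 1e-6
    x1, x2 = start
    s1 = s2 = 0.0                      # AdaGrad state starts at zero
    path = [(x1, x2)]
    lr_x1 = []
    for _ in range(steps):
        x1, x2, s1, s2 = adagrad_2d(x1, x2, s1, s2, eta)
        path.append((x1, x2))
        lr_x1.append(eta / math.sqrt(s1 + eps))
    return path, lr_x1


for eta in (0.4, 2.0):
    path, lr_x1 = run_adagrad(eta)
    x1, x2 = path[-1]
    print(f"eta={eta}: x1={x1:.6f}, x2={x2:.6f}, "
          f"effective lr (x1) {lr_x1[0]:.3f} -> {lr_x1[-1]:.3f}")
```

Under these assumptions, the run with $\eta = 0.4$ leaves $x_1$ still far from the minimum after 20 steps, because its effective learning rate shrinks at every step, while the run with $\eta = 2$ ends close to the origin.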


Updated 2026-05-15

Contributors are from: University of California, Berkeley
