1Cademy - Momentum Convergence on a Scalar Quadratic

To analyze the convergence of the momentum method on a scalar quadratic function $f(x) = \frac{\lambda}{2} x^2$ , the update equations for the velocity $v$ and position $x$ can be written as a coupled linear system: $\begin{bmatrix} v_{t+1} \\ x_{t+1} \end{bmatrix} = \begin{bmatrix} \beta & \lambda \\ -\eta \beta & (1 - \eta \lambda) \end{bmatrix} \begin{bmatrix} v_{t} \\ x_{t} \end{bmatrix}$ . The two rows correspond to $v_{t+1} = \beta v_t + \lambda x_t$ and $x_{t+1} = x_t - \eta v_{t+1}$ . The convergence behavior is governed by the eigenvalues of this $2 \times 2$ transition matrix: the iterates converge when both eigenvalues have magnitude less than 1. For $0 \le \beta < 1$ , the velocity converges when the hyperparameters satisfy $0 < \eta \lambda < 2 + 2 \beta$ . This feasible range is larger than the $0 < \eta \lambda < 2$ range required for standard gradient descent (the $\beta = 0$ case) and approaches twice its width as $\beta \to 1$ , so a larger momentum coefficient $\beta$ permits a larger learning rate without divergence.
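The bound can be checked numerically. The following is a minimal sketch, not taken from the source: the function names `transition_matrix`, `spectral_radius`, and `run_momentum` are hypothetical, and the values $\lambda = 1$ , $\beta = 0.5$ are chosen only for illustration. It computes the eigenvalues of the $2 \times 2$ transition matrix in closed form from its trace and determinant, and also runs the momentum update directly so the two views can be compared on either side of $\eta \lambda = 2 + 2 \beta$ .

```python
import cmath


def transition_matrix(eta, lam, beta):
    """Matrix mapping (v_t, x_t) to (v_{t+1}, x_{t+1}) for f(x) = lam/2 * x^2."""
    return [[beta, lam],
            [-eta * beta, 1.0 - eta * lam]]


def spectral_radius(eta, lam, beta):
    """Largest eigenvalue magnitude of the 2x2 transition matrix."""
    (a, b), (c, d) = transition_matrix(eta, lam, beta)
    trace = a + d
    det = a * d - b * c
    # Roots of mu^2 - trace * mu + det = 0; cmath handles complex roots.
    disc = cmath.sqrt(trace * trace - 4.0 * det)
    mu1 = (trace + disc) / 2.0
    mu2 = (trace - disc) / 2.0
    return max(abs(mu1), abs(mu2))


def run_momentum(eta, lam, beta, steps=200, x0=1.0):
    """Iterate the momentum update directly and return the final position."""
    v, x = 0.0, x0
    for _ in range(steps):
        v = beta * v + lam * x   # v_{t+1} = beta * v_t + lam * x_t
        x = x - eta * v          # x_{t+1} = x_t - eta * v_{t+1}
    return x


if __name__ == "__main__":
    lam, beta = 1.0, 0.5         # illustrative values; bound is eta * lam < 3
    for eta in (1.0, 2.5, 2.9, 3.1):
        rho = spectral_radius(eta, lam, beta)
        x_final = run_momentum(eta, lam, beta)
        print(f"eta*lam={eta * lam:.1f}  rho={rho:.4f}  |x_200|={abs(x_final):.3e}")
```

Under these assumptions, the spectral radius stays below 1 for the step sizes inside the bound and exceeds 1 just past it, which matches whether the simulated iterates shrink or grow. Setting `beta=0.0` recovers plain gradient descent, for which the same check fails once $\eta \lambda$ exceeds 2.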

Updated 2026-06-25

References

Dive into Deep Learning
