Momentum Convergence on a Scalar Quadratic

To analyze the convergence of the momentum method on the scalar quadratic $f(x) = \frac{\lambda}{2} x^2$, write the updates $v_{t+1} = \beta v_t + \lambda x_t$ and $x_{t+1} = x_t - \eta v_{t+1}$ for the velocity $v$ and the position $x$ as a coupled linear system:

$$\begin{bmatrix} v_{t+1} \\ x_{t+1} \end{bmatrix} = \begin{bmatrix} \beta & \lambda \\ -\eta \beta & (1 - \eta \lambda) \end{bmatrix} \begin{bmatrix} v_{t} \\ x_{t} \end{bmatrix}.$$

The convergence behavior is governed by the eigenvalues of this $2 \times 2$ transition matrix: after $t$ steps the initial state is multiplied by the $t$-th power of the matrix, so the iterates shrink to zero exactly when both eigenvalues have magnitude below 1. Analysis of the matrix shows that, for $0 \le \beta < 1$, the velocity converges when the hyperparameters satisfy $0 < \eta \lambda < 2 + 2 \beta$. This feasible range is larger than the $0 < \eta \lambda < 2$ constraint required for standard gradient descent: as $\beta$ approaches 1, the upper bound on $\eta \lambda$ grows from 2 toward 4, so a large momentum coefficient permits learning rates up to roughly twice as large without divergence.
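The bound can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper names `transition_matrix`, `spectral_radius`, and `run_momentum`, as well as the sample values of $\beta$, $\eta$, and $\lambda$, are illustrative choices and not part of the original text. It builds the transition matrix, computes its largest eigenvalue magnitude, and iterates the system from $(v_0, x_0) = (0, 1)$.

```python
import numpy as np


def transition_matrix(beta, eta, lam):
    """Matrix R for v <- beta*v + lam*x followed by x <- x - eta*v."""
    return np.array([[beta, lam],
                     [-eta * beta, 1.0 - eta * lam]])


def spectral_radius(beta, eta, lam):
    """Largest eigenvalue magnitude of R; below 1 means the iterates converge."""
    eigenvalues = np.linalg.eigvals(transition_matrix(beta, eta, lam))
    return float(np.max(np.abs(eigenvalues)))


def run_momentum(beta, eta, lam, steps=200, v0=0.0, x0=1.0):
    """Apply R repeatedly to the state [v, x] and return the final state."""
    R = transition_matrix(beta, eta, lam)
    state = np.array([v0, x0])
    for _ in range(steps):
        state = R @ state
    return state


if __name__ == "__main__":
    lam, beta = 1.0, 0.5  # illustrative values; the bound is eta*lam < 2 + 2*beta = 3
    for eta in (1.9, 2.5, 2.9, 3.1):
        rho = spectral_radius(beta, eta, lam)
        v, x = run_momentum(beta, eta, lam)
        inside = 0 < eta * lam < 2 + 2 * beta
        print(f"eta={eta}: inside bound={inside}, "
              f"spectral radius={rho:.3f}, |x| after 200 steps={abs(x):.3e}")
```

With these sample values, learning rates between 2 and 3 lie outside the plain gradient descent range yet still inside the momentum range, which is the regime the inequality describes.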

Updated 2026-05-15

Tags

D2L

Dive into Deep Learning @ D2L