Learn Before
Momentum Convergence on a Scalar Quadratic
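To analyze the convergence of the momentum method on the scalar quadratic function $f(x) = \frac{\lambda}{2} x^2$ with $\lambda > 0$, the update equations for the velocity $v_t$ and the position $x_t$ can be written as a coupled linear system:

$$\begin{bmatrix} v_{t+1} \\ x_{t+1} \end{bmatrix} = \begin{bmatrix} \beta & \lambda \\ -\eta\beta & 1 - \eta\lambda \end{bmatrix} \begin{bmatrix} v_t \\ x_t \end{bmatrix} = R(\beta, \eta, \lambda) \begin{bmatrix} v_t \\ x_t \end{bmatrix}.$$

After $t$ steps the initial state $(v_0, x_0)$ has been multiplied by $R^t$, so the convergence behavior is governed entirely by the eigenvalues of the transition matrix $R$. Analyzing these eigenvalues shows that the iteration converges when the hyperparameters satisfy $0 < \eta\lambda < 2 + 2\beta$. This feasible range is larger than the constraint $0 < \eta\lambda < 2$ required for standard gradient descent, by up to a factor of two as $\beta \to 1$, which confirms that large momentum coefficients ($\beta$) permit larger learning rates without divergence.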
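The eigenvalue condition can be checked numerically. The following is a minimal sketch, assuming the D2L form of the update ($v \leftarrow \beta v + \lambda x$, then $x \leftarrow x - \eta v$) and $0 \le \beta < 1$; the function names `transition_matrix`, `spectral_radius`, and `run_momentum` are illustrative and do not come from the source.

```python
import numpy as np


def transition_matrix(beta, eta, lam):
    """Matrix R(beta, eta, lam) acting on the stacked state (v_t, x_t)."""
    return np.array([[beta, lam],
                     [-eta * beta, 1.0 - eta * lam]])


def spectral_radius(beta, eta, lam):
    """Largest eigenvalue magnitude of R; a value below 1 means R^t shrinks to zero."""
    eigenvalues = np.linalg.eigvals(transition_matrix(beta, eta, lam))
    return float(np.max(np.abs(eigenvalues)))


def run_momentum(beta, eta, lam, x0=1.0, steps=200):
    """Run momentum on f(x) = lam / 2 * x**2 (gradient lam * x), starting from v = 0."""
    v, x = 0.0, x0
    for _ in range(steps):
        v = beta * v + lam * x   # velocity update (assumed form)
        x = x - eta * v          # position update
    return x


lam, beta = 1.0, 0.9
for eta in (1.5, 3.0, 3.7, 3.9):
    inside = 0 < eta * lam < 2 + 2 * beta
    rho = spectral_radius(beta, eta, lam)
    print(f"eta={eta}: inside range={inside}, spectral radius={rho:.3f}")
```

With $\lambda = 1$ and $\beta = 0.9$ the bound is $\eta < 3.8$, so the learning rates 3.0 and 3.7 sit inside the momentum range even though plain gradient descent (the case $\beta = 0$) would already diverge for any $\eta > 2$. Calling `run_momentum` with the same values gives a direct check that the iterates shrink or blow up in line with the spectral radius.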
Tags
D2L
Dive into Deep Learning @ D2L
Related
Intuition behind Gradient Descent with Momentum
These plots were generated with gradient descent, gradient descent with momentum (β = 0.5), and gradient descent with momentum (β = 0.9). Which curve corresponds to which algorithm?
Adam (Deep Learning Optimization Algorithm)
Origin of the Momentum Method
Velocity Initialization in Momentum Method
Momentum Convergence on a Scalar Quadratic
Gradient Descent with Momentum Pseudocode