Code
Gradient Descent with Momentum Pseudocode
On iteration t: compute dW, db on the current mini-batch.
Note that we now have two hyperparameters, α and β.
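The update equations themselves did not survive on this page, so the following is only a sketch under stated assumptions. It assumes the commonly taught exponentially weighted average form, v = β·v + (1 − β)·grad followed by param = param − α·v, applied to both W and b. The function names `momentum_step` and `minimize_quadratic`, the toy loss, and the default values of α and β are hypothetical and chosen for illustration.

```python
def momentum_step(params, grads, velocity, alpha=0.1, beta=0.9):
    """One update of gradient descent with momentum.

    Assumed form (not confirmed by the source):
        v     = beta * v + (1 - beta) * grad
        param = param - alpha * v
    """
    new_params, new_velocity = {}, {}
    for name in params:
        # Exponentially weighted average of past gradients.
        new_velocity[name] = beta * velocity[name] + (1.0 - beta) * grads[name]
        # Step along the smoothed gradient instead of the raw one.
        new_params[name] = params[name] - alpha * new_velocity[name]
    return new_params, new_velocity


def minimize_quadratic(steps=500, alpha=0.1, beta=0.9):
    """Run momentum on the toy loss L(W, b) = W^2 + b^2 (illustrative only)."""
    params = {"W": 5.0, "b": -3.0}
    velocity = {"W": 0.0, "b": 0.0}  # velocities start at zero
    for _ in range(steps):
        # Gradients of the toy loss; in practice these come from a mini-batch.
        grads = {"W": 2.0 * params["W"], "b": 2.0 * params["b"]}
        params, velocity = momentum_step(params, grads, velocity, alpha, beta)
    return params
```

Calling `minimize_quadratic()` should drive both W and b toward 0. Setting `beta=0.0` in this sketch reduces the update to plain gradient descent, which makes the role of the second hyperparameter easy to see.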
Updated 2026-05-17
Tags
Data Science
Related
Intuition behind Gradient Descent with Momentum
These plots were generated with gradient descent; with gradient descent with momentum (β = 0.5) and gradient descent with momentum (β = 0.9). Which curve corresponds to which algorithm?
Adam (Deep Learning Optimization Algorithm)
Origin of the Momentum Method
Velocity Initialization in Momentum Method
Momentum Convergence on a Scalar Quadratic