Learn Before
Intuition behind Gradient Descent with Momentum
As the picture below shows, plain gradient descent oscillates up and down in the vertical direction while making steady progress to the right in the horizontal direction. Momentum replaces each raw gradient with an average of the previous few gradients (in practice, an exponentially weighted average). In the vertical direction the gradients alternate between positive and negative values, so they largely cancel in the average and the oscillations shrink. In the horizontal direction all gradients point the same way, so the average stays large and the update keeps moving right.
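The averaging effect can be checked numerically. The following is a minimal sketch, not a full optimizer: it assumes a made-up sequence of gradients whose horizontal component is always 1.0 and whose vertical component flips between +5.0 and -5.0, and it uses the common exponentially weighted form v = β·v + (1 − β)·g. The function name `momentum_average` and all the numbers are illustrative assumptions.

```python
import numpy as np

def momentum_average(grads, beta=0.9):
    """Exponentially weighted average of a sequence of gradients (hypothetical helper)."""
    v = np.zeros_like(grads[0])
    history = []
    for g in grads:
        # Keep a fraction beta of the old average and blend in the new gradient.
        v = beta * v + (1 - beta) * g
        history.append(v.copy())
    return np.array(history)

# Assumed toy gradients: column 0 is horizontal (always +1.0),
# column 1 is vertical (alternating +5.0, -5.0, +5.0, ...).
steps = 50
signs = np.where(np.arange(steps) % 2 == 0, 1.0, -1.0)
grads = np.stack([np.ones(steps), 5.0 * signs], axis=1)

averaged = momentum_average(grads, beta=0.9)

print("last raw gradient:     ", grads[-1])
print("last averaged gradient:", averaged[-1])
```

In this toy setting the averaged horizontal component ends up close to the raw value of 1.0, while the averaged vertical component is far smaller in magnitude than the raw 5.0, which matches the intuition above. A smaller β averages over fewer past gradients and so damps the vertical component less.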
Tags
Data Science
Related
Intuition behind Gradient Descent with Momentum
These plots were generated with gradient descent, gradient descent with momentum (β = 0.5), and gradient descent with momentum (β = 0.9). Which curve corresponds to which algorithm?
Gradient Descent with Momentum Pseudocode
Adam (Deep Learning Optimization Algorithm)