Learn Before
Nesterov momentum (Deep Learning Optimization Algorithm)
In the standard momentum method, we first move the weights in the direction of the current gradient and then add the momentum term (an exponentially weighted sum of all previous steps). Nesterov momentum reverses this order: we first move in the direction of the momentum to a look-ahead point, compute the gradient at that point, and then take the step in the direction of this look-ahead gradient.
The update rules are as follows:
v_t = \gamma\, v_{t-1} + \eta\, \nabla_\theta J(\theta_{t-1} - \gamma\, v_{t-1})
\theta_t = \theta_{t-1} - v_t

where \gamma is the momentum coefficient, \eta is the learning rate, and \theta_{t-1} - \gamma\, v_{t-1} is the look-ahead point at which the gradient is evaluated.
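As a concrete illustration, here is a minimal NumPy sketch of one Nesterov momentum step following the update rules above. The function name, parameter defaults, and the toy quadratic objective are illustrative assumptions, not part of the original card.

```python
import numpy as np

def nesterov_momentum_step(theta, velocity, grad_fn, lr=0.01, gamma=0.9):
    """One Nesterov momentum update (hypothetical helper for illustration).

    theta    : current parameter vector
    velocity : accumulated momentum term (same shape as theta)
    grad_fn  : function returning the gradient of the loss at a given point
    lr       : learning rate (eta)
    gamma    : momentum coefficient
    """
    # 1. Look ahead: move to the point predicted by the current momentum.
    lookahead = theta - gamma * velocity
    # 2. Evaluate the gradient at the look-ahead point, not at theta.
    grad = grad_fn(lookahead)
    # 3. Update the velocity and take the final step.
    velocity = gamma * velocity + lr * grad
    theta = theta - velocity
    return theta, velocity


# Toy example (assumed): minimise f(x) = x^2, whose gradient is 2x.
theta = np.array([5.0])
velocity = np.zeros_like(theta)
for _ in range(100):
    theta, velocity = nesterov_momentum_step(theta, velocity, lambda x: 2 * x)
print(theta)  # converges towards 0
```

The only difference from plain momentum is step 1: the gradient is taken at the look-ahead point rather than at the current weights, which lets the update correct itself before overshooting.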
Tags
Data Science
Related
Mini-Batch Gradient Descent
Gradient Descent with Momentum
An overview of gradient descent optimization algorithms
Learning Rate Decay
Gradient Descent
AdaDelta (Deep Learning Optimization Algorithm)
Adam (Deep Learning Optimization Algorithm)
RMSprop (Deep Learning Optimization Algorithm)
AdaGrad (Deep Learning Optimization Algorithm)
Nesterov momentum (Deep Learning Optimization Algorithm)
Challenges with Deep Learning Optimizer Algorithms
Adam optimization algorithm
Difference between Adam and SGD