Learn Before
Concept

Nesterov momentum (Deep Learning Optimization Algorithm)

In the standard momentum method, we first move the weights in the direction of the current gradient and then in the direction of the momentum (an exponentially weighted sum of all previous steps). Nesterov momentum reverses this order: we first move in the direction of the momentum, evaluate the gradient at that look-ahead point, and then take a step using this new gradient. Because the gradient is computed after the momentum step, it acts as a correction to where the update is about to land.

The update rules are as follows:

$$v \leftarrow \alpha v - \epsilon \nabla_{\theta} \left[ \frac{1}{m} \sum^{m}_{i=1} L\big(f(x^{(i)}; \theta + \alpha v),\, y^{(i)}\big) \right]$$

$$\theta \leftarrow \theta + v$$
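The update rules above can be sketched in NumPy. This is a minimal illustration, not a reference implementation: the function and parameter names (`nesterov_step`, `grad_fn`, `lr`) are chosen for clarity, with `momentum` playing the role of $\alpha$ and `lr` the role of $\epsilon$.

```python
import numpy as np

def nesterov_step(theta, v, grad_fn, lr=0.05, momentum=0.9):
    """One Nesterov momentum update.

    First move to the look-ahead point theta + momentum * v,
    evaluate the gradient there, then update velocity and weights.
    """
    lookahead = theta + momentum * v      # step in the momentum direction first
    g = grad_fn(lookahead)                # gradient at the look-ahead point
    v = momentum * v - lr * g             # v <- alpha * v - eps * grad
    theta = theta + v                     # theta <- theta + v
    return theta, v

# Toy example: minimize f(theta) = theta^2, whose gradient is 2 * theta.
theta, v = np.array([5.0]), np.zeros(1)
for _ in range(200):
    theta, v = nesterov_step(theta, v, lambda t: 2.0 * t)
```

After a few hundred steps on this convex quadratic, `theta` is driven close to the minimizer at 0; evaluating the gradient at the look-ahead point damps the oscillations that plain momentum exhibits.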


Updated 2021-06-24

Tags

Data Science