Code
Nesterov algorithm formula
- is the gradient calculated for the new point where J is the cost function
- step at time stamp t
- the parameters for the layer at the time stamp t
- learning rate
- another hyperparameter(mostly people use use 0.9)
Also the same process is done for bias parameters
0
1
Updated 2020-11-16
Tags
Deep Learning (in Machine learning)
Data Science