
Nesterov algorithm formula

$step_{t} = \beta \, step_{t-1} + \alpha \nabla J(W^{t-1} - \beta \, step_{t-1})$

$W^{t} = W^{t-1} - step_{t}$

$\nabla J(W^{t-1} - \beta \, step_{t-1})$ - the gradient of the cost function $J$, evaluated at the look-ahead point $W^{t-1} - \beta \, step_{t-1}$

$step_{t}$ - the step at time stamp $t$

$W^{t}$ - the parameters of the layer at time stamp $t$

$\alpha$ - the learning rate

$\beta$ - another hyperparameter, the momentum coefficient (most people use 0.9)

The same update is also applied to the bias parameters.
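The update above can be sketched in NumPy. This is a minimal illustration, not a production optimizer; the toy quadratic cost `J(W) = 0.5 * ||W||^2` (whose gradient is simply `W`) and the function names are assumptions for the example.

```python
import numpy as np

def grad_J(W):
    # Gradient of the assumed toy cost J(W) = 0.5 * ||W||^2.
    return W

def nesterov_update(W, step, alpha=0.1, beta=0.9):
    # The gradient is evaluated at the look-ahead point W - beta * step,
    # which is what distinguishes Nesterov from plain momentum.
    lookahead = W - beta * step
    new_step = beta * step + alpha * grad_J(lookahead)
    return W - new_step, new_step

W = np.array([1.0, -2.0])
step = np.zeros_like(W)
for _ in range(100):
    W, step = nesterov_update(W, step)
print(W)  # moves toward the minimum of J at the origin
```

The same two lines would be applied to the bias vector as well, with its own `step` buffer.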


Updated 2020-11-16

Tags

Deep Learning (in Machine learning)

Data Science