Concept

Mathematical Implementation

G^{t} = G^{t-1} + \left( \nabla J(W^{t-1}) \right)^{2}

W^{t} = W^{t-1} - \frac{\alpha}{\sqrt{G^{t} + \epsilon}} \nabla J(W^{t-1})

G^{t} - the running sum of squared gradients (the algorithm's helper matrix)

W^{t} - the parameters

\alpha - the starting learning rate (usually something around 0.1 or 0.01)

\epsilon - a small constant to avoid division by zero (usually around 1e-8)

The same principle applies to the bias parameters
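
A minimal NumPy sketch of the two update rules above, applied to both weights and biases. The function name adagrad_update and the dummy gradients are hypothetical placeholders for illustration; in practice the gradients come from backpropagation.

```python
import numpy as np

def adagrad_update(W, G, grad, alpha=0.01, eps=1e-8):
    """One step of the update rule for a parameter array W.

    W    -- current parameters
    G    -- running sum of squared gradients (same shape as W)
    grad -- gradient of the cost J with respect to W
    """
    G = G + grad ** 2                        # G^t = G^{t-1} + (nabla J(W^{t-1}))^2
    W = W - alpha / np.sqrt(G + eps) * grad  # per-parameter scaled step
    return W, G

# Hypothetical usage: the same rule is applied to weights and biases.
W = np.random.randn(3, 2)
b = np.zeros(2)
G_W, G_b = np.zeros_like(W), np.zeros_like(b)
for step in range(100):
    grad_W = np.random.randn(*W.shape)  # stand-in for a real gradient
    grad_b = np.random.randn(*b.shape)
    W, G_W = adagrad_update(W, G_W, grad_W, alpha=0.1)
    b, G_b = adagrad_update(b, G_b, grad_b, alpha=0.1)
```

Because G^{t} only grows, the effective step size alpha / sqrt(G^{t} + epsilon) shrinks over time, and it shrinks fastest for parameters that receive large gradients.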


Updated 2020-11-16

Tags

Deep Learning (in Machine learning)

Data Science