Concept

RMSprop (Deep Learning Optimization Algorithm): Mathematical Formulation

G^{t} = \beta G^{t-1} + (1 - \beta) \left(\nabla J(W^{t})\right)^{2}

W^{t} = W^{t-1} - \frac{\alpha}{\sqrt{G^{t} + \epsilon}} \nabla J(W^{t})


The same principle applies to the bias parameters.

G^{t} - the exponentially decaying average of the squared gradients, used to scale each parameter's step

\beta - the decay rate that controls how quickly G forgets old gradients (usually around 0.9)

W^{t} - the parameters

\alpha - the learning rate (usually around 0.1 or 0.01)

\epsilon - a small constant to avoid division by zero (usually around 1e-8)
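The two update rules above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation; the function name rmsprop_update and the toy objective J(w) = w^2 are assumptions for the example.

```python
import numpy as np

def rmsprop_update(W, G, grad, alpha=0.01, beta=0.9, eps=1e-8):
    """One RMSprop step (illustrative helper, not from the note above)."""
    # G^t = beta * G^{t-1} + (1 - beta) * (grad)^2
    G = beta * G + (1 - beta) * grad ** 2
    # W^t = W^{t-1} - alpha / sqrt(G^t + eps) * grad
    W = W - alpha / np.sqrt(G + eps) * grad
    return W, G

# Usage: minimize the toy objective J(w) = w^2, whose gradient is 2w.
w = np.array([5.0])
G = np.zeros(1)
for _ in range(500):
    grad = 2 * w
    w, G = rmsprop_update(w, G, grad, alpha=0.1)
# w ends up oscillating close to the minimum at 0
```

Because the step is divided by the root of the running average of squared gradients, the effective step size is roughly alpha regardless of the raw gradient magnitude, which is what makes RMSprop robust to poorly scaled objectives.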


Updated 2020-11-16

Tags

Data Science