Learn Before
Concept

Gradient Descent with Momentum Pseudocode

On iteration $t$: compute $dW$, $db$ on the current mini-batch, then

$v_{dW} = \beta v_{dW} + (1-\beta)\,dW$

$v_{db} = \beta v_{db} + (1-\beta)\,db$

$W = W - \alpha v_{dW}, \quad b = b - \alpha v_{db}$

Note that we now have two hyperparameters, $\alpha$ (the learning rate) and $\beta$ (the momentum coefficient).
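The update above can be sketched as a small NumPy helper; the function name `momentum_step` and the default $\beta = 0.9$ are illustrative choices, not part of the pseudocode itself:

```python
import numpy as np

def momentum_step(W, b, dW, db, v_dW, v_db, alpha=0.01, beta=0.9):
    """One gradient-descent-with-momentum update (illustrative sketch)."""
    # Exponentially weighted average of the gradients (the "velocity")
    v_dW = beta * v_dW + (1 - beta) * dW
    v_db = beta * v_db + (1 - beta) * db
    # Parameters move along the velocity, not the raw gradient
    W = W - alpha * v_dW
    b = b - alpha * v_db
    return W, b, v_dW, v_db
```

Because the velocity averages recent gradients, oscillating gradient components cancel while consistent components accumulate, which is why momentum typically allows a larger effective step than plain gradient descent.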


Updated 2021-03-19

Tags

Data Science