Gradient Descent with Momentum Pseudocode

On iteration $t$: compute $dW$, $db$ on the current mini-batch.

$v_{dW} = \beta v_{dW} + (1-\beta)\,dW$

$v_{db} = \beta v_{db} + (1-\beta)\,db$

$W = W - \alpha v_{dW}, \quad b = b - \alpha v_{db}$

Note that there are now two hyperparameters to tune: the learning rate $\alpha$ and the momentum coefficient $\beta$.
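The update rule can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the function names `momentum_step` and `gradients`, the toy data, the zero initialization of the velocities, and the values `alpha=0.05` and `beta=0.9` are all assumptions chosen for the example, and the whole toy data set stands in for a mini-batch.

```python
def momentum_step(W, b, dW, db, v_dW, v_db, alpha, beta):
    """Apply one momentum update (hypothetical helper for this sketch)."""
    # Exponentially weighted averages of the gradients.
    v_dW = beta * v_dW + (1 - beta) * dW
    v_db = beta * v_db + (1 - beta) * db
    # Step the parameters along the averaged gradients.
    W = W - alpha * v_dW
    b = b - alpha * v_db
    return W, b, v_dW, v_db


def gradients(W, b, xs, ys):
    """Mean-squared-error gradients for the toy model y_hat = W * x + b."""
    n = len(xs)
    dW = sum(2 * (W * x + b - y) * x for x, y in zip(xs, ys)) / n
    db = sum(2 * (W * x + b - y) for x, y in zip(xs, ys)) / n
    return dW, db


# Toy data generated from y = 2x + 1 (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

W, b = 0.0, 0.0
v_dW, v_db = 0.0, 0.0  # assumed starting point: velocities at zero

for t in range(500):
    dW, db = gradients(W, b, xs, ys)  # "compute dW, db on the current mini-batch"
    W, b, v_dW, v_db = momentum_step(W, b, dW, db, v_dW, v_db,
                                     alpha=0.05, beta=0.9)

print(round(W, 3), round(b, 3))
```

Because `momentum_step` uses only arithmetic operators, the same function should also work unchanged if `W`, `b`, and the gradients are NumPy arrays of matching shapes.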

Updated 2026-05-17

Tags

Data Science