Mini-Batch Gradient Descent Algorithm

For $t = 1, 2, \ldots, N$ (where $N$ is the number of mini-batches):

  1. Forward propagate on mini-batch $X^{\{t\}}$.
  2. Compute the cost function $J^{\{t\}}$ for that mini-batch.
  3. Backpropagate to compute the gradients of $J^{\{t\}}$, using $X^{\{t\}}$ and $Y^{\{t\}}$.
  4. Update parameters: $W^{[l]} = W^{[l]} - \alpha \, dW^{[l]}$ and $b^{[l]} = b^{[l]} - \alpha \, db^{[l]}$, where $\alpha$ is the learning rate and $l$ indexes each layer.

One complete pass through all $N$ mini-batches constitutes one epoch of training.
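
The four steps and the epoch loop can be sketched in plain Python. This is a minimal illustration, not a reference implementation. It assumes a hypothetical single-layer model `y_hat = w * x + b` with scalar parameters (so the layer index $l$ is dropped), a mean-squared-error cost, consecutive mini-batches without shuffling, and made-up toy data. The names `make_mini_batches` and `train` are invented for this example.

```python
def make_mini_batches(xs, ys, batch_size):
    """Split the training set into consecutive mini-batches (X^{t}, Y^{t})."""
    return [
        (xs[i:i + batch_size], ys[i:i + batch_size])
        for i in range(0, len(xs), batch_size)
    ]


def train(xs, ys, batch_size=5, alpha=0.1, epochs=500):
    """Mini-batch gradient descent for the assumed model y_hat = w * x + b."""
    w, b = 0.0, 0.0
    batches = make_mini_batches(xs, ys, batch_size)  # the N mini-batches
    costs = []
    for _ in range(epochs):        # one epoch = one pass over all N mini-batches
        for x_t, y_t in batches:   # t = 1, 2, ..., N
            m = len(x_t)
            # 1. Forward propagate on the mini-batch.
            y_hat = [w * x + b for x in x_t]
            # 2. Compute the cost J^{t} (mean squared error, an assumed choice).
            errors = [p - y for p, y in zip(y_hat, y_t)]
            cost = sum(e * e for e in errors) / (2 * m)
            # 3. Backpropagate: gradients of J^{t} with respect to w and b.
            dw = sum(e * x for e, x in zip(errors, x_t)) / m
            db = sum(errors) / m
            # 4. Update parameters with learning rate alpha.
            w -= alpha * dw
            b -= alpha * db
        costs.append(cost)  # cost of the last mini-batch seen in this epoch
    return w, b, costs


# Toy data generated from y = 2x + 1 (made up for illustration).
xs = [i / 10 for i in range(20)]
ys = [2 * x + 1 for x in xs]
w, b, costs = train(xs, ys)
```

With 20 examples and a batch size of 5, each epoch performs $N = 4$ parameter updates, one per mini-batch. In a multi-layer network, step 4 would apply the same update to every $W^{[l]}$ and $b^{[l]}$.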

Updated 2026-05-17

Tags

Data Science