Mini-Batch Gradient Descent Algorithm
For $t = 1, \dots, T$ (where $T$ is the number of mini-batches):
- Forward propagate on mini-batch $X^{\{t\}}$.
- Compute the cost function $J^{\{t\}}$ for that mini-batch.
- Backpropagate to compute the gradients of $J^{\{t\}}$ with respect to the parameters, using $X^{\{t\}}$ and $Y^{\{t\}}$.
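- Update the parameters: $W^{[l]} := W^{[l]} - \alpha \, dW^{[l]}$ and $b^{[l]} := b^{[l]} - \alpha \, db^{[l]}$, where $\alpha$ is the learning rate and $l$ indexes each layer.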
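The loop above can be sketched in NumPy as follows. This is a minimal illustration, not a reference implementation. It makes several assumptions that the text does not:
- The model is single-layer logistic regression, so there is only one layer ($l = 1$).
- Examples are stored as columns.
- The per-mini-batch cost is mean cross-entropy.

The names `minibatch_gd_epoch`, `W1` and `b1` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def minibatch_gd_epoch(params, mini_batches, alpha):
    """Run one epoch of mini-batch gradient descent (hypothetical helper).

    Assumes a one-layer logistic-regression model (only l = 1), inputs X_t of
    shape (n_x, batch) and labels Y_t of shape (1, batch); params holds "W1", "b1".
    """
    costs = []
    for X_t, Y_t in mini_batches:            # t = 1, ..., T
        W, b = params["W1"], params["b1"]
        m_t = X_t.shape[1]                   # size of this mini-batch
        # Forward propagation on mini-batch t only
        A = sigmoid(W @ X_t + b)
        # Cost J^{t}: mean cross-entropy over this mini-batch (assumed loss)
        A_safe = np.clip(A, 1e-12, 1 - 1e-12)
        J_t = -np.mean(Y_t * np.log(A_safe) + (1 - Y_t) * np.log(1 - A_safe))
        costs.append(J_t)
        # Backpropagation: gradients of J^{t} with respect to W1 and b1
        dZ = A - Y_t
        dW = (dZ @ X_t.T) / m_t
        db = np.sum(dZ, axis=1, keepdims=True) / m_t
        # Gradient-descent update with learning rate alpha
        params["W1"] = W - alpha * dW
        params["b1"] = b - alpha * db
    return params, costs
```

The parameters are updated after every mini-batch, not once per pass over the data. As a result, `costs` holds one value per mini-batch. That per-mini-batch cost is typically noisier than a full-batch cost curve.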
One complete pass through all mini-batches constitutes one epoch of training.
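An epoch therefore contains $\lceil m / \text{batch size} \rceil$ mini-batches, where $m$ is the number of training examples. The last mini-batch is smaller whenever the batch size does not divide $m$.

The sketch below builds the mini-batches for one epoch. It assumes examples are stored as columns. Reshuffling before partitioning is a common practice, but it is an assumption here, not something stated above. The helper `make_mini_batches` is hypothetical.

```python
import numpy as np

def make_mini_batches(X, Y, batch_size, rng):
    """Split m examples (columns) into mini-batches for one epoch (hypothetical helper).

    Returns ceil(m / batch_size) (X_t, Y_t) pairs; the final pair is smaller
    when batch_size does not divide m.
    """
    m = X.shape[1]
    # Assumed: reshuffle the examples each epoch, keeping X and Y columns paired
    perm = rng.permutation(m)
    X_shuf, Y_shuf = X[:, perm], Y[:, perm]
    # Consecutive slices of width batch_size; slicing past m just truncates
    return [(X_shuf[:, k:k + batch_size], Y_shuf[:, k:k + batch_size])
            for k in range(0, m, batch_size)]
```

For example, with $m = 1000$ and a batch size of 64, one epoch has 16 mini-batches: 15 of size 64 and a final one of size 40.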