Learn Before
Mini-Batches Size
If your training set is small (m < 2,000 examples), it is better to use batch gradient descent. Otherwise, make sure that every mini-batch fits in your CPU/GPU memory. It is common practice to use powers of two as the mini-batch size: 64, 128, 256. This is related to the fact that the number of physical processors on a GPU tends to be a power of two, so such sizes map well onto the hardware. If the mini-batch size is too small, the loss curve will oscillate, which affects the stability of training.
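As a rough illustration of how a training set is split into mini-batches of a power-of-two size, here is a minimal NumPy sketch (the function name `create_mini_batches` and the array shapes are assumptions for this example, not code from this card):

```python
import numpy as np

def create_mini_batches(X, y, batch_size=64, seed=0):
    """Shuffle the training set and split it into mini-batches.

    X: array of shape (m, n_features), y: array of shape (m,).
    batch_size: typically a power of two (64, 128, 256).
    """
    rng = np.random.default_rng(seed)
    m = X.shape[0]
    permutation = rng.permutation(m)
    X_shuffled, y_shuffled = X[permutation], y[permutation]

    mini_batches = []
    for start in range(0, m, batch_size):
        end = start + batch_size  # last batch may be smaller if m is not divisible
        mini_batches.append((X_shuffled[start:end], y_shuffled[start:end]))
    return mini_batches

# Example: m = 2,048 examples with batch_size = 64 gives 2,048 / 64 = 32 mini-batches.
X = np.random.randn(2048, 10)
y = np.random.randn(2048)
batches = create_mini_batches(X, y, batch_size=64)
print(len(batches))  # 32
```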
Tags
Data Science
Related
An Example of Mini-Batches
Mini-Batch Gradient Descent Algorithm
Batch vs Stochastic vs Mini-Batch Gradient Descent
Example Using Mini-Batch Gradient Descent (Learning Rate Decay)
Mini-Batches Size
Which of these statements about mini-batch gradient descent do you agree with?
Why is the best mini-batch size usually not 1 and not m, but instead something in-between?
Suppose your learning algorithm’s cost J, plotted as a function of the number of iterations, looks like the image below:
Stochastic Gradient Descent Algorithm
Loss Gradient over a Mini-batch