Learn Before
Minibatch Size Selection Trade-off
Although increasing the minibatch size reduces the variance of gradient estimates, this benefit exhibits diminishing returns. For independently sampled examples, the standard deviation of the estimate shrinks only with the inverse square root of the minibatch size, whereas the computational cost per iteration grows linearly with it. Beyond a certain point, the additional reduction in standard deviation is therefore small relative to the extra computation. In practice, the minibatch size is chosen to be large enough to offer good computational efficiency and stable gradient estimates, while still fitting within the memory constraints of the hardware, such as a GPU.
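The diminishing returns can be illustrated with a small simulation. The sketch below is not taken from the source: it assumes scalar per-example "gradients" drawn from a unit-variance Gaussian, samples minibatches with replacement, and uses a hypothetical helper, `minibatch_mean_std`, to measure how much the minibatch average fluctuates at each batch size.

```python
import random
import statistics


def minibatch_mean_std(batch_size, population, trials=2000, seed=0):
    """Empirical standard deviation of the minibatch mean over many random draws."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        # Sample a minibatch with replacement and average it,
        # mimicking one stochastic gradient estimate.
        batch = [rng.choice(population) for _ in range(batch_size)]
        means.append(sum(batch) / batch_size)
    return statistics.pstdev(means)


# Hypothetical per-example gradients: scalars with mean 0 and standard deviation 1.
data_rng = random.Random(42)
per_example_grads = [data_rng.gauss(0.0, 1.0) for _ in range(10_000)]

results = {}
for b in (1, 4, 16, 64, 256):
    results[b] = minibatch_mean_std(b, per_example_grads)
    print(f"batch size {b:>3}: std of estimate ~ {results[b]:.3f}, "
          f"cost per step ~ {b} gradient evaluations")
```

Under these assumptions, each fourfold increase in batch size roughly halves the standard deviation of the estimate while quadrupling the work per step, which is why very large minibatches tend to buy little extra stability for their cost.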
Tags
Data Science
D2L
Dive into Deep Learning @ D2L
Related
An Example of Mini-Batches
Example Using Mini-Batch Gradient Descent (Learning Rate Decay)
Which of these statements about mini-batch gradient descent do you agree with?
Why is the best mini-batch size usually not 1 and not m, but instead something in-between?
Suppose your learning algorithm’s cost J, plotted as a function of the number of iterations, looks like the image below:
Stochastic Gradient Descent Algorithm
Loss Gradient over a Mini-batch
Minibatch Size Selection Trade-off
Batch vs Stochastic vs Mini-Batch Gradient Descent
Mini-Batch Gradient Descent Algorithm