1Cademy - Minibatch Size Selection Trade-off

Learn Before

Mini-Batch Gradient Descent

Concept

Minibatch Size Selection Trade-off

Although increasing the minibatch size $|\mathcal{B}_t|$ reduces the variance of the gradient estimate, the benefit shows diminishing returns. For independently sampled examples, the standard deviation of the estimate shrinks only in proportion to $1/\sqrt{|\mathcal{B}_t|}$, whereas the computational cost per iteration grows linearly with $|\mathcal{B}_t|$. Beyond a certain point, the additional reduction in standard deviation is therefore small relative to the extra computation. In practice, the minibatch size is chosen to be large enough to give good computational efficiency and stable gradient estimates, while still fitting within the memory constraints of the hardware, such as a GPU.
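
The diminishing returns described above can be checked numerically. The following is a minimal sketch, not a training procedure: it assumes synthetic per-example gradients for a single scalar parameter drawn from a Gaussian, and it samples minibatches with replacement so that draws are independent. The function name `minibatch_gradient_std` and all constants are hypothetical choices made for illustration.

```python
import math
import random
import statistics


def minibatch_gradient_std(per_example_grads, batch_size, num_trials=2000, seed=0):
    """Empirical standard deviation of the minibatch-mean gradient."""
    rng = random.Random(seed)
    # Each trial draws one minibatch (with replacement) and averages it.
    batch_means = [
        statistics.fmean(rng.choices(per_example_grads, k=batch_size))
        for _ in range(num_trials)
    ]
    return statistics.pstdev(batch_means)


# Assumption: synthetic per-example gradients, mean 1.0 and std 2.0.
data_rng = random.Random(42)
per_example_grads = [data_rng.gauss(1.0, 2.0) for _ in range(10_000)]
sigma = statistics.pstdev(per_example_grads)

results = {}
for b in [1, 4, 16, 64, 256]:
    results[b] = minibatch_gradient_std(per_example_grads, b)
    predicted = sigma / math.sqrt(b)
    print(f"batch size {b:>3}: empirical std {results[b]:.3f}, "
          f"sigma/sqrt(b) {predicted:.3f}")
```

In this sketch, each fourfold increase in batch size roughly halves the standard deviation of the gradient estimate, while the number of per-example gradients computed per iteration grows fourfold. Real gradients are high-dimensional and not Gaussian, so the numbers are illustrative only.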