Learn Before
Concept

Mini-Batch Size

If your training set is small (m < 2,000), it's better to use batch gradient descent. Make sure that every mini-batch fits in your CPU/GPU memory. It is common practice to use a power of two as the mini-batch size: 64, 128, 256. This is related to the fact that the number of physical processors in a GPU tends to be a power of 2, so such sizes map well onto the hardware. If the batch size is too small, the loss curve will oscillate, which hurts the stability of training.
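A minimal sketch of how splitting a training set into mini-batches might look in NumPy (the helper name `get_mini_batches` and its defaults are illustrative assumptions, not from this card):

```python
import numpy as np

def get_mini_batches(X, y, batch_size=64, seed=0):
    """Shuffle the training set and split it into mini-batches.

    Hypothetical helper for illustration; names and defaults are
    assumptions, not part of the original concept card.
    """
    rng = np.random.default_rng(seed)
    m = X.shape[0]             # number of training examples
    perm = rng.permutation(m)  # shuffle so batches differ each epoch
    X_shuf, y_shuf = X[perm], y[perm]

    batches = []
    for start in range(0, m, batch_size):
        end = start + batch_size
        # The last mini-batch may be smaller if m is not divisible
        # by batch_size.
        batches.append((X_shuf[start:end], y_shuf[start:end]))
    return batches

# Example: m = 1,000 (< 2,000), so full-batch gradient descent would
# also be reasonable here; with batch_size=64 we get 16 batches.
X = np.random.randn(1000, 10)
y = np.random.randn(1000)
for X_batch, y_batch in get_mini_batches(X, y, batch_size=64):
    pass  # compute gradients on (X_batch, y_batch) and update parameters
```

The power-of-two batch size (64 here) follows the common practice mentioned above; the shuffle before splitting is what keeps small-batch updates from following the same noisy path every epoch.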


Updated 2021-10-30

Tags

Data Science

Learn After