
Batch vs Stochastic vs Mini-Batch Gradient Descent

Batch gradient descent (batch size = N) takes relatively low-noise, relatively large steps, so the iterates march steadily toward the minimum. However, every step requires a pass over the entire training set, which can take a long time and demands additional memory.

Stochastic gradient descent (batch size = 1) is easy to fit in memory and efficient for large datasets. But it can be extremely noisy, since a step heads in the wrong direction whenever the sampled training example happens to point away from the minimum. With a fixed learning rate it never truly converges; it just oscillates and wanders around the region of the minimum.

In practice, mini-batch gradient descent with a batch size between 1 and N usually works best. It is not guaranteed to head toward the minimum on every step, but it tends to move in the direction of the minimum far more consistently than SGD, while remaining much cheaper per step than batch gradient descent. The three variants differ only in the batch size, as the sketch below shows.
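To make the comparison concrete, here is a minimal NumPy sketch (the function name, data, and hyperparameters are illustrative, not from the original note) in which a single batch_size parameter selects the variant: batch_size = N gives batch gradient descent, batch_size = 1 gives stochastic gradient descent, and anything in between gives mini-batch gradient descent.

```python
import numpy as np

def gradient_descent(X, y, lr=0.01, batch_size=32, epochs=100, seed=0):
    """Gradient descent on mean-squared error for a linear model y ≈ X @ w.

    batch_size = len(X) -> batch GD, batch_size = 1 -> SGD,
    anything in between  -> mini-batch GD.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        idx = rng.permutation(n)  # reshuffle examples each epoch
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            Xb, yb = X[batch], y[batch]
            # Gradient of (1/m) * ||Xb @ w - yb||^2 with respect to w
            grad = (2 / len(batch)) * Xb.T @ (Xb @ w - yb)
            w -= lr * grad
    return w

# Illustrative data: 200 examples, 3 features, known weights plus noise.
rng = np.random.default_rng(42)
X = rng.standard_normal((200, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.standard_normal(200)

print(gradient_descent(X, y, batch_size=len(X)))  # batch GD
print(gradient_descent(X, y, batch_size=1))       # SGD
print(gradient_descent(X, y, batch_size=32))      # mini-batch GD
```

Reshuffling the index order each epoch is what makes the batch_size = 1 case genuinely stochastic; without it, the examples would be visited in the same fixed order every epoch.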


Updated 2021-10-03

Tags

Data Science