Learn Before
Concept

Batch Normalization and Batch Size

When applying batch normalization, the choice of minibatch size is highly significant. For fully connected layers, if batch normalization is applied with a minibatch of size 11, the network cannot learn because subtracting the mean causes each hidden unit to take a value of 00. Therefore, a suitably large minibatch is required for stable training. However, in the context of convolutional layers, batch normalization remains well-defined even for minibatches of size 11, because the mean and variance are computed simultaneously across all spatial locations within the single image observation.

0

1

Updated 2026-05-13

Contributors are:

Who are from:

Tags

D2L

Dive into Deep Learning @ D2L