Concept
Batch Normalization and Batch Size
When applying batch normalization, the choice of minibatch size matters. For fully connected layers, if batch normalization is applied with a minibatch of size 1, the network cannot learn anything: the mean of a single example is the example itself, so subtracting the mean leaves every hidden unit with the value 0. A sufficiently large minibatch is therefore required for effective and stable training. In convolutional layers, however, batch normalization remains well-defined even for minibatches of size 1, because the mean and variance are computed per channel across all spatial locations of the single image.
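The contrast between the two layer types can be checked numerically. The following is a minimal pure-Python sketch, not a reference implementation: the helper names (`normalize`, `batch_norm_fc`, `batch_norm_conv`), the input values, and the `EPS` constant are all assumptions made for illustration, and the learnable scale and shift parameters are left out to keep the focus on the statistics.

```python
import math

EPS = 1e-5  # assumed small constant added to the variance for numerical stability


def normalize(values, eps=EPS):
    """Standardize a flat list of numbers using its own mean and variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def batch_norm_fc(batch):
    """Fully connected case (hypothetical helper).

    batch: list of examples, each a list of features.
    Statistics are computed per feature, across the batch dimension only.
    """
    columns = [normalize(list(col)) for col in zip(*batch)]
    return [list(row) for row in zip(*columns)]


def batch_norm_conv(batch):
    """Convolutional case (hypothetical helper).

    batch: list of examples, each a list of channels, each channel an H x W grid.
    Statistics are computed per channel, across the batch AND all spatial positions.
    """
    n_channels = len(batch[0])
    out = [[None] * n_channels for _ in batch]
    for c in range(n_channels):
        # Pool every value of channel c over all examples and spatial locations.
        flat = [v for example in batch for row in example[c] for v in row]
        normed = iter(normalize(flat))
        # Put the normalized values back in the original layout.
        for i, example in enumerate(batch):
            out[i][c] = [[next(normed) for _ in row] for row in example[c]]
    return out


# Minibatch of size 1 with three features: each feature sees a single value.
fc_out = batch_norm_fc([[3.0, -1.5, 7.2]])

# Minibatch of size 1 with one channel of size 2 x 2: four values share the statistics.
conv_out = batch_norm_conv([[[[1.0, 2.0], [3.0, 4.0]]]])

print(fc_out)    # every entry is zero, whatever the input was
print(conv_out)  # entries differ, so the input's structure survives
```

In the fully connected case each feature's mean equals its only value, so the output carries no information about the input. In the convolutional case the four spatial positions supply enough values for a meaningful mean and variance, even though the batch holds a single image.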
Updated 2026-05-13
Tags
D2L
Dive into Deep Learning @ D2L