Batch Normalization in Data Parallelism

Batch normalization needs to be adjusted when training with multi-GPU data parallelism. The global minibatch is split across devices, so computing the exact mean and variance over the entire batch would require cross-device synchronization in every batch normalization layer, which is costly. A practical alternative is to keep separate batch normalization coefficients on each GPU: each device computes its mean and variance locally, using only its assigned shard of the minibatch.
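The difference between per-device and global statistics can be illustrated with a minimal sketch. It simulates two devices in plain Python rather than using real GPUs, and it omits the learnable scale and shift parameters. The helper names `split_minibatch` and `local_batch_norm`, the toy data, and the `eps` value are assumptions made for illustration; they do not come from any particular framework.

```python
from statistics import fmean, pvariance


def split_minibatch(batch, num_devices):
    """Split a minibatch into equal contiguous shards, one per simulated device."""
    shard_size = len(batch) // num_devices
    return [batch[i * shard_size:(i + 1) * shard_size] for i in range(num_devices)]


def local_batch_norm(shard, eps=1e-5):
    """Normalize one shard using only that shard's own mean and variance."""
    mean = fmean(shard)
    var = pvariance(shard, mu=mean)  # population variance, as in batch normalization
    normalized = [(x - mean) / (var + eps) ** 0.5 for x in shard]
    return normalized, mean, var


# Toy global minibatch of 8 scalar activations (hypothetical values).
batch = [1.0, 2.0, 3.0, 4.0, 10.0, 12.0, 14.0, 16.0]

# Each simulated device receives half of the minibatch.
shards = split_minibatch(batch, num_devices=2)

# Every device normalizes with its own statistics; no communication is needed.
per_device = [local_batch_norm(shard) for shard in shards]
local_means = [mean for _, mean, _ in per_device]

# Statistic that a synchronized variant would have to compute across devices.
global_mean = fmean(batch)

print("local means:", local_means)
print("global mean:", global_mean)
```

In this toy case the two local means (2.5 and 13.0) differ from the global mean (7.75), so each shard is normalized differently than it would be under exact global statistics. That discrepancy is what is accepted in exchange for avoiding synchronization.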

Updated 2026-05-18

Tags

D2L

Dive into Deep Learning @ D2L