Concept
Parameter Synchronization Strategies
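In data-parallel training, each device computes gradients on its own slice of the minibatch. Before the model parameters can be updated, these gradients must be synchronized: they are aggregated (typically summed or averaged) and the result is made available to every device. This synchronization can be implemented with a centralized strategy, in which all gradients are sent to a single GPU or to the CPU for aggregation, or with a distributed strategy, in which the gradients are partitioned and each partition is aggregated on a different GPU at the same time. The distributed strategy uses the full bandwidth of the hardware switch that connects the devices, instead of funneling all traffic through a single link.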
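The two strategies can be sketched in plain Python by treating each device's gradient as a list of floats. This is a minimal illustration under our own assumptions, not code from D2L. The helper names (`centralized_sync`, `distributed_sync`, `centralized_time_ms`, `distributed_time_ms`) are hypothetical. The distributed variant shown is one common instance of the idea, a reduce-scatter followed by an all-gather. The timing model assumes k devices with equal-bandwidth, full-duplex links behind a switch that lets every link run at full speed at once.

```python
from typing import List

Vector = List[float]


def centralized_sync(grads: List[Vector]) -> List[Vector]:
    """Centralized: aggregate everything on one device (index 0), then broadcast."""
    total = [0.0] * len(grads[0])
    for g in grads:  # device 0 accumulates every device's full gradient
        for j, x in enumerate(g):
            total[j] += x
    return [list(total) for _ in grads]  # device 0 sends the sum back to every device


def distributed_sync(grads: List[Vector]) -> List[Vector]:
    """Distributed: split into one chunk per device; device i aggregates chunk i."""
    k, n = len(grads), len(grads[0])
    bounds = [(i * n // k, (i + 1) * n // k) for i in range(k)]
    # Reduce-scatter: device i sums slice i across all devices.
    # On real hardware the k slices are summed in parallel; here we loop.
    owned = [[sum(g[j] for g in grads) for j in range(lo, hi)] for lo, hi in bounds]
    # All-gather: every device collects all k aggregated chunks.
    full = [x for chunk in owned for x in chunk]
    return [list(full) for _ in range(k)]


def centralized_time_ms(k: int, size_mb: float, bw_mb_per_ms: float) -> float:
    # k-1 devices push size_mb each into one link, then the sum goes back out over it.
    return 2 * (k - 1) * size_mb / bw_mb_per_ms


def distributed_time_ms(k: int, size_mb: float, bw_mb_per_ms: float) -> float:
    # Each device receives k-1 chunks of size_mb/k, then sends its chunk to k-1 peers;
    # all k links work concurrently (assumed full-duplex, full-bandwidth switch).
    return 2 * (k - 1) * (size_mb / k) / bw_mb_per_ms


if __name__ == "__main__":
    grads = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0], [100.0, 200.0, 300.0, 400.0]]
    print(centralized_sync(grads)[0])                       # [111.0, 222.0, 333.0, 444.0]
    print(distributed_sync(grads) == centralized_sync(grads))  # True
    print(centralized_time_ms(4, 160, 16), distributed_time_ms(4, 160, 16))  # 60.0 15.0
```

Both functions produce identical results; they differ only in where the additions happen and which links carry the data. With the illustrative numbers in the example (4 GPUs, 160 MB of gradients, roughly 16 MB/ms per link, all assumed), the model gives 60 ms for centralized aggregation versus 15 ms for the partitioned scheme. Both move the same total amount of data, but the partitioned version spreads it over k links instead of one.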
Updated 2026-05-18
Tags
D2L
Dive into Deep Learning @ D2L