Concept

Parameter Synchronization Strategies

In data-parallel training, each device computes gradients on its own share of the minibatch, and these gradients must be aggregated across devices before the model parameters are updated. The aggregation can be centralized, with all gradients sent to a single GPU or to the CPU, or distributed, with the gradient partitioned so that each GPU aggregates one part at the same time as the others. The distributed approach makes use of the full bandwidth of the hardware switches.
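The difference between the two strategies can be illustrated with a small simulation. This is a minimal sketch in plain Python, not how a real framework moves data between GPUs: each "device" is just a list of floats, and the function names `centralized_sum` and `distributed_sum` are hypothetical names chosen for this example. It only shows that both strategies yield the same aggregated gradient while dividing the work differently.

```python
from typing import List

Vector = List[float]


def centralized_sum(grads: List[Vector]) -> List[Vector]:
    """Aggregate every gradient on one device, then broadcast the result."""
    # Device 0 receives all gradients and adds them element by element.
    total = [sum(vals) for vals in zip(*grads)]
    # Broadcast: every device gets its own copy of the aggregated gradient.
    return [list(total) for _ in grads]


def distributed_sum(grads: List[Vector]) -> List[Vector]:
    """Each device aggregates one slice; the slices are then shared with all."""
    k = len(grads)       # number of devices
    n = len(grads[0])    # number of gradient entries
    # Split indices 0..n-1 into k contiguous slices, one per device.
    bounds = [(i * n) // k for i in range(k + 1)]
    slices = []
    for owner in range(k):
        lo, hi = bounds[owner], bounds[owner + 1]
        # Device `owner` sums only its own slice across all devices.
        slices.append([sum(g[j] for g in grads) for j in range(lo, hi)])
    # All-gather: every device concatenates the aggregated slices.
    full = [x for s in slices for x in s]
    return [list(full) for _ in grads]


# Four simulated devices, each holding a gradient with four entries.
grads = [
    [1.0, 2.0, 3.0, 4.0],
    [10.0, 20.0, 30.0, 40.0],
    [100.0, 200.0, 300.0, 400.0],
    [1000.0, 2000.0, 3000.0, 4000.0],
]
central = centralized_sum(grads)
distributed = distributed_sum(grads)
```

In the centralized version a single device performs all the additions and handles all the traffic, so it becomes the bottleneck as the number of devices grows. In the distributed version each device is responsible for roughly 1/k of the gradient, so the summation and the data transfers are spread across all devices.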

Updated 2026-05-18

Tags

D2L

Dive into Deep Learning @ D2L