Concept

Computation-to-Synchronization Ratio and Multi-GPU Scalability

The effectiveness of multi-GPU data parallelism depends critically on the ratio of per-device computation time to synchronization overhead. When a model is computationally lightweight (e.g., LeNet), the time spent on the forward pass and gradient computation is comparable to, or smaller than, the time required for cross-device parameter synchronization and Python scheduling. In that regime, adding more GPUs yields little or no speedup, because the roughly fixed overhead is not amortized over enough useful work. Conversely, when a model is sufficiently compute-heavy (e.g., ResNet-18), per-device computation dominates the synchronization cost, so the parallelization overhead becomes relatively negligible and training scales substantially better as GPUs are added.
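The trade-off can be illustrated with a back-of-the-envelope cost model. The sketch below is not taken from the source: it assumes a fixed global batch split evenly across devices, a synchronization cost that stays constant as GPUs are added, and no overlap between communication and computation. The function name `speedup` and all timings are hypothetical and chosen only to contrast the two regimes.

```python
def speedup(t_compute, t_sync, num_gpus):
    """Estimated speedup over one GPU under a simple additive cost model.

    t_compute: single-GPU compute time per training step (forward + backward).
    t_sync:    per-step synchronization and scheduling overhead, assumed
               constant for any multi-GPU run (a simplifying assumption).
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if num_gpus == 1:
        return 1.0  # a single device needs no cross-device synchronization
    # Compute is divided across devices; the overhead is paid in full.
    t_parallel = t_compute / num_gpus + t_sync
    return t_compute / t_parallel


# Hypothetical per-step timings in milliseconds (illustrative, not measured).
light = {"t_compute": 2.0, "t_sync": 2.0}    # compute comparable to sync
heavy = {"t_compute": 100.0, "t_sync": 2.0}  # compute dominates sync

for k in (1, 2, 4, 8):
    s_light = speedup(num_gpus=k, **light)
    s_heavy = speedup(num_gpus=k, **heavy)
    print(f"{k} GPU(s): light model x{s_light:.2f}, heavy model x{s_heavy:.2f}")
```

Under these assumed numbers the lightweight case never exceeds a speedup of 1, since the overhead alone equals the entire single-GPU step time, while the compute-heavy case stays reasonably close to linear scaling. Real systems differ in the details (synchronization cost usually depends on parameter count and device count, and communication can overlap with the backward pass), so the model should be read as intuition rather than a prediction.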

Updated 2026-05-18

Tags

D2L

Dive into Deep Learning @ D2L
