1Cademy - Computation-to-Synchronization Ratio and Multi-GPU Scalability

Learn Before

Data Parallelism

Concept

Computation-to-Synchronization Ratio and Multi-GPU Scalability

The effectiveness of multi-GPU data parallelism depends critically on the ratio of computation time to synchronization overhead. When a model is computationally lightweight (e.g., LeNet), the time each device spends on the forward pass and gradient computation is comparable to or smaller than the time required for cross-device parameter synchronization and Python scheduling overhead. Splitting the minibatch across devices shrinks only the computation, not the overhead, so adding more GPUs yields no meaningful speedup. Conversely, when a model is sufficiently complex (e.g., ResNet-18), the per-device computation time dominates the synchronization cost, so the parallelization overhead becomes relatively negligible and training time improves significantly as more GPUs are added.
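The relationship above can be illustrated with a rough back-of-the-envelope model. The sketch below is not taken from the reference. It assumes that per-step computation divides evenly across GPUs and that synchronization adds a fixed cost whenever more than one GPU is used. The function name `estimated_speedup` and all timing values are hypothetical, chosen only to show the trend.

```python
def estimated_speedup(compute_time, sync_time, num_gpus):
    """Toy estimate of multi-GPU speedup for one training step.

    Assumption (not from the reference): step time on k > 1 GPUs is
    compute_time / k + sync_time, and a single GPU pays no sync cost.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if num_gpus == 1:
        return 1.0
    parallel_time = compute_time / num_gpus + sync_time
    return compute_time / parallel_time


# Hypothetical timings in milliseconds, for illustration only.
light = estimated_speedup(compute_time=2.0, sync_time=1.0, num_gpus=2)
heavy = estimated_speedup(compute_time=100.0, sync_time=1.0, num_gpus=2)
print(f"lightweight model: {light:.2f}x, heavier model: {heavy:.2f}x")
```

Under these assumed numbers, the lightweight case gains nothing from a second GPU because the synchronization cost cancels the time saved, while the heavier case approaches the ideal 2x. Real measurements depend on hardware, interconnect, and framework behavior.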

Updated 2026-06-23

References

Dive into Deep Learning

Learn After

Single vs. Dual GPU Training Comparison of LeNet on Fashion-MNIST
