Concept
Overlapping Gradient Computation and Synchronization
In deep neural networks, backpropagation computes gradients sequentially, from the output layer back to the input layer. A distributed training system can exploit this order: it begins synchronizing the gradients of the layers nearest the output, which finish first, while the layers closer to the input are still computing theirs. Overlapping communication with computation in this way reduces hardware idle time and shortens each training iteration.
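To see why the overlap helps, it can be useful to compare the two schedules in a toy timing model. The sketch below is only an illustration under stated assumptions: the per-layer costs are made-up numbers, the function names `sequential_time` and `overlapped_time` are hypothetical, and communication is assumed to run on a single channel that handles one layer at a time, concurrently with computation. Real systems differ in details such as grouping several gradients into one transfer.

```python
def sequential_time(compute, comm):
    """No overlap: compute every gradient first, then synchronize them all."""
    return sum(compute) + sum(comm)


def overlapped_time(compute, comm):
    """Overlap: start synchronizing a layer as soon as its gradient is ready.

    Both lists are in backpropagation order (output layer first).
    Assumption: one communication channel, one layer in flight at a time,
    running concurrently with the backward computation.
    """
    compute_done = 0.0  # time at which the current layer's gradient is ready
    comm_done = 0.0     # time at which the communication channel is free
    for c, s in zip(compute, comm):
        compute_done += c
        # A transfer waits for both its gradient and the previous transfer.
        comm_done = max(comm_done, compute_done) + s
    return comm_done


# Hypothetical per-layer costs in milliseconds, output layer first.
compute_ms = [4.0, 3.0, 2.0, 1.0]
comm_ms = [2.0, 2.0, 2.0, 2.0]

no_overlap = sequential_time(compute_ms, comm_ms)
with_overlap = overlapped_time(compute_ms, comm_ms)
print(no_overlap, with_overlap)  # 18.0 13.0
```

In this model, most of the communication is hidden behind the remaining backward computation; only the transfers that cannot finish before computation ends add to the iteration time.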
Updated 2026-05-18
Tags
D2L
Dive into Deep Learning @ D2L