Concept

Overlapping Gradient Computation and Synchronization

During backpropagation in a deep neural network, gradients are computed in reverse layer order, from the output layer back toward the input layer. Once an upper layer's gradient has been computed, it will not change for the rest of the iteration. A distributed training system can therefore start synchronizing it across workers while the gradients of the lower layers are still being computed. Overlapping communication with computation in this way reduces the time that processors and network links sit idle, which shortens each training iteration.
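The following is a toy, single-process simulation of the idea. It is not how any particular framework implements it. It assumes a hypothetical cost model in which each layer's backward step (`backward_layer`) and each gradient synchronization (`all_reduce_mean`, which averages one layer's gradients across simulated replicas) take a fixed time, modeled with `time.sleep`. All function names and constants here are made up for illustration. The overlapped version passes each layer's gradient through a queue to a background communication thread as soon as the gradient is produced. Meanwhile, the main thread continues with the lower layers.

```python
import queue
import threading
import time

# Hypothetical cost model (assumption): every layer's backward step and every
# per-layer gradient synchronization is simulated by a fixed sleep.
COMPUTE_S = 0.05
COMM_S = 0.05


def backward_layer(layer, replicas):
    """Simulate computing one layer's gradient on every replica."""
    time.sleep(COMPUTE_S)
    # Toy gradients: replica r produces (layer + 1) * (r + 1) for this layer.
    return [float((layer + 1) * (r + 1)) for r in range(replicas)]


def all_reduce_mean(grads):
    """Simulate synchronizing one layer's gradients by averaging over replicas."""
    time.sleep(COMM_S)
    return sum(grads) / len(grads)


def train_step_sequential(num_layers, replicas):
    """Finish the whole backward pass first, then synchronize every layer."""
    grads = {}
    for layer in reversed(range(num_layers)):  # backprop: output -> input
        grads[layer] = backward_layer(layer, replicas)
    synced, order = {}, []
    for layer in reversed(range(num_layers)):  # communication starts only now
        synced[layer] = all_reduce_mean(grads[layer])
        order.append(layer)
    return synced, order


def train_step_overlapped(num_layers, replicas):
    """Synchronize each layer's gradient while lower layers are still computing."""
    ready = queue.Queue()
    synced, order = {}, []

    def comm_worker():
        while True:
            item = ready.get()
            if item is None:  # sentinel: the backward pass has finished
                break
            layer, grads = item
            synced[layer] = all_reduce_mean(grads)
            order.append(layer)

    comm = threading.Thread(target=comm_worker)
    comm.start()
    for layer in reversed(range(num_layers)):  # backprop: output -> input
        # Hand the finished gradient to the communication thread immediately,
        # then keep computing the next (lower) layer on this thread.
        ready.put((layer, backward_layer(layer, replicas)))
    ready.put(None)
    comm.join()  # wait for the last synchronization to complete
    return synced, order


if __name__ == "__main__":
    for step in (train_step_sequential, train_step_overlapped):
        start = time.perf_counter()
        synced, order = step(num_layers=4, replicas=2)
        print(step.__name__, order, f"{time.perf_counter() - start:.2f}s")
```

With four layers and equal 0.05 s costs, the sequential step should take roughly 0.4 s (all computation, then all communication). The overlapped step should take roughly 0.25 s, because only the bottom layer's synchronization is left once the backward pass ends. Both versions produce the same averaged gradients, since overlap changes only *when* communication happens. Production frameworks such as PyTorch's `DistributedDataParallel` apply the same idea by grouping gradients into buckets and launching all-reduce from autograd hooks.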


Updated 2026-05-18


Tags

D2L

Dive into Deep Learning @ D2L