
Global Synchronization in PyTorch

In PyTorch, developers can force all pending backend computations to complete before control returns to the frontend by using a synchronization barrier. Specifically, calling torch.cuda.synchronize(device) blocks the calling Python thread until every operation queued on the specified GPU, across all of its streams, has finished executing; if device is omitted, the current device is used. This global synchronization is essential for precise performance benchmarking: because GPU operations are launched asynchronously, a timing taken without it would reflect only the negligible cost of enqueuing work on the backend, not the true computation time.
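The benchmarking pitfall described above can be illustrated with a minimal sketch. The helper name `timed_matmul`, the matrix size, and the CPU fallback are assumptions made for illustration and are not part of the original text; only `torch.cuda.synchronize`, `torch.cuda.is_available`, and standard tensor operations are used.

```python
import time

import torch


def timed_matmul(size=1024, sync=True):
    """Time one matrix multiply; `sync` controls whether we wait for the GPU.

    Hypothetical helper for illustration. Falls back to the CPU when no
    CUDA device is available, in which case no barrier is needed.
    """
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    a = torch.randn(size, size, device=device)
    b = torch.randn(size, size, device=device)

    if device.type == "cuda":
        # Drain setup work (allocation, random fill) so it is not timed.
        torch.cuda.synchronize(device)

    start = time.perf_counter()
    c = a @ b  # on a GPU this only enqueues the kernel and returns
    if sync and device.type == "cuda":
        # Block until every queued operation on this device has finished.
        torch.cuda.synchronize(device)
    elapsed = time.perf_counter() - start
    return elapsed, c


# Compare the two measurements; on a GPU the unsynchronized one is
# typically much smaller because it captures only the enqueue cost.
no_sync, _ = timed_matmul(sync=False)
with_sync, _ = timed_matmul(sync=True)
print(f"without barrier: {no_sync:.6f} s, with barrier: {with_sync:.6f} s")
```

On a machine without a GPU both timings measure the same synchronous CPU computation, so the difference only shows up on CUDA devices. The first CUDA call in a process may also include one-time initialization cost, so a warm-up run before measuring is usually advisable.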


Updated 2026-05-18


Tags

D2L

Dive into Deep Learning @ D2L