Concept
Blocking Due to Cross-Device Data Transfer
Moving tensor data between devices (for example, between CPU main memory and a GPU) severely complicates parallel processing. Any operation that depends on the data must block, or pause, until that data has been transmitted over the system bus and received. Each transfer also carries a high fixed startup overhead. As a result, many small, interspersed copy operations perform drastically worse than a single large transfer of the same data.
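The overhead argument can be made concrete with a simple linear cost model. It is a minimal sketch that assumes each transfer pays a fixed startup latency plus a term proportional to its size. The function `transfer_time` is hypothetical. Its default 10 µs latency and 16 GB/s bandwidth are illustrative assumptions, not measurements of any real bus.

```python
# Hypothetical linear cost model for cross-device copies: every transfer
# pays a fixed startup latency plus size / bandwidth. The default numbers
# are illustrative assumptions, not measurements of real hardware.


def transfer_time(num_transfers: int, bytes_per_transfer: int,
                  latency_s: float = 10e-6,
                  bandwidth_bytes_per_s: float = 16e9) -> float:
    """Return the estimated total seconds spent on `num_transfers` copies."""
    per_copy = latency_s + bytes_per_transfer / bandwidth_bytes_per_s
    return num_transfers * per_copy


chunk_bytes = 4096   # size of one small tensor (assumed)
n_chunks = 1000      # number of small tensors to move (assumed)

# Same total number of bytes, moved two different ways.
many_small = transfer_time(n_chunks, chunk_bytes)      # 1000 separate copies
one_large = transfer_time(1, n_chunks * chunk_bytes)   # one consolidated copy

print(f"{n_chunks} small copies: {many_small * 1e3:.3f} ms")
print(f"1 batched copy:     {one_large * 1e3:.3f} ms")
print(f"slowdown:           {many_small / one_large:.1f}x")
```

Under these assumptions, the bandwidth term is identical in both cases. The gap comes entirely from paying the startup latency 1000 times instead of once.

In PyTorch, a common way to apply this is to combine the small tensors with `torch.cat` or `torch.stack` and then make a single `.to(device)` call. This replaces a separate transfer for each tensor.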
Updated 2026-05-18
Tags
D2L
Dive into Deep Learning @ D2L