Concept

Performance Cost of Cross-Device Tensor Transfer

Transferring tensor data between hardware devices (for example, copying from main memory to a GPU) is slow, typically much slower than the computation performed on that data. Deep learning frameworks therefore require users to request these transfers explicitly rather than performing them automatically. If an operation receives tensors that live on different devices, the framework raises an error instead of silently copying data back and forth. This design keeps developers from inadvertently writing highly inefficient code, because the failure exposes the device mismatch immediately.
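The design described above can be illustrated with a minimal sketch in plain Python. It is a toy model, not any framework's real API: the `ToyTensor` class, its `copies` counter, and the device labels `"cpu"` and `"gpu:0"` are all hypothetical names introduced here for illustration. It shows only the two behaviours at issue: an operation on mismatched devices fails loudly, and a transfer happens only when the user asks for it.

```python
class ToyTensor:
    """Hypothetical stand-in for a framework tensor: values plus a device tag."""

    copies = 0  # number of explicit cross-device copies performed so far

    def __init__(self, values, device="cpu"):
        self.values = list(values)
        self.device = device

    def to(self, device):
        """Explicitly copy to another device; returns self if already there."""
        if device == self.device:
            return self
        ToyTensor.copies += 1  # the expensive step is visible and countable
        return ToyTensor(self.values, device)

    def __add__(self, other):
        # Refuse to copy silently: a device mismatch is an error.
        if self.device != other.device:
            raise RuntimeError(
                f"device mismatch: {self.device} vs {other.device}"
            )
        summed = [a + b for a, b in zip(self.values, other.values)]
        return ToyTensor(summed, self.device)


x = ToyTensor([1.0, 2.0], device="cpu")
y = ToyTensor([3.0, 4.0], device="gpu:0")

# Mixing devices fails instead of triggering a hidden copy.
try:
    x + y
    mismatch_raised = False
except RuntimeError:
    mismatch_raised = True

# The user opts in to the slow transfer, then the addition succeeds.
z = x.to("gpu:0") + y
```

Because every transfer goes through an explicit `to` call, each slow copy appears in the source code where it can be seen and counted. Real frameworks expose a comparable explicit call (in PyTorch, `Tensor.to`), though their error messages and device names differ from this sketch.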

Updated 2026-05-09

Tags

D2L

Dive into Deep Learning @ D2L