Concept

Hardware Data Movement Bottlenecks

Deep learning performance depends heavily on how quickly data can be moved from durable storage and RAM to the processors (CPUs or GPUs). If data cannot be loaded quickly enough, or if matrices cannot be transferred to the accelerators fast enough, the processing elements sit idle waiting for input, and data movement rather than computation becomes the system bottleneck. To perform well, systems must move data efficiently and often overlap communication with computation.
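The idea of interleaving communication with computation can be illustrated with a minimal sketch using only the Python standard library. It is not taken from the source: `load_batch` and `compute` are hypothetical stand-ins that simulate storage reads and model work with `time.sleep`, and the timings passed in are assumed values chosen for illustration. A background thread loads upcoming batches into a small bounded buffer while the main thread processes the current one.

```python
import queue
import threading
import time


def load_batch(i, load_seconds):
    """Hypothetical loader: simulate reading batch i from storage."""
    time.sleep(load_seconds)
    return [i] * 4


def compute(batch, compute_seconds):
    """Hypothetical compute step: simulate running the model on a batch."""
    time.sleep(compute_seconds)
    return sum(batch)


def run_sequential(num_batches, load_seconds, compute_seconds):
    """Load, then compute, one batch at a time; compute waits on every load."""
    return [compute(load_batch(i, load_seconds), compute_seconds)
            for i in range(num_batches)]


def run_prefetched(num_batches, load_seconds, compute_seconds, depth=2):
    """Load upcoming batches on a background thread while computing."""
    # Bounded buffer, so the loader cannot run arbitrarily far ahead.
    buffer = queue.Queue(maxsize=depth)

    def producer():
        for i in range(num_batches):
            buffer.put(load_batch(i, load_seconds))
        buffer.put(None)  # sentinel: no more batches

    threading.Thread(target=producer, daemon=True).start()
    results = []
    while (batch := buffer.get()) is not None:
        results.append(compute(batch, compute_seconds))
    return results


def timed(fn, *args):
    """Return fn(*args) together with the elapsed wall-clock seconds."""
    start = time.perf_counter()
    out = fn(*args)
    return out, time.perf_counter() - start


if __name__ == "__main__":
    seq, t_seq = timed(run_sequential, 5, 0.02, 0.02)
    pre, t_pre = timed(run_prefetched, 5, 0.02, 0.02)
    print(f"sequential: {t_seq:.3f}s, prefetched: {t_pre:.3f}s")
```

Under these assumptions the sequential loop takes roughly the sum of all load and compute times, while the prefetched loop approaches one initial load plus the longer of the two per-batch costs for each remaining batch. Real data loaders and accelerator transfers apply the same principle with more machinery, such as multiple workers and asynchronous copies.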

Updated 2026-05-18

Tags

D2L

Dive into Deep Learning @ D2L