Learn Before
Prefilling as a Compute-Bound Process
The prefilling phase is generally considered a compute-bound process. Because the model can compute self-attention for every token in the prompt in parallel, the work is batched into a few large matrix operations rather than many small ones. This approach minimizes data transfers between memory and the processing unit (like a GPU), so the primary performance limit becomes the raw arithmetic throughput of the hardware rather than the speed at which data can be moved (memory bandwidth).
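The compute-bound vs. memory-bound distinction can be made concrete with a back-of-the-envelope arithmetic-intensity estimate (FLOPs performed per byte moved). The sketch below uses assumed toy numbers, not measurements, and considers only the Q·Kᵀ attention-score matmul with fp16 operands; prefill scores all prompt tokens at once, while decode scores a single new query token per step.

```python
def attention_score_intensity(n_queries: int, n_keys: int, d: int) -> float:
    """Estimate FLOPs per byte for Q @ K^T with 2-byte (fp16) operands."""
    flops = 2 * n_queries * n_keys * d           # one multiply-add per element of the score matrix
    bytes_moved = 2 * (n_queries * d             # read Q
                       + n_keys * d              # read K
                       + n_queries * n_keys)     # write scores
    return flops / bytes_moved

n, d = 4096, 128  # assumed prompt length and attention head dimension

prefill = attention_score_intensity(n, n, d)  # all prompt tokens in one pass
decode = attention_score_intensity(1, n, d)   # one query token per step

print(f"prefill: {prefill:.1f} FLOPs/byte")   # high intensity -> compute-bound
print(f"decode:  {decode:.1f} FLOPs/byte")    # ~1 FLOP/byte -> memory-bound
```

Under these assumptions prefill does roughly two orders of magnitude more arithmetic per byte transferred than decode, which is why prefill saturates the chip's compute units while per-token decoding is limited by memory bandwidth.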
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Self-Attention Formula for the Prefilling Phase
Prefilling as a Compute-Bound Process
Token Prediction within the Prefilling Phase
When a large language model first processes a user's prompt, it can perform calculations for all words in the prompt simultaneously rather than one by one. What is the fundamental condition that makes this highly parallel approach possible during this initial stage?
LLM Inference Performance Analysis
Rationale for Parallelism in Initial Prompt Processing
Diagram of the Prefilling Phase
Learn After
A machine learning team observes that the initial processing of a user's entire input sequence is the slowest part of their language model's inference pipeline. This step involves a single, large computational pass where attention is calculated for all input tokens simultaneously. To reduce this latency, they can only afford one of the following hardware upgrades. Which upgrade would most effectively speed up this specific initial processing step?
Performance Bottleneck Analysis in LLM Inference
The prefilling phase of a large language model is considered a compute-bound process because the parallel computation of self-attention across the entire input sequence batches the work into large matrix operations, minimizing data transfers to and from the processing unit's memory and making raw arithmetic throughput the bottleneck.