Concept

Prefilling as a Compute-Bound Process

The prefilling phase is generally considered compute-bound. Because the entire input sequence is available up front, self-attention for all tokens can be computed in parallel, which merges many small operations into a few large matrix operations. Each value loaded from memory is then reused across many arithmetic operations, so comparatively little time is spent transferring data between memory and the processing unit (such as a GPU). The primary performance limit therefore becomes the raw computational throughput of the hardware rather than the speed at which data can be moved (memory bandwidth).
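A rough way to see this is to estimate arithmetic intensity, meaning floating-point operations per byte moved, for a single weight matrix applied to the whole sequence at once versus to one token. The sketch below is illustrative only: the function names, the square d x d projection, the 2-byte (fp16) elements, and the hardware figures are all assumptions, not values from the source.

```python
def arithmetic_intensity(num_tokens: int, d_model: int, bytes_per_elem: int = 2) -> float:
    """FLOPs per byte for multiplying a (num_tokens x d_model) input
    by a (d_model x d_model) weight matrix (assumed shapes)."""
    # One multiply and one add per weight per token.
    flops = 2 * num_tokens * d_model * d_model
    # Bytes moved: the weights once, plus the input and output activations.
    elems_moved = d_model * d_model + 2 * num_tokens * d_model
    return flops / (bytes_per_elem * elems_moved)


def is_compute_bound(intensity: float, peak_flops: float, mem_bandwidth: float) -> bool:
    """Roofline-style check: compute-bound if the intensity exceeds the
    hardware's ratio of peak FLOP/s to memory bandwidth (bytes/s)."""
    return intensity > peak_flops / mem_bandwidth


# Hypothetical accelerator: 300 TFLOP/s peak, 2 TB/s memory bandwidth.
PEAK_FLOPS = 300e12
MEM_BANDWIDTH = 2e12

prefill = arithmetic_intensity(num_tokens=2048, d_model=4096)
single_token = arithmetic_intensity(num_tokens=1, d_model=4096)

print(f"prefill intensity:      {prefill:.1f} FLOPs/byte")
print(f"single-token intensity: {single_token:.3f} FLOPs/byte")
print("prefill compute-bound?     ", is_compute_bound(prefill, PEAK_FLOPS, MEM_BANDWIDTH))
print("single token compute-bound?", is_compute_bound(single_token, PEAK_FLOPS, MEM_BANDWIDTH))
```

Under these assumptions, processing 2048 tokens together reuses each loaded weight 2048 times, so the intensity is far above the hypothetical hardware ratio of 150 FLOPs per byte, while a single token stays near 1 FLOP per byte and would be limited by memory bandwidth instead.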

Updated 2026-05-03

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences