Comparison of Prefilling and Decoding Phases

The prefilling and decoding phases of Large Language Model inference differ significantly across several dimensions. Prefilling establishes the initial context from the input sequence, while decoding generates the subsequent tokens. During prefilling, all input tokens are visible at once and are processed in parallel to build an encoded contextual representation. Decoding, in contrast, operates with sequential visibility, predicting one token per step while reusing the previously cached key-value pairs. Consequently, prefilling is typically compute-bound, since the whole prompt is processed in one large parallel pass, whereas decoding is memory-bound: each step performs relatively little arithmetic but must read the entire, ever-growing KV cache, so memory traffic rather than computation dominates as the sequence grows.
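The contrast above can be sketched in code. The following is a toy illustration, not a real model: it uses random vectors in place of learned query/key/value projections, and a plain NumPy softmax attention. The point is only the shape of the two phases, prefill as one parallel attention call that populates the KV cache, and decode as a loop of single-token calls that append to and re-read that cache.

```python
import numpy as np

def attention(q, k, v):
    """Scaled dot-product attention of queries q over cached keys/values."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

d = 8                                   # toy hidden size (illustrative)
rng = np.random.default_rng(0)
prompt = rng.normal(size=(5, d))        # 5 prompt tokens stand in for embeddings

# Prefill: every prompt token is visible at once, so a single parallel
# attention call processes the whole prompt and fills the KV cache.
k_cache, v_cache = prompt.copy(), prompt.copy()
prefill_out = attention(prompt, k_cache, v_cache)      # shape (5, d)

# Decode: one token per step; each step appends its key/value to the cache
# and attends over everything cached so far. The arithmetic per step is
# small, but the cache read grows with sequence length (memory-bound).
for _ in range(3):
    new_token = rng.normal(size=(1, d))
    k_cache = np.vstack([k_cache, new_token])
    v_cache = np.vstack([v_cache, new_token])
    step_out = attention(new_token, k_cache, v_cache)  # shape (1, d)
```

Note how the prefill call touches the cache once for 5 tokens, while the decode loop re-reads a cache that has grown to 8 entries by the last step; this asymmetry is exactly why the two phases hit different hardware bottlenecks.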


Updated 2026-05-03


Ch.5 Inference - Foundations of Large Language Models
