
Computational Cost Comparison: Decoding vs. Prefilling

In most inference scenarios, the decoding phase of a Transformer model dominates the cost of serving a request, typically far more than the prefilling phase. This is not only because generation is sequential, requiring one full forward pass per output token, or because the KV cache must be read and appended to at every step. Decoding is also memory-bandwidth-bound rather than compute-bound: each step processes a single token, so the hardware's arithmetic units sit largely idle while the entire (and growing) KV cache is streamed from memory. Prefilling, by contrast, processes all prompt tokens in one parallel pass with high utilization.
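A rough back-of-the-envelope model makes the asymmetry concrete. The sketch below is a minimal cost model under assumed toy parameter values (`d_model`, `n_layers`, fp16 weights); it counts only the dominant matrix-multiply FLOPs and the KV-cache bytes read per decode step, ignoring softmax, norms, and embeddings:

```python
# Rough per-layer cost model for prefill vs. decode (hypothetical toy values).
d_model = 4096        # hidden size (assumed)
n_layers = 32         # layer count (assumed)
bytes_per_param = 2   # fp16

def prefill_flops(prompt_len):
    # All prompt tokens are processed in one parallel pass.
    proj = 12 * d_model ** 2 * prompt_len      # QKV/output/MLP projections
    attn = 4 * d_model * prompt_len ** 2       # QK^T and attention-weighted V
    return n_layers * (proj + attn)

def decode_step_flops(context_len):
    # One new token: projections for a single token, attention over the cache.
    proj = 12 * d_model ** 2
    attn = 4 * d_model * context_len
    return n_layers * (proj + attn)

def decode_step_kv_bytes(context_len):
    # Each decode step reads the entire KV cache: 2 tensors (K and V)
    # of shape [context_len, d_model] per layer.
    return n_layers * 2 * context_len * d_model * bytes_per_param

prompt = 1024
total_prefill = prefill_flops(prompt)
total_decode = sum(decode_step_flops(prompt + i) for i in range(prompt))

print(f"prefill {prompt} tokens: {total_prefill:.3e} FLOPs, 1 parallel pass")
print(f"decode  {prompt} tokens: {total_decode:.3e} FLOPs, {prompt} sequential passes")
print(f"KV bytes read at step {prompt}: {decode_step_kv_bytes(2 * prompt):.3e}")
```

The FLOP totals come out in the same ballpark, which is the point: prefill amortizes its work over one highly parallel pass, while decode spreads comparable work across many small, sequential, memory-bound passes, each re-reading the whole KV cache. That is why decoding, not prefilling, usually dominates wall-clock time.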

Updated 2026-05-03

Tags: Ch.5 Inference - Foundations of Large Language Models, Foundations of Large Language Models, Foundations of Large Language Models Course, Computing Sciences