Learn Before
Computational Cost Comparison: Decoding vs. Prefilling
In most inference scenarios, the decoding phase of a Transformer model incurs a higher computational cost than the prefilling phase. This expense is not merely a result of sequential, token-by-token generation and the repeated updates to the KV cache: each decoding step also performs small, memory-bandwidth-bound operations that underutilize the hardware, and decoding strategies that explore multiple candidate paths (such as beam search) multiply the work further.
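As a rough illustration, here is a minimal NumPy sketch (toy, made-up dimensions, a single attention head, and no causal mask; none of this comes from a real model) that times the same number of tokens going through one parallel prefill pass versus token-by-token decoding with a growing KV cache:

```python
import time
import numpy as np

# Illustrative toy sizes; these are assumptions, not real model settings.
D = 512           # hidden size
PROMPT_LEN = 200  # tokens processed in the prefilling phase
GEN_LEN = 200     # tokens produced in the decoding phase

rng = np.random.default_rng(0)
Wq = rng.standard_normal((D, D)) / np.sqrt(D)
Wk = rng.standard_normal((D, D)) / np.sqrt(D)
Wv = rng.standard_normal((D, D)) / np.sqrt(D)

def attention(q, k, v):
    # Scaled dot-product attention (causal mask omitted for brevity).
    scores = q @ k.T / np.sqrt(D)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# --- Prefilling: the whole prompt goes through in one parallel pass. ---
prompt = rng.standard_normal((PROMPT_LEN, D))
t0 = time.perf_counter()
K = prompt @ Wk  # keys/values for all prompt tokens at once
V = prompt @ Wv
Q = prompt @ Wq
_ = attention(Q, K, V)
prefill_s = time.perf_counter() - t0

# --- Decoding: one token per step, appending to the KV cache each time. ---
t0 = time.perf_counter()
k_cache, v_cache = K.copy(), V.copy()
x = rng.standard_normal((1, D))
for _ in range(GEN_LEN):
    k_cache = np.vstack([k_cache, x @ Wk])  # repeated KV-cache update
    v_cache = np.vstack([v_cache, x @ Wv])
    out = attention(x @ Wq, k_cache, v_cache)
    x = out  # feed the step's output back in as the next "token"
decode_s = time.perf_counter() - t0

print(f"prefill  ({PROMPT_LEN} tokens, 1 pass):   {prefill_s * 1e3:.1f} ms")
print(f"decoding ({GEN_LEN} tokens, {GEN_LEN} passes): {decode_s * 1e3:.1f} ms")
```

On typical hardware the decoding loop is markedly slower even though its total arithmetic is comparable: each step launches small, memory-bound operations and copies the cache (np.vstack reallocates; production systems preallocate the cache instead), whereas prefilling amortizes the same work into a few large matrix multiplications.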
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Prefilling Phase in Transformer Inference
Decoding Phase in Transformer Inference
Analysis of KV Cache Utilization in Autoregressive Generation
In an autoregressive Transformer model, generating a sequence in response to an input prompt involves two distinct phases from the perspective of the Key-Value (KV) cache. Which option correctly distinguishes the computational activities of these two phases?
An autoregressive language model receives an input prompt and generates a response. From the perspective of how it uses its internal memory for past context (the Key-Value cache), arrange the following high-level stages of the generation process in the correct chronological order.
Learn After
Increased Complexity and Cost from Exploring Multiple Decoding Paths
Inference Performance Bottleneck Analysis
Analysis of Computational Costs in Transformer Inference
Factors Contributing to High Decoding Cost
An engineer observes that generating a 200-token response from a large language model takes significantly more time than processing the initial 200-token input prompt. Which of the following statements provides the most accurate technical explanation for this performance difference?