1Cademy - Factors Contributing to High Decoding Cost

Learn Before

Computational Cost Comparison: Decoding vs. Prefilling

Concept

Factors Contributing to High Decoding Cost

Decoding is more computationally expensive than prefilling, but its sequential, token-by-token generation and the repeated updates to the KV cache explain only part of that cost. Both factors contribute; the full explanation for the high cost involves more complex underlying reasons.
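To make the two named factors concrete, here is a minimal bookkeeping sketch in Python. It is not a cost model and does not address the deeper reasons mentioned above. It only counts how many sequential passes and KV cache updates each phase performs. The function names `prefill` and `decode`, the string placeholders for keys and values, and the counters are all hypothetical and chosen for illustration.

```python
def prefill(prompt_len):
    """Toy prefill: the whole prompt is handled in a single pass.

    The cache holds placeholder (key, value) pairs, not real tensors.
    """
    kv_cache = [(f"k{i}", f"v{i}") for i in range(prompt_len)]
    stats = {"sequential_steps": 1, "cache_updates": 1}
    return kv_cache, stats


def decode(kv_cache, new_tokens):
    """Toy decoding: one pass per generated token.

    Each step appends one entry to the cache and then reads the
    entire cache, so the work is repeated once per token.
    """
    stats = {"sequential_steps": 0, "cache_updates": 0, "cache_entries_read": 0}
    for _ in range(new_tokens):
        pos = len(kv_cache)
        kv_cache.append((f"k{pos}", f"v{pos}"))       # repeated cache update
        stats["cache_updates"] += 1
        stats["cache_entries_read"] += len(kv_cache)  # attend over all entries
        stats["sequential_steps"] += 1                # cannot run in parallel
    return kv_cache, stats


if __name__ == "__main__":
    cache, prefill_stats = prefill(prompt_len=8)
    cache, decode_stats = decode(cache, new_tokens=4)
    print("prefill:", prefill_stats)
    print("decode: ", decode_stats)
```

In this sketch, an 8-token prompt is handled in one pass, while generating 4 tokens takes 4 dependent passes, each touching a cache that grows by one entry per step.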

Updated 2026-01-15

References

Reference of Foundations of Large Language Models Course

Learn After

An AI development team is optimizing their language model's inference speed. They observe that generating a long response token-by-token is significantly more time-consuming than processing the initial user prompt, even when the prompt is long. While the sequential nature of the generation is a factor, which of the following provides the most fundamental explanation for this high computational cost?
Analyzing Inference Performance Bottlenecks
Deconstructing the High Cost of Autoregressive Decoding