Factors Contributing to High Decoding Cost
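The decoding phase is typically more expensive than prefilling, and not only because it generates tokens sequentially, one at a time, and updates the KV cache at every step. These factors contribute, but a fuller explanation lies in how each step uses the hardware: to produce a single token, a decoding step must read the model weights and the growing KV cache from memory, so it performs little computation per byte transferred. Decoding therefore tends to be limited by memory bandwidth, whereas prefilling processes all prompt tokens in parallel and makes far better use of the available compute.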
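One of the less obvious reasons can be illustrated with a rough estimate of arithmetic intensity, that is, floating-point operations per byte moved to and from memory. The sketch below compares a single linear layer applied to a 200-token prompt in one pass (prefilling) with the same layer applied to one token (a decoding step). The function name, the 4096-wide layer, and the 2-byte value size are illustrative assumptions, and the model ignores caching, attention, and kernel details, so it should be read as a back-of-the-envelope estimate rather than a measurement.

```python
def linear_layer_intensity(num_tokens: int, d_in: int, d_out: int,
                           bytes_per_value: int = 2) -> float:
    """Rough FLOPs-per-byte estimate for y = x @ W.

    x has shape (num_tokens, d_in) and W has shape (d_in, d_out).
    Hypothetical helper: assumes W, x, and y each cross the memory bus once.
    """
    # One multiply and one add per (token, input, output) triple.
    flops = 2 * num_tokens * d_in * d_out
    # Bytes for the weights, the input activations, and the output activations.
    bytes_moved = bytes_per_value * (
        d_in * d_out + num_tokens * d_in + num_tokens * d_out
    )
    return flops / bytes_moved


# Assumed sizes: a 4096 x 4096 projection with 16-bit values.
prefill_intensity = linear_layer_intensity(num_tokens=200, d_in=4096, d_out=4096)
decode_intensity = linear_layer_intensity(num_tokens=1, d_in=4096, d_out=4096)

print(f"prefill (200 tokens at once): {prefill_intensity:.1f} FLOPs/byte")
print(f"decode  (1 token per step):   {decode_intensity:.2f} FLOPs/byte")
```

Under these assumptions the decoding step performs roughly one operation per byte read, because the full weight matrix is loaded to process a single token, while the prefilling pass reuses the same weights across all 200 tokens. Generating 200 tokens repeats that low-intensity step 200 times, which is why the total decoding time can far exceed the time to process a prompt of the same length.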
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Increased Complexity and Cost from Exploring Multiple Decoding Paths
Inference Performance Bottleneck Analysis
Analysis of Computational Costs in Transformer Inference
An engineer observes that generating a 200-token response from a large language model takes significantly more time than processing the initial 200-token input prompt. Which of the following statements provides the most accurate technical explanation for this performance difference?
Learn After
An AI development team is optimizing their language model's inference speed. They observe that generating a long response token-by-token is significantly more time-consuming than processing the initial user prompt, even when the prompt is long. While the sequential nature of the generation is a factor, which of the following provides the most fundamental explanation for this high computational cost?
Analyzing Inference Performance Bottlenecks
Deconstructing the High Cost of Autoregressive Decoding