Analysis of Computational Costs in Transformer Inference
Explain why the decoding phase in a Transformer model's inference process is typically more computationally expensive than the prefilling phase. Go beyond simply stating that it's a sequential process and identify at least two distinct contributing factors.
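The cost gap the question asks about can be made concrete with a small numeric sketch. This is an illustrative Python example, not part of the card: it assumes a hypothetical hidden size of 4096, fp16 weights, and a single dense layer, and compares the arithmetic intensity (FLOPs per byte of weight traffic) of prefill, which processes all prompt tokens in one batched matmul, against decode, which re-reads the weights for every generated token.

```python
# Illustrative sketch (hypothetical numbers, not from the card):
# arithmetic intensity of prefill vs. decode for one weight-matrix multiply.

d = 4096          # hypothetical hidden size (d_model)
n_tokens = 200    # prompt length / response length from the scenario

def arithmetic_intensity(batch_tokens: int) -> float:
    """FLOPs per byte of weight traffic for multiplying a
    (batch_tokens x d) activation block by a (d x d) weight matrix."""
    flops = 2 * batch_tokens * d * d      # one multiply-add per weight per token
    weight_bytes = d * d * 2              # fp16 weights, read once per pass
    return flops / weight_bytes

# Prefill: all 200 prompt tokens amortize a single weight read -> high intensity.
prefill_ai = arithmetic_intensity(n_tokens)

# Decode: each new token forces its own full weight read -> low intensity,
# so the same matrix is streamed from memory 200 times over the response.
decode_ai = arithmetic_intensity(1)

print(prefill_ai / decode_ai)  # -> 200.0: prefill does 200x more work per byte moved
```

Under these assumptions, decode is memory-bandwidth-bound (intensity ~1 FLOP/byte) while prefill is compute-bound, which, together with the growing KV cache read at every step, is the kind of distinct factor the prompt asks for beyond mere sequentiality.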
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Increased Complexity and Cost from Exploring Multiple Decoding Paths
Inference Performance Bottleneck Analysis
Factors Contributing to High Decoding Cost
An engineer observes that generating a 200-token response from a large language model takes significantly more time than processing the initial 200-token input prompt. Which of the following statements provides the most accurate technical explanation for this performance difference?