Learn Before
An AI development team is optimizing their language model's inference speed. They observe that generating a long response token-by-token is significantly more time-consuming than processing the initial user prompt, even when the prompt is long. While the sequential nature of the generation is a factor, which of the following provides the most fundamental explanation for this high computational cost?
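A minimal sketch of the underlying issue, in plain Python with a hypothetical `forward` function standing in for a full transformer pass (not any specific library's API): the prompt can be processed in a single parallel pass (prefill), while each generated token requires one more full, strictly sequential pass over the model's weights (decode).

```python
def forward(token_ids):
    """Hypothetical forward pass: attends over all given tokens in parallel
    and returns a next-token id (a dummy value here; a real model would
    compute logits over the vocabulary)."""
    return sum(token_ids) % 50_000  # placeholder for real computation


def prefill(prompt_ids):
    # Prefill: the entire prompt is handled in ONE forward pass, so the
    # hardware can batch the matrix multiplies across all prompt tokens.
    return forward(prompt_ids)


def decode(prompt_ids, num_new_tokens):
    # Decode: each new token needs a FULL pass over the model, and the
    # passes cannot be parallelized because token t+1 depends on token t.
    context = list(prompt_ids)
    generated = []
    for _ in range(num_new_tokens):
        next_id = forward(context)   # one full weight read per generated token
        generated.append(next_id)
        context.append(next_id)      # grow the context autoregressively
    return generated


if __name__ == "__main__":
    prompt = list(range(1_000))           # long prompt: still only 1 forward pass
    prefill(prompt)
    decode(prompt, num_new_tokens=200)    # 200 strictly sequential forward passes
```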
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Analyzing Inference Performance Bottlenecks
Deconstructing the High Cost of Autoregressive Decoding