Learn Before
Analyzing Generation Latency
Focusing on the causal attention mechanism, explain the primary reason for the observed linear growth in processing time for each new token.
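Not part of the original card, but a minimal sketch of the reasoning the question targets (the FLOP model and head dimension d = 64 are illustrative assumptions): with a KV cache, generating the token at position N+1 still requires the new query to form attention scores against all N cached keys and then take a weighted sum over all N cached values, so both dominant operations grow linearly with N.

```python
# Illustrative per-step FLOP model for causal attention during decoding.
# The head dimension and function name below are assumptions for this sketch.
D = 64  # per-head dimension (illustrative)

def attention_step_flops(n: int, d: int = D) -> int:
    """Approximate FLOPs for the newest token's attention at position n + 1.

    Two operations dominate, and each touches every cached position:
      1. score computation: query (1 x d) dotted with all n + 1 keys  -> ~2*d*(n+1)
      2. weighted value sum: weights (1 x (n+1)) times values (n+1 x d) -> ~2*d*(n+1)
    """
    return 2 * d * (n + 1) + 2 * d * (n + 1)

# Doubling the cached length roughly doubles the per-step cost: linear in n.
c_1k = attention_step_flops(1000)
c_2k = attention_step_flops(2000)
```

The softmax and the per-token feed-forward layers cost either O(N) or O(1) extra, so the two matrix products above are what drive the observed linear growth in latency per generated token.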
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Time Complexity of Self-Attention in Autoregressive Generation
Claimed Linear Time Complexity of Self-Attention in Autoregressive Generation
In a model that generates text one token at a time, suppose it has already produced a sequence of length N and is now calculating the next token (at position N+1). Which of the following best identifies the two primary computational operations within the attention mechanism that cause the cost of this single step to scale linearly with the current sequence length N?
Analyzing Generation Latency
Predicting Attention Computation Time