Learn Before
Claimed Linear Time Complexity of Self-Attention in Autoregressive Generation
An assertion has been made that the time complexity of self-attention at each step of generating a sequence of current length len, across L layers, is linear in len, specifically $O(L \cdot len)$. This claim is based on the computational cost of the two key products performed at each generation step: the dot product between the new token's query and the cached key vectors ($\mathbf{q}\mathbf{K}^{\top}$), and the product of the Softmax output with the cached value vectors ($\mathrm{Softmax}(\mathbf{q}\mathbf{K}^{\top})\mathbf{V}$). Each of these products touches every one of the len cached positions exactly once, so both cost O(len) per layer (treating the model dimension as a constant).
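A minimal single-head NumPy sketch of one cached decoding step may make the claim concrete. The function name, array shapes, and variable names here are illustrative assumptions, not from the source; the point is that both products iterate over the N cached positions exactly once, which is where the linear scaling comes from:

```python
import numpy as np

def attention_step(q, K_cache, V_cache):
    """One decoding step of single-head self-attention with a KV cache.

    q:        (d,)   query for the token being generated
    K_cache:  (N, d) cached keys for the N positions so far
    V_cache:  (N, d) cached values for those positions

    Both matrix products below touch every cached row once,
    so this step costs O(N * d): linear in the current length N.
    """
    d = q.shape[-1]
    scores = K_cache @ q / np.sqrt(d)        # q . k_i for each cached key: O(N*d)
    weights = np.exp(scores - scores.max())  # numerically stable softmax: O(N)
    weights /= weights.sum()
    return weights @ V_cache                 # weighted sum of values: O(N*d)

# Illustrative usage: doubling N doubles the work of this single step.
d, N = 64, 10
rng = np.random.default_rng(0)
out = attention_step(rng.standard_normal(d),
                     rng.standard_normal((N, d)),
                     rng.standard_normal((N, d)))
```

Repeating this computation once per layer gives the claimed $O(L \cdot len)$ cost for a single generation step.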
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Computing Sciences
Foundations of Large Language Models Course
Related
Time Complexity of Self-Attention in Autoregressive Generation
In a model that generates text one token at a time, suppose it has already produced a sequence of length N and is now calculating the next token (at position N+1). Which of the following best identifies the two primary computational operations within the attention mechanism that cause the cost of this single step to scale linearly with the current sequence length N?
Analyzing Generation Latency
Predicting Attention Computation Time