Learn Before
Predicting Attention Computation Time
An autoregressive language model takes 50 milliseconds to compute the attention for the 100th token in a sequence. Based on the computational scaling of the causal attention mechanism at a single generation step, approximately how long would you expect it to take to compute the attention for the 400th token in the same sequence? Explain your reasoning.
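A minimal sketch (not part of the original card) illustrating the reasoning the question asks for: at generation step t, causal attention computes the new query's scores against all t cached keys and then a weighted sum over all t cached values, so per-step cost grows linearly with t. The head dimension and array shapes below are illustrative assumptions.

```python
import numpy as np

d = 64                      # head dimension (illustrative assumption)
rng = np.random.default_rng(0)

def attention_step(keys, values, query):
    """One decoding step: the new query attends over all t cached keys/values."""
    scores = keys @ query / np.sqrt(d)     # t dot products -> O(t) work
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()               # softmax over t scores
    return weights @ values                # weighted sum of t value vectors -> O(t) work

for t in (100, 400):
    keys = rng.standard_normal((t, d))     # simulated KV cache at position t
    values = rng.standard_normal((t, d))
    query = rng.standard_normal(d)
    out = attention_step(keys, values, query)
    print(t, out.shape)

# Since per-step cost is proportional to t, the 400th token should take
# roughly 400/100 = 4x as long as the 100th: about 4 * 50 ms = 200 ms.
```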
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Time Complexity of Self-Attention in Autoregressive Generation
Claimed Linear Time Complexity of Self-Attention in Autoregressive Generation
In a model that generates text one token at a time, suppose it has already produced a sequence of length N and is now calculating the next token (at position N+1). Which of the following best identifies the two primary computational operations within the attention mechanism that cause the cost of this single step to scale linearly with the current sequence length N?
Analyzing Generation Latency