An autoregressive model is generating a sequence of text. To produce the 5th token, it computes attention using a query from position 5 and the key/value pairs from positions 1-4. When the model then proceeds to generate the 6th token, which statement accurately describes the most computationally efficient approach for handling the key and value pairs from the first four tokens (positions 1-4)?
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Key-Value (KV) Cache in Transformer Inference
Computational Efficiency in Autoregressive Generation
During an autoregressive text generation process, to produce the 10th token in a sequence, the model does not need to re-calculate the key and value vectors for the nine preceding tokens: those vectors depend only on the tokens themselves, which have not changed, so they can be read from a key-value (KV) cache. Only the key and value vectors for the newly generated token must be computed and appended to the cache.
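The reuse described above can be sketched in a few lines. This is a minimal single-head illustration in NumPy, not a real inference engine; the names `KVCache` and `attend` are illustrative, and details such as multi-head layout, batching, and masking are omitted.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class KVCache:
    """Holds the key/value vectors of all previously processed positions."""
    def __init__(self):
        self.keys, self.values = [], []

    def append(self, k, v):
        # Only the NEW token's key/value pair is computed and stored;
        # earlier entries are never recomputed.
        self.keys.append(k)
        self.values.append(v)

    def as_arrays(self):
        return np.stack(self.keys), np.stack(self.values)

def attend(query, cache):
    """One decoding step: the new query attends over every cached K/V pair."""
    K, V = cache.as_arrays()
    scores = softmax(query @ K.T / np.sqrt(query.shape[-1]))
    return scores @ V

# Simulate generating 6 tokens. At each step exactly one new (k, v) pair
# is appended; the pairs for earlier positions are simply reused.
rng = np.random.default_rng(0)
d = 8
cache = KVCache()
for step in range(6):
    k, v = rng.standard_normal(d), rng.standard_normal(d)
    cache.append(k, v)

q6 = rng.standard_normal(d)   # query for the token currently being generated
out = attend(q6, cache)       # attends over positions 1-6 via the cache
```

The key point of the sketch is in `append`: per decoding step the cost of maintaining the cache is one key/value projection for the new token, while recomputing all preceding positions would grow the per-step cost with sequence length.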