Learn Before
State of the Key-Value Cache During Generation
An autoregressive transformer model is generating a sequence of tokens. After processing the first three tokens (token 1, token 2, token 3), it is about to compute the attention scores to generate the fourth token. Describe the specific contents of the key-value cache at this exact moment. Your description should explain what information is stored and why it is structured that way for the upcoming computation.
0
1
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
An autoregressive model is generating a sequence of text. It has already produced 100 tokens and is now in the process of generating the 101st token. The model uses an optimization where the key and value vectors for all 100 previous tokens are stored and reused. What is the primary computational advantage of this storage mechanism when calculating the attention for the 101st token?
Inference Performance Analysis
State of the Key-Value Cache During Generation