Learn Before
Dynamic K/V Cache in Transformer Decoding
Imagine a Transformer-based language model is generating a response. Compare the composition of the key and value vector set used to generate the first token of the response with the set used to generate the tenth token of the response. Explain why this composition changes.
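The growth of the key/value set described in the question can be sketched in plain Python. This is a minimal, framework-free illustration (all names, the toy vectors, and the prompt length are assumptions for demonstration): during prefill the prompt's K/V vectors are cached once, and each decoding step appends the new token's K/V before the new query attends over the entire cache.

```python
# Hypothetical sketch of a per-layer K/V cache during autoregressive decoding.
# Real implementations store tensors per attention head; strings stand in here.

class KVCache:
    """Holds the key and value vectors for every token processed so far."""
    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)

    def __len__(self):
        return len(self.keys)

def decode_step(cache, k, v):
    """One generation step: cache the new token's K/V, then the new query
    attends over the whole cache (prompt tokens + all generated tokens)."""
    cache.append(k, v)
    return len(cache)  # size of the K/V set this step's query attends to

cache = KVCache()
prompt_len = 6  # e.g. a six-token prompt (length chosen for illustration)
for i in range(prompt_len):
    cache.append(f"k_prompt{i}", f"v_prompt{i}")  # prefill phase

sizes = [decode_step(cache, f"k_gen{t}", f"v_gen{t}") for t in range(10)]
print(sizes[0])  # 7  = 6 prompt tokens + the 1st generated token
print(sizes[9])  # 16 = 6 prompt tokens + 10 generated tokens
```

The sketch makes the answer concrete: the K/V set for the first response token contains only the prompt's vectors (plus its own), while the set for the tenth token has grown to include the nine tokens generated in between, because each step's K/V pair is appended to the cache rather than recomputed.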
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
An autoregressive language model is generating a sequence one token at a time. It has already processed the initial input 'The cat sat on the' and has subsequently generated the tokens 'mat and'. The model is now in the process of generating the token that will follow 'and'. What set of key and value vectors will the new query vector for this step attend to?
Consider a language model generating a sequence of text one token at a time after being given an initial prompt. For the generation of the tenth token in the output sequence, the newly created query vector will attend to a set of key and value vectors derived from the original prompt tokens, the nine previously generated tokens, and the tenth token itself.