Learn Before
During autoregressive text generation, a model has already processed N tokens and stored their corresponding key and value vectors in a cache. When the model processes the (N+1)-th token, how is this cache utilized and modified to compute the output for this new step?
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Single-Step Generation with a KV Cache
Formula for Updating the Key Matrix in the KV Cache
Formula for Updating the Value Matrix in the KV Cache
Example of a Single-Step KV Cache Update
An autoregressive model is generating a sequence and has just processed the token at position t. The Key-Value cache currently stores the key and value vectors for all tokens from position 1 to t. As the model processes the next token at position t+1, which statement correctly describes how the cache is updated and used for the attention calculation at this new step?
Notation for Current Query, Key, and Value Vectors (q', k', v')
Diagram of a Single-Step KV Cache Update and Attention
Debugging a Flawed KV Cache Implementation