Learn Before
Notation for Current Query, Key, and Value Vectors (q', k', v')
In autoregressive models, each decoding step generates a new set of vectors for the current token at position i': a query vector (q' or q_{i'}), a key vector (k' or k_{i'}), and a value vector (v' or v_{i'}). The new key k' and value v' are appended to the Key-Value cache. The new query q' is then scored against the cached keys (those of all previous tokens, plus k' itself under standard causal attention) to produce attention weights over the cached values. Because k' and v' remain in the cache, they are also available to subsequent tokens.
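The single decoding step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reference implementation: it uses a single attention head, plain Python lists instead of tensors, scaled dot-product scoring, and the common convention that the current token attends to itself (so k' and v' are appended before scoring). The function name `attend_one_step` and all variable names are hypothetical.

```python
import math

def attend_one_step(q_new, k_new, v_new, key_cache, value_cache):
    """One decoding step: append k', v' to the cache, then attend with q'."""
    # Assumption: the current token attends to itself, so the cache is
    # updated before the attention scores are computed.
    key_cache.append(k_new)
    value_cache.append(v_new)

    d = len(q_new)
    # Scaled dot-product score of q' against every cached key (including k').
    scores = [
        sum(qi * ki for qi, ki in zip(q_new, k)) / math.sqrt(d)
        for k in key_cache
    ]

    # Numerically stable softmax over the scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Output for the current token: weighted sum of all cached values.
    output = [
        sum(w * v[j] for w, v in zip(weights, value_cache))
        for j in range(len(v_new))
    ]
    return output, weights

# Toy cache holding keys and values for two previously processed tokens.
key_cache = [[1.0, 0.0], [0.0, 1.0]]
value_cache = [[1.0, 2.0], [3.0, 4.0]]

# Hypothetical q', k', v' for the current token.
q_prime, k_prime, v_prime = [1.0, 0.0], [1.0, 1.0], [5.0, 6.0]
output, weights = attend_one_step(q_prime, k_prime, v_prime, key_cache, value_cache)
```

After the call, both caches hold three entries, so the next token's query can attend to k' and v' without recomputing them. Only q' is discarded at the end of the step, since later tokens never need an earlier token's query.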

Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Single-Step Generation with a KV Cache
Formula for Updating the Key Matrix in the KV Cache
Formula for Updating the Value Matrix in the KV Cache
Example of a Single-Step KV Cache Update
During autoregressive text generation, a model has already processed N tokens and stored their corresponding key and value vectors in a cache. When the model processes the (N+1)-th token, how is this cache utilized and modified to compute the output for this new step?
An autoregressive model is generating a sequence and has just processed the token at position t. The Key-Value cache currently stores the key and value vectors for all tokens from position 1 to t. As the model processes the next token at position t+1, which statement correctly describes how the cache is updated and used for the attention calculation at this new step?
Notation for Current Query, Key, and Value Vectors (q', k', v')
Diagram of a Single-Step KV Cache Update and Attention
Debugging a Flawed KV Cache Implementation
Learn After
Notation for a New Value Vector (V_i')
Applying Vector Notation in Autoregressive Generation
During a single step of autoregressive generation in a transformer-based model, a new query vector (q') and a new key vector (k') are computed for the token being generated. What is the immediate role of these two new vectors in the self-attention mechanism?
During a single step of autoregressive generation for a new token, the newly computed value vector (v') is immediately used to calculate the output for that same token.