Learn Before
Example of a Single-Step KV Cache Update
The update mechanism for a Key-Value (KV) cache during a single step of autoregressive generation can be illustrated as follows. Initially, the cache contains the key-value pairs for all preceding positions, from 1 to i'. When the input for the next position, i' + 1, is processed, it undergoes linear transformations to generate a new query vector (q'), key vector (k'), and value vector (v'). The newly generated key-value pair is then appended to the cache. Subsequently, the new query vector performs an attention operation over the entire, now updated, set of keys from position 1 to i' + 1.
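The single step described above can be sketched in a few lines of Python. This is a minimal illustration for one attention head, not a reference implementation. The function names (`matvec`, `single_step_kv_update`), the weight matrices `W_q`, `W_k`, `W_v`, and the toy values are all hypothetical, and the scaled dot-product softmax is an assumed form of the "attention operation". Plain lists are used so that the sketch has no dependencies beyond the standard library.

```python
import math

def matvec(x, W):
    """Multiply row vector x (length d_in) by matrix W (d_in x d_out)."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]

def single_step_kv_update(x_new, W_q, W_k, W_v, K_cache, V_cache):
    """One decoding step: project, append to the cache, then attend."""
    # Linear transformations produce the new query, key, and value vectors.
    q_new = matvec(x_new, W_q)
    k_new = matvec(x_new, W_k)
    v_new = matvec(x_new, W_v)

    # Append the new key-value pair, so the cache grows by one position.
    K_cache = K_cache + [k_new]
    V_cache = V_cache + [v_new]

    # The new query scores every cached key, including the one just appended.
    # Assumption: scaled dot-product attention with a softmax over positions.
    scale = math.sqrt(len(q_new))
    scores = [sum(q * k for q, k in zip(q_new, key)) / scale for key in K_cache]
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # The output is the attention-weighted sum of all cached value vectors.
    output = [sum(w * v[j] for w, v in zip(weights, V_cache))
              for j in range(len(v_new))]
    return output, K_cache, V_cache, weights

# Toy setup (hypothetical values): identity projections and two cached positions.
identity = [[1.0, 0.0], [0.0, 1.0]]
K_before = [[0.0, 1.0], [1.0, 1.0]]
V_before = [[1.0, 0.0], [0.0, 1.0]]
x_new = [1.0, 0.0]

output, K_after, V_after, weights = single_step_kv_update(
    x_new, identity, identity, identity, K_before, V_before
)
print(len(K_before), "->", len(K_after))  # 2 -> 3
```

The order of operations is the point of the sketch: the append happens before the attention scores are computed, so the new query also attends to its own key. In a real implementation the cache would typically be a preallocated tensor per layer and head rather than a growing list.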
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Single-Step Generation with a KV Cache
Formula for Updating the Key Matrix in the KV Cache
Formula for Updating the Value Matrix in the KV Cache
Example of a Single-Step KV Cache Update
During autoregressive text generation, a model has already processed N tokens and stored their corresponding key and value vectors in a cache. When the model processes the (N+1)-th token, how is this cache utilized and modified to compute the output for this new step?
An autoregressive model is generating a sequence and has just processed the token at position t. The Key-Value cache currently stores the key and value vectors for all tokens from position 1 to t. As the model processes the next token at position t+1, which statement correctly describes how the cache is updated and used for the attention calculation at this new step?
Notation for Current Query, Key, and Value Vectors (q', k', v')
Diagram of a Single-Step KV Cache Update and Attention
Debugging a Flawed KV Cache Implementation
Learn After
An autoregressive model is generating a sequence of outputs one step at a time. At step 't', the model has already processed all inputs from step 1 to 't-1' and stored their corresponding key-value pairs. To calculate the output for the current step 't', a new query vector (q_t) is generated. Which set of key vectors must this new query vector attend to in order to correctly incorporate all available context?
An autoregressive model is generating the next token in a sequence and has already processed the first 'N' tokens, with their corresponding key-value pairs stored in a cache. For the generation of the '(N+1)th' token, arrange the following actions in the correct chronological order.
KV Cache State During Generation