Learn Before
Diagram of a Single-Step KV Cache Update and Attention
This diagram illustrates a single step of autoregressive generation, focusing on the update of the Key-Value (KV) cache at position i' + 1. An input at this new position undergoes linear transformations to generate a new query vector (q'), key vector (k'), and value vector (v'). The new key-value pair (k', v') is then appended to the KV cache, which already contains the pairs for all preceding positions from 1 to i'. Finally, the new query attends over the entire updated set of keys in the cache (positions 1 to i' + 1) to compute the output for the current step.
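The step described above can be sketched in code. This is a minimal illustrative sketch in NumPy (not from the source); the shapes, weight matrices, and cache length are assumed for demonstration, and a real implementation would use trained projection weights and multiple heads.

```python
import numpy as np

d = 8                                   # head dimension (illustrative)
rng = np.random.default_rng(0)

# The cache already holds keys/values for positions 1..i' (here i' = 5).
K_cache = rng.standard_normal((5, d))
V_cache = rng.standard_normal((5, d))

# Input at position i' + 1 undergoes linear transformations
# (random weights stand in for the model's trained projections).
x_new = rng.standard_normal(d)
W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))
q_new, k_new, v_new = x_new @ W_q, x_new @ W_k, x_new @ W_v

# Append the new key-value pair to the cache (now positions 1..i'+1).
K_cache = np.vstack([K_cache, k_new])
V_cache = np.vstack([V_cache, v_new])

# The new query attends over ALL cached keys, then mixes the values.
scores = K_cache @ q_new / np.sqrt(d)   # one score per cached position
weights = np.exp(scores - scores.max())
weights /= weights.sum()                # softmax over positions 1..i'+1
output = weights @ V_cache              # output for this single step
```

Note that only one query is computed per step: the cached keys and values are reused, so attention costs O(i' + 1) rather than recomputing all previous projections.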

Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Single-Step Generation with a KV Cache
Formula for Updating the Key Matrix in the KV Cache
Formula for Updating the Value Matrix in the KV Cache
Example of a Single-Step KV Cache Update
During autoregressive text generation, a model has already processed N tokens and stored their corresponding key and value vectors in a cache. When the model processes the (N+1)-th token, how is this cache utilized and modified to compute the output for this new step?
An autoregressive model is generating a sequence and has just processed the token at position t. The Key-Value cache currently stores the key and value vectors for all tokens from position 1 to t. As the model processes the next token at position t+1, which statement correctly describes how the cache is updated and used for the attention calculation at this new step?
Notation for Current Query, Key, and Value Vectors (q', k', v')
Diagram of a Single-Step KV Cache Update and Attention
Debugging a Flawed KV Cache Implementation
Learn After
A language model is generating the 10th token in a sequence using an autoregressive process with a Key-Value (KV) cache. At this step, a new query vector (q₁₀), key vector (k₁₀), and value vector (v₁₀) are computed from the input. The KV cache already contains the key-value pairs from the first 9 steps. Which statement best analyzes the attention computation that occurs for this 10th step?
You are observing a single step of token generation in a large language model that uses a Key-Value cache. Arrange the following operations in the correct chronological order as they would occur during this single step.
An engineer is debugging an autoregressive language model and observes that as it generates longer sequences, its output progressively loses connection to the initial context. The engineer suspects a flaw in how the attention mechanism utilizes the Key-Value (KV) cache during each generation step. Based on the process where a new query attends to the full, updated cache, which of the following errors is the most probable cause for this specific type of performance degradation?