Learn Before
Formula for Cache State Evolution during Autoregressive Decoding
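During the decoding phase of autoregressive generation, the Key-Value (KV) cache state corresponds to the sequence of tokens generated so far, and the model uses this cache to predict the next token. This sequential process, where each prefix and its associated cache are used to generate the subsequent token, can be represented as:

$$(y_{<1}, \mathrm{Cache}_{<1}) \rightarrow y_1, \quad (y_{<2}, \mathrm{Cache}_{<2}) \rightarrow y_2, \quad \ldots, \quad (y_{<i}, \mathrm{Cache}_{<i}) \rightarrow y_i$$

Here, $\mathrm{Cache}_{<i}$ is the KV cache state containing the key-value pairs for the prefix $y_{<i}$.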
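The growth of the cache across decoding steps can be illustrated with a minimal sketch. Everything here is hypothetical and chosen only for illustration: `toy_projections` stands in for learned query/key/value projections, the "prediction" rule is an arbitrary deterministic function, and a single attention head with no layers is assumed. The point is only that each step appends one key-value pair and then attends over the whole cached prefix.

```python
import math


def attend(query, keys, values):
    """Scaled dot-product attention of one query over all cached keys/values."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]


def toy_projections(token_id, dim=4):
    """Hypothetical stand-in for learned Q/K/V projections of one token."""
    query = [math.sin(token_id + d) for d in range(dim)]
    key = [math.cos(token_id * (d + 1)) for d in range(dim)]
    value = [float((token_id + d) % 5) for d in range(dim)]
    return query, key, value


def decode(prompt_ids, num_new_tokens, vocab_size=10):
    """Generate tokens one by one, recording the cache size at each step.

    Assumes a non-empty prompt.
    """
    cache_keys, cache_values = [], []

    # Prefill: cache key-value pairs for every prompt token except the last,
    # which is processed by the first decoding step below.
    for token in prompt_ids[:-1]:
        _, key, value = toy_projections(token)
        cache_keys.append(key)
        cache_values.append(value)

    tokens = list(prompt_ids)
    cache_sizes = []
    for _ in range(num_new_tokens):
        # Only the newest token needs fresh Q/K/V vectors.
        query, key, value = toy_projections(tokens[-1])
        cache_keys.append(key)      # the cache now covers the full prefix
        cache_values.append(value)
        cache_sizes.append(len(cache_keys))

        context = attend(query, cache_keys, cache_values)
        # Arbitrary deterministic rule standing in for a real output layer.
        next_token = int(abs(sum(context)) * 1000) % vocab_size
        tokens.append(next_token)
    return tokens, cache_sizes


if __name__ == "__main__":
    tokens, cache_sizes = decode([1, 2, 3], num_new_tokens=4)
    print("cache size per step:", cache_sizes)
```

With a three-token prompt and four generated tokens, the recorded cache sizes are 3, 4, 5, and 6: the cache grows by exactly one entry per step, and each new query attends over every entry accumulated so far.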

Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Next Token Prediction Formula
An autoregressive model is generating the 11th token of a sequence. The Key-Value (KV) Cache has already been populated with the key and value vectors for the first 10 tokens. For this 11th generation step, a new query (q_11), key (k_11), and value (v_11) vector are computed. Which of the following accurately describes the set of key vectors that the new query (q_11) will perform its attention operation over to produce the output for this step?
You are observing a single step of autoregressive generation in a transformer model, specifically for the token at position i. Arrange the following computational events in the correct chronological order for this single step.
Formula for Cache State Evolution during Autoregressive Decoding
Analyzing a Flawed KV Cache Implementation
Learn After
An autoregressive language model is generating a sequence of tokens one by one. As the length of the generated sequence increases from 10 tokens to 100 tokens, what is the primary impact of the evolving key-value cache on the computation required to generate the next token?
An autoregressive language model is given a starting token and generates a three-token sequence: 'A', 'B', 'C'. Arrange the following states of the key-value (KV) cache in the chronological order they occur during this generation process.
KV Cache State during Generation