You are observing a single step of token generation in a large language model that uses a Key-Value cache. Arrange the following operations in the correct chronological order as they would occur during this single step.
Tags
Ch.5 Inference - Foundations of Large Language Models
A language model is generating the 10th token in a sequence using an autoregressive process with a Key-Value (KV) cache. At this step, a new query vector (q₁₀), key vector (k₁₀), and value vector (v₁₀) are computed from the input. The KV cache already contains the key-value pairs from the first 9 steps. Which statement best analyzes the attention computation that occurs for this 10th step?
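To make the scenario in this question concrete, here is a minimal NumPy sketch of the attention computation at step 10 with a KV cache. All names and dimensions are illustrative assumptions, not taken from the course material; it models a single attention head and omits projections and masking.

```python
import numpy as np

d = 8  # illustrative head dimension
rng = np.random.default_rng(0)

# The KV cache already holds the keys and values from the first 9 steps.
K_cache = rng.standard_normal((9, d))
V_cache = rng.standard_normal((9, d))

# Step 10: q, k, v are computed for the new token only.
q10 = rng.standard_normal(d)
k10 = rng.standard_normal(d)
v10 = rng.standard_normal(d)

# The new key/value pair is appended to the cache (now length 10).
K = np.vstack([K_cache, k10])
V = np.vstack([V_cache, v10])

# q10 attends over all 10 cached keys; k1..k9 are reused, not recomputed.
scores = K @ q10 / np.sqrt(d)           # shape (10,)
weights = np.exp(scores - scores.max())
weights /= weights.sum()                # softmax over all cached positions
output = weights @ V                    # attention output for token 10, shape (d,)
```

The key point the question probes: only q₁₀, k₁₀, and v₁₀ are freshly computed; the attention scores are then taken against the full, updated cache of all 10 key-value pairs.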
An engineer is debugging an autoregressive language model and observes that as it generates longer sequences, its output progressively loses connection to the initial context. The engineer suspects a flaw in how the attention mechanism uses the Key-Value (KV) cache during each generation step. Given that each new query should attend to the full, updated cache, which of the following errors is the most probable cause of this specific type of degradation?