You are observing a single step of token generation in a large language model that uses a Key-Value cache. Arrange the following operations in the correct chronological order as they would occur during this single step.
Tags
Ch.5 Inference - Foundations of Large Language Models
A language model is generating the 10th token in a sequence using an autoregressive process with a Key-Value (KV) cache. At this step, a new query vector (q₁₀), key vector (k₁₀), and value vector (v₁₀) are computed from the input. The KV cache already contains the key-value pairs from the first 9 steps. Which statement best analyzes the attention computation that occurs for this 10th step?
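To make the scenario in this question concrete, here is a minimal NumPy sketch of the attention computation at step 10 with a KV cache. All names and dimensions are illustrative assumptions, not taken from the course material; it models a single attention head and omits projections and masking.

```python
import numpy as np

d = 8  # illustrative head dimension
rng = np.random.default_rng(0)

# The KV cache already holds the keys and values from the first 9 steps.
K_cache = rng.standard_normal((9, d))
V_cache = rng.standard_normal((9, d))

# Step 10: q, k, v are computed for the new token only.
q10 = rng.standard_normal(d)
k10 = rng.standard_normal(d)
v10 = rng.standard_normal(d)

# The new key/value pair is appended to the cache (now length 10).
K = np.vstack([K_cache, k10])
V = np.vstack([V_cache, v10])

# q10 attends over all 10 cached keys; k1..k9 are reused, not recomputed.
scores = K @ q10 / np.sqrt(d)           # shape (10,)
weights = np.exp(scores - scores.max())
weights /= weights.sum()                # softmax over all cached positions
output = weights @ V                    # attention output for token 10, shape (d,)
```

The key point the question probes: only q₁₀, k₁₀, and v₁₀ are freshly computed; the attention scores are then taken against the full, updated cache of all 10 key-value pairs.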
An engineer is debugging an autoregressive language model and observes that as it generates longer sequences, its output progressively loses connection to the initial context. The engineer suspects a flaw in how the attention mechanism uses the Key-Value (KV) cache during each generation step. Given that each new query should attend to the full, updated cache, which of the following errors is the most probable cause of this specific type of degradation?