Multiple Choice

A language model is generating the 10th token in a sequence using an autoregressive process with a Key-Value (KV) cache. At this step, a new query vector (q₁₀), key vector (k₁₀), and value vector (v₁₀) are computed from the input. The KV cache already contains the key-value pairs from the first 9 steps. Which statement best analyzes the attention computation that occurs for this 10th step?

0

1

Updated 2025-10-07

Contributors are:

Who are from:

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science