Concept

Q, K, and V Composition in Transformer Decoding

In step-by-step Transformer decoding, the self-attention mechanism uses distinct Query (Q), Key (K), and Value (V) vectors. For each new token being generated, a single query vector is computed from that token's embedding. This query then attends to a cumulative set of key and value vectors, typically stored in a KV cache: all the key-value pairs from the initial prompt, computed during the prefill phase, combined with the key-value pairs of every token generated in previous decoding steps.
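To make the composition concrete, here is a minimal NumPy sketch of the prefill phase followed by a few decoding steps, for a single attention head. The projection matrices, dimensions, and random embeddings are illustrative stand-ins rather than any real model's parameters; the point is that each decoding step computes exactly one new query, appends one new key-value pair to the cache built during prefill, and attends over the entire cache.

```python
import numpy as np

# Hypothetical projection weights for one attention head (d_model = d_head
# here for simplicity); in a real model these are learned parameters.
rng = np.random.default_rng(0)
d = 8
W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# --- Prefill: compute K and V for all prompt tokens at once and cache them.
prompt_embeddings = rng.standard_normal((5, d))   # 5 prompt tokens (stand-in)
K_cache = prompt_embeddings @ W_k                 # (5, d)
V_cache = prompt_embeddings @ W_v                 # (5, d)

# --- Decode: one step per generated token.
for step in range(3):
    x_new = rng.standard_normal(d)                # newest token's embedding (stand-in)

    # A new query is computed from the current token only.
    q = x_new @ W_q                               # (d,)

    # Its key and value are appended to the cumulative cache
    # (prompt K/V plus the K/V of all previously generated tokens).
    K_cache = np.vstack([K_cache, x_new @ W_k])
    V_cache = np.vstack([V_cache, x_new @ W_v])

    # The single query attends over the full cache.
    scores = q @ K_cache.T / np.sqrt(d)           # (seq_len,)
    attn_out = softmax(scores) @ V_cache          # (d,)
    print(f"step {step}: cache length = {len(K_cache)}")
```

Because the query comes only from the newest position, causal masking is implicit during decoding: the cache never contains future tokens.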

Tags

Ch.5 Inference - Foundations of Large Language Models

Computing Sciences