Example of Query-Key Interactions in Causal Attention

In a causal self-attention mechanism, the set of calculated query-key dot products explicitly demonstrates the autoregressive nature of the model: each position can attend only to itself and to preceding positions. The following list enumerates all such interactions for a sequence of length 7 (positions 0 through 6), where q_i denotes the query at position i and k_j^T denotes the transposed key at position j:

  • q_0: q_0 k_0^T
  • q_1: q_1 k_0^T, q_1 k_1^T
  • q_2: q_2 k_0^T, q_2 k_1^T, q_2 k_2^T
  • q_3: q_3 k_0^T, q_3 k_1^T, q_3 k_2^T, q_3 k_3^T
  • q_4: q_4 k_0^T, q_4 k_1^T, q_4 k_2^T, q_4 k_3^T, q_4 k_4^T
  • q_5: q_5 k_0^T, q_5 k_1^T, q_5 k_2^T, q_5 k_3^T, q_5 k_4^T, q_5 k_5^T
  • q_6: q_6 k_0^T, q_6 k_1^T, q_6 k_2^T, q_6 k_3^T, q_6 k_4^T, q_6 k_5^T, q_6 k_6^T

This pattern ensures that the prediction for a token at a given position is not influenced by any future tokens.
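The triangular pattern above can be reproduced numerically. The following NumPy sketch (the sequence length of 7 comes from the text; the key/query dimension d=4 and the random inputs are illustrative assumptions) computes all query-key dot products at once, masks out the positions where j > i before the softmax, and checks that row i of the attention weights has exactly i + 1 nonzero entries, one per interaction listed above:

```python
import numpy as np

np.random.seed(0)
seq_len, d = 7, 4          # d = 4 is an arbitrary illustrative choice
Q = np.random.randn(seq_len, d)   # queries q_0 .. q_6
K = np.random.randn(seq_len, d)   # keys    k_0 .. k_6

# All dot products q_i k_j^T, scaled as in standard attention
scores = Q @ K.T / np.sqrt(d)

# Causal mask: forbid attending to future positions (j > i)
future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
scores[future] = -np.inf

# Row-wise softmax; exp(-inf) = 0, so future positions get zero weight
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

# Row i has exactly i + 1 nonzero weights, matching the enumeration above
for i in range(seq_len):
    assert np.count_nonzero(weights[i]) == i + 1
```

Because the mask is applied before the softmax, each row still sums to 1 over the allowed positions, so the prediction at position i is a distribution over positions 0 through i only.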

Updated 2026-05-02

Ch.2 Generative Models - Foundations of Large Language Models
