Learn Before
Example of Query-Key Interactions in Causal Attention
In a causal self-attention mechanism, the set of calculated query-key dot products explicitly demonstrates the autoregressive nature of the model: each position can attend only to itself and to preceding positions. The following list enumerates all such interactions for a sequence of length 7 (positions 0 through 6), where qᵢ denotes the query at position i and kⱼᵀ denotes the transposed key at position j:
- q₀: q₀k₀ᵀ
- q₁: q₁k₀ᵀ, q₁k₁ᵀ
- q₂: q₂k₀ᵀ, q₂k₁ᵀ, q₂k₂ᵀ
- q₃: q₃k₀ᵀ, q₃k₁ᵀ, q₃k₂ᵀ, q₃k₃ᵀ
- q₄: q₄k₀ᵀ, q₄k₁ᵀ, q₄k₂ᵀ, q₄k₃ᵀ, q₄k₄ᵀ
- q₅: q₅k₀ᵀ, q₅k₁ᵀ, q₅k₂ᵀ, q₅k₃ᵀ, q₅k₄ᵀ, q₅k₅ᵀ
- q₆: q₆k₀ᵀ, q₆k₁ᵀ, q₆k₂ᵀ, q₆k₃ᵀ, q₆k₄ᵀ, q₆k₅ᵀ, q₆k₆ᵀ
This pattern ensures that the prediction for a token at a given position is not influenced by any future tokens.
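The triangular pattern above can be reproduced directly with a causal mask. The sketch below (a minimal illustration; the dimensions, random weights, and variable names are assumed for demonstration, not taken from the course) computes the full 7×7 score matrix, masks out every qᵢkⱼᵀ with j > i, and verifies that row i ends up with exactly i + 1 nonzero attention weights:

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d = 7, 8  # sequence length 7, illustrative head dimension
Q = rng.standard_normal((seq_len, d))
K = rng.standard_normal((seq_len, d))

# Full score matrix: scores[i, j] = q_i . k_j^T (scaled)
scores = Q @ K.T / np.sqrt(d)

# Causal mask: positions j > i are disallowed, so set them to -inf
# and softmax will assign them zero weight.
mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
scores[mask] = -np.inf

# Row-wise softmax
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

# Row i attends to exactly positions 0..i, matching the list above.
for i in range(seq_len):
    assert np.count_nonzero(weights[i]) == i + 1
```

Masking with −∞ before the softmax (rather than zeroing scores afterwards) is the standard trick: it guarantees the remaining weights in each row still sum to 1.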
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Computational Cost per Token in Causal Attention
Reusability of Key-Value Pairs in Autoregressive Inference
Example of Query-Key Interactions in Causal Attention
An autoregressive model is generating a sequence of tokens one by one. It is currently calculating the attention output for the token at position 4 (i.e., the fifth token in the sequence). To ensure the model only uses information it has already seen, which set of key (K) and value (V) vectors must be used as input to the attention mechanism for the query vector at position 4 (q₄)?
Diagnosing Information Leakage in an Autoregressive Model
When calculating the attention output for a specific token at position i in an autoregressive model, the mechanism is structured to use the query vector from that same position (qᵢ), while the key and value matrices are composed of the corresponding vectors from all positions in the full input sequence.
Learn After
A self-attention mechanism is configured to ensure that, when processing a sequence, the output for any given position i is influenced only by inputs from positions j where j <= i. This prevents the model from 'seeing' future elements. The interaction between the query from position i and the key from position j results in a score. For a sequence of 4 elements (positions 0, 1, 2, 3), which of the following score matrices violates this principle? ('S' indicates a calculated score; '0' indicates a disallowed or masked interaction.)
Debugging an Autoregressive Model's Attention
In a self-attention mechanism designed to process information sequentially without looking ahead, a sequence of 8 elements (indexed 0 to 7) is being processed. The query vector for the element at position 5 will be compared against a total of ____ key vectors.