Example

Explicit Enumeration of Causal Self-Attention Dot Products

In a causal self-attention mechanism, a query at a given position $i$ can only interact with keys at positions $j$ where $j \le i$. For a sequence of length 7 (indexed 0 to 6), this results in the following explicit set of query-key dot products being computed:

  • $q_0$: $q_0 k_0^T$
  • $q_1$: $q_1 k_0^T, q_1 k_1^T$
  • $q_2$: $q_2 k_0^T, q_2 k_1^T, q_2 k_2^T$
  • $q_3$: $q_3 k_0^T, q_3 k_1^T, q_3 k_2^T, q_3 k_3^T$
  • $q_4$: $q_4 k_0^T, q_4 k_1^T, q_4 k_2^T, q_4 k_3^T, q_4 k_4^T$
  • $q_5$: $q_5 k_0^T, q_5 k_1^T, q_5 k_2^T, q_5 k_3^T, q_5 k_4^T, q_5 k_5^T$
  • $q_6$: $q_6 k_0^T, q_6 k_1^T, q_6 k_2^T, q_6 k_3^T, q_6 k_4^T, q_6 k_5^T, q_6 k_6^T$
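
The enumeration above can be reproduced with a short script. This is a minimal sketch in plain Python, not code from the course material: the helper name `causal_pairs` is hypothetical, and it only lists which index pairs $(i, j)$ are allowed under the rule $j \le i$ without computing any actual dot products.

```python
def causal_pairs(seq_len):
    """Map each query position i to the key positions j with j <= i.

    Hypothetical helper for illustration; it enumerates indices only.
    """
    return {i: list(range(i + 1)) for i in range(seq_len)}


pairs = causal_pairs(7)

# Print one line per query, mirroring the list above.
for i, keys in pairs.items():
    terms = ", ".join(f"q_{i} k_{j}^T" for j in keys)
    print(f"q_{i}: {terms}")

# Query i contributes i + 1 dot products, so the total is 1 + 2 + ... + 7.
total = sum(len(keys) for keys in pairs.values())
print(total)  # 28, i.e. 7 * 8 / 2
```

For a sequence of length $n$ the same count is $n(n+1)/2$, roughly half of the $n^2$ dot products that unmasked attention would compute.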

Updated 2025-10-09

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Computing Sciences