Example

Explicit Enumeration of Causal Self-Attention Dot Products

In a causal self-attention mechanism, a query at position $i$ can only interact with keys at positions $j$ where $j \le i$. For a sequence of length 7 (indexed 0 to 6), the following query-key dot products are computed:

  • $q_0$: $q_0 k_0^T$
  • $q_1$: $q_1 k_0^T,\ q_1 k_1^T$
  • $q_2$: $q_2 k_0^T,\ q_2 k_1^T,\ q_2 k_2^T$
  • $q_3$: $q_3 k_0^T,\ q_3 k_1^T,\ q_3 k_2^T,\ q_3 k_3^T$
  • $q_4$: $q_4 k_0^T,\ q_4 k_1^T,\ q_4 k_2^T,\ q_4 k_3^T,\ q_4 k_4^T$
  • $q_5$: $q_5 k_0^T,\ q_5 k_1^T,\ q_5 k_2^T,\ q_5 k_3^T,\ q_5 k_4^T,\ q_5 k_5^T$
  • $q_6$: $q_6 k_0^T,\ q_6 k_1^T,\ q_6 k_2^T,\ q_6 k_3^T,\ q_6 k_4^T,\ q_6 k_5^T,\ q_6 k_6^T$
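Each query $q_i$ therefore attends to $i + 1$ keys, for a total of $1 + 2 + \cdots + 7 = 28 = n(n+1)/2$ dot products. The following is a minimal NumPy sketch of this pattern; the function name `causal_scores` and the toy dimensions are illustrative rather than from the text, and the $1/\sqrt{d_k}$ scaling and softmax used in full attention are omitted to keep the focus on the causal mask.

```python
import numpy as np

def causal_scores(Q, K):
    """Compute only the causal (j <= i) query-key dot products.

    Q, K: arrays of shape (seq_len, d_k). Returns a (seq_len, seq_len)
    score matrix with positions j > i masked to -inf, so row i holds
    exactly the entries q_i k_0^T, ..., q_i k_i^T enumerated above.
    """
    seq_len = Q.shape[0]
    scores = Q @ K.T  # all pairwise dot products q_i k_j^T
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores[mask] = -np.inf  # disallow attending to future positions (j > i)
    return scores

# Toy example matching the list above: seq_len = 7, d_k = 4 (illustrative).
rng = np.random.default_rng(0)
Q = rng.standard_normal((7, 4))
K = rng.standard_normal((7, 4))
S = causal_scores(Q, K)
# Row i has exactly i + 1 finite entries; 1 + 2 + ... + 7 = 28 in total.
assert np.isfinite(S).sum() == 7 * 8 // 2
```

The `assert` checks the triangular count: the finite entries of the masked score matrix are exactly the 28 dot products listed above.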
