Explicit Enumeration of Causal Self-Attention Dot Products
In a causal self-attention mechanism, a query at a given position i can only interact with keys at positions j where j ≤ i. For a sequence of length 7 (indexed 0 to 6), this results in the following explicit set of query-key dot products being computed:
- q_0: q_0·k_0
- q_1: q_1·k_0, q_1·k_1
- q_2: q_2·k_0, q_2·k_1, q_2·k_2
- q_3: q_3·k_0, q_3·k_1, q_3·k_2, q_3·k_3
- q_4: q_4·k_0, q_4·k_1, q_4·k_2, q_4·k_3, q_4·k_4
- q_5: q_5·k_0, q_5·k_1, q_5·k_2, q_5·k_3, q_5·k_4, q_5·k_5
- q_6: q_6·k_0, q_6·k_1, q_6·k_2, q_6·k_3, q_6·k_4, q_6·k_5, q_6·k_6
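The enumeration above follows a simple pattern: query i pairs with every key from position 0 through i, giving n(n+1)/2 dot products in total. A minimal Python sketch (the function name `causal_pairs` is illustrative, not from the source):

```python
def causal_pairs(n):
    """Return the (i, j) index pairs for which q_i · k_j is computed
    under a causal mask (key position j must satisfy j <= i)."""
    return [(i, j) for i in range(n) for j in range(i + 1)]

pairs = causal_pairs(7)
print(len(pairs))  # 28 = 7 * 8 / 2 total dot products for length 7
```

For a sequence of length 5, the same count is 5·6/2 = 15, matching the review questions below.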
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Explicit Enumeration of Causal Self-Attention Dot Products
- An autoregressive model processes a sequence of tokens, where the query for a given token i (denoted q_i) can only interact with key vectors from positions j where j ≤ i. For the 4th token in a sequence (indexed as i = 3), which of the following dot product computations would not be performed?
- In a self-attention mechanism where the prediction for a token at position i can only depend on tokens from positions 0 up to and including i, what is the total number of query-key dot products computed for an entire input sequence of 5 tokens (indexed 0 to 4)?
- An autoregressive model processes a sequence of tokens, where the query for a given token i (denoted q_i) can only interact with key vectors from positions j where j ≤ i. Match each query vector with the complete set of dot products it computes.
Learn After
- In a self-attention mechanism where a query at a given position i can only interact with keys at positions j where j ≤ i, how many total query-key dot product computations are performed for an input sequence of length 5 (indexed 0 to 4)?
- Causal Self-Attention Dot Product Enumeration: For a sequence of 10 tokens (indexed 0 to 9), which of the following query-key dot product computations would be invalid in a self-attention mechanism where a query at a given position i can only interact with keys at positions j where j ≤ i?