Causal Attention
Causal attention is a type of self-attention mechanism where a query at a specific position i can only attend to keys and values at positions less than or equal to i (K_<=i, V_<=i). This restriction, often implemented using a mask, ensures that the model's prediction for a token only depends on the preceding tokens and not on future ones. The computation is formally expressed as Att_qkv(q_i, K_<=i, V_<=i).
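The masking described above can be sketched in a few lines of NumPy. This is a minimal illustrative implementation (not taken from the course material): scores above the diagonal are set to -inf so that, after the Softmax, each query position i places zero weight on positions j > i.

```python
import numpy as np

def causal_attention(Q, K, V):
    """Causal self-attention: query at position i attends only to positions <= i."""
    d = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                      # scaled dot-product scores
    mask = np.triu(np.ones(scores.shape, dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)           # block future positions
    # numerically stable softmax over each row
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V
```

Because the first row of the mask blocks every position except 0, the output for the first token is exactly V[0], matching the definition Att_qkv(q_i, K_<=i, V_<=i) for i = 0.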
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Causal Attention
In an attention mechanism, the scores for a query vector q are calculated by taking its dot product with a set of key vectors K. These scores are then scaled by a factor related to the vector dimension before being passed to a Softmax function to produce weights. A developer implements this but omits the scaling step, using the formula Softmax(q * K^T) * V. What is the most likely adverse effect of this omission, especially when the dimension of the key vectors is large?
Calculating Pre-Softmax Attention Scores
Applying Scaled Dot-Product Attention
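The effect that the scaling question asks about can be observed numerically. In this illustrative sketch (assumptions: random Gaussian queries and keys, dimension 512), the unscaled dot products have variance proportional to the dimension, so the Softmax saturates toward a near one-hot distribution, while the scores divided by sqrt(d) stay in a moderate range.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 512                               # large key/query dimension
q = rng.standard_normal(d)
K = rng.standard_normal((8, d))

raw = q @ K.T                         # variance grows with d
scaled = raw / np.sqrt(d)             # variance stays ~1 regardless of d

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# The unscaled scores concentrate almost all probability mass on one key,
# which in training leads to vanishing gradients through the Softmax.
print("unscaled max weight:", softmax(raw).max())
print("scaled max weight:  ", softmax(scaled).max())
```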
Learn After
In a generative language model, an attention mechanism processes a sequence of 4 tokens. To ensure that the prediction for each token only depends on the preceding tokens and itself, a mask is applied to the raw attention score matrix before the final weighting step. Given the initial score matrix below, where rows represent the 'query' token and columns represent the 'key' token, which of the following matrices correctly shows the result of applying this causal mask? (Note: '-inf' represents a very large negative number that effectively nullifies the score.)
Initial Matrix: [[ 0.8, 1.2, 0.5, 2.1 ], [ 1.5, 0.6, 1.9, 0.3 ], [ 0.9, 2.2, 1.1, 0.7 ], [ 1.3, 0.4, 1.6, 0.2 ]]
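The masking step in the question above can be reproduced directly. In this small sketch, a strictly upper-triangular boolean mask marks the 'future' entries (column index greater than row index) of the given score matrix, and those entries are replaced with -inf.

```python
import numpy as np

scores = np.array([[0.8, 1.2, 0.5, 2.1],
                   [1.5, 0.6, 1.9, 0.3],
                   [0.9, 2.2, 1.1, 0.7],
                   [1.3, 0.4, 1.6, 0.2]])

# True strictly above the diagonal, i.e. where the key position is in the future
mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
masked = np.where(mask, float("-inf"), scores)
print(masked)
# Row i keeps columns 0..i; entries with j > i become -inf.
```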
Consequences of Misconfigured Attention in Generative Models
Appropriate Application of an Attention Mechanism