Learn Before
  • Sequence Encoding Models

Causal Attention Mechanism

The causal attention mechanism computes an output for a specific position $i$ in a sequence by considering only the elements up to and including that position:

$$\text{Att}_{\text{qkv}}(\mathbf{q}_i, \mathbf{K}_{\leq i}, \mathbf{V}_{\leq i}) = \sum_{j=0}^{i} \alpha(i, j)\, \mathbf{v}_j$$

In this equation, the output is a weighted sum of the value vectors $\mathbf{v}_j$ from the beginning of the sequence up to the current position $i$. The weights $\alpha(i, j)$ determine the importance of each value vector $\mathbf{v}_j$ to the query vector $\mathbf{q}_i$. This unidirectional constraint is crucial for autoregressive tasks, as it prevents the model from attending to future tokens.
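The mechanism above can be sketched in NumPy. This is a minimal illustration (not code from the course): it uses standard scaled dot-product scores for the weights α(i, j) and enforces the causal constraint by masking positions j > i before the softmax, so each row of the weight matrix sums over only the allowed positions.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def causal_attention(Q, K, V):
    """Scaled dot-product attention with a causal mask.

    Q, K, V: arrays of shape (seq_len, d). Position i may only
    attend to positions j <= i.
    Returns the outputs and the weight matrix alpha, where
    alpha[i, j] corresponds to α(i, j) in the formula.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)              # (seq_len, seq_len)
    # Mask future positions (j > i): -inf becomes 0 after softmax.
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    alpha = softmax(scores)
    return alpha @ V, alpha

# Toy check with 4 tokens ('The quick brown fox'): the weights for
# the third token ('brown', index 2) are zero for the future token.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out, alpha = causal_attention(Q, K, V)
assert np.isclose(alpha[2, 3], 0.0)            # no attention to 'fox'
assert np.allclose(alpha.sum(axis=-1), 1.0)    # each row is a distribution
```

Note that the mask is applied to the scores, not the weights: setting masked scores to −∞ before the softmax guarantees the remaining weights still sum to 1 over positions j ≤ i.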

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Related
  • Architectural Differences Between Sequence Encoding and Generation Models

  • BERT (Bidirectional Encoder Representations from Transformers)

  • Fine-tuning for Sequence Encoding Models

  • Role of Encoders as Components in NLP Systems

  • Input and Output of a Sequence Encoder

  • Pre-train and Fine-tune Paradigm for Encoder Models

  • An engineer is building a system to automatically categorize customer reviews as 'positive' or 'negative'. The first component of their system must read the raw text of a review and convert it into a single, fixed-size numerical vector that captures the overall sentiment and meaning. This vector will then be fed into a separate classification component. Which of the following best describes the function of this first component?

  • A company develops a sophisticated model that takes a user's question as input and produces a detailed numerical representation that captures the question's full meaning. This model, by itself, is sufficient to function as a complete question-answering system.

  • The Role of Sequence Encoding in Text-Based Prediction

Learn After
  • Attention Weight with Relative Positional Encoding

  • A language model is designed to generate a sentence one word at a time, from beginning to end. To generate the word at a specific position i, it uses an attention mechanism to weigh the importance of the words that came before it. Which of the following statements correctly analyzes the structural constraint required for this mechanism to function properly for this specific task?

  • Formula for Attention Weight with Relative Positional Encoding

  • Analyzing Attention Mechanism Constraints

  • An autoregressive model is processing the input sequence 'The quick brown fox'. When calculating the output representation for the token 'brown' (the third token), which set of tokens can it attend to if a causal attention mechanism is being used?