Learn Before
Formula

Causal Attention Mechanism

The causal attention mechanism computes an output for a specific position $i$ in a sequence by considering only the elements up to and including that position. The formula is:

$$\text{Att}_{\text{qkv}}(\mathbf{q}_i, \mathbf{K}_{\leq i}, \mathbf{V}_{\leq i}) = \sum_{j=0}^{i} \alpha(i, j)\, \mathbf{v}_j$$

In this equation, the output is a weighted sum of the value vectors $\mathbf{v}_j$ from the beginning of the sequence up to the current position $i$. The weights $\alpha(i, j)$ determine the importance of each value vector $\mathbf{v}_j$ to the query vector $\mathbf{q}_i$. This unidirectional constraint is crucial for autoregressive tasks, as it prevents the model from attending to future tokens.
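The formula above leaves $\alpha(i, j)$ abstract; a common choice (not specified in this card) is a softmax over scaled query-key dot products. A minimal NumPy sketch under that assumption, masking future positions so each output at position $i$ sums only over $j \leq i$:

```python
import numpy as np

def causal_attention(Q, K, V):
    """Causal (masked) attention over a whole sequence.

    Q, K, V: arrays of shape (seq_len, d). For each position i, the output
    is sum_j alpha(i, j) * v_j with j <= i, where alpha(i, j) is assumed to
    be a softmax over scaled dot products q_i . k_j / sqrt(d).
    """
    seq_len, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                     # raw attention logits
    # Upper-triangular mask blocks attention to future tokens (j > i)
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores[future] = -np.inf
    # Row-wise softmax gives the weights alpha(i, j)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                                # sum_j alpha(i, j) v_j
```

Because position 0 can attend only to itself, its output is exactly $\mathbf{v}_0$, which makes a handy sanity check for the mask.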

Updated 2025-10-09


Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences