Formula for Causal Attention
The output of a causal attention mechanism for a specific query vector q_i is calculated as a weighted sum of value vectors from all positions up to and including the current position i. The formula is expressed as:

o_i = Σ_{j=0}^{i} α_{i,j} · v_j

Here, α_{i,j} represents the attention weight assigned to the value vector v_j at position j when computing the output for position i.
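As a minimal sketch of this weighted sum (the function and variable names are my own, not from the course; the attention weights are assumed to come from a softmax over scaled dot-product scores, restricted to positions j ≤ i):

```python
import math

def causal_attention_output(queries, keys, values, i):
    """Compute o_i = sum over j <= i of alpha_{i,j} * v_j.

    queries, keys, values: lists of equal-length float vectors, one per position.
    Positions j > i are masked out, so the output depends only on the
    prefix up to and including position i.
    """
    d = len(keys[0])
    # Scaled dot-product scores against all visible positions j <= i.
    scores = [sum(q * k for q, k in zip(queries[i], keys[j])) / math.sqrt(d)
              for j in range(i + 1)]
    # Softmax normalization: the weights alpha_{i,j} sum to 1.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    # Weighted sum of the value vectors v_0 .. v_i.
    return [sum(a * values[j][t] for j, a in enumerate(alphas))
            for t in range(len(values[0]))]
```

Note that the loop upper bound `i + 1` is what makes the mechanism causal: value vectors at positions j > i never enter the sum.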

Tags
Ch.2 Generative Models - Foundations of Large Language Models
Computing Sciences
Related
In a sequence processing model, the unnormalized attention score between a query at position i and a key at position j is calculated using the formula: Score(i, j) = (q_i ⋅ k_j + PE(i, j)) / √d. What is the primary function of the PE(i, j) term in this calculation? (Analyzing Components of an Attention Score Formula)
Diagnosing a Language Model's Performance Issue
Interpretation of Positional Bias as a Distance Penalty
Learn After
An auto-regressive model is processing a sequence of 4 tokens. To compute the output for the token at position i = 2, it uses a causal attention mechanism. Given the value vectors and the calculated attention weights below, what is the resulting output vector for this position?
Value Vectors:
v_0 = [1.0, 0.0], v_1 = [0.0, 2.0], v_2 = [3.0, 1.0], v_3 = [2.0, 2.0]
Attention Weights for position i=2:
- Weight for v_0: 0.1
- Weight for v_1: 0.3
- Weight for v_2: 0.6
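Carrying out the weighted sum for this example as a quick check (the variable names are my own, not part of the original card):

```python
# o_2 = 0.1*v_0 + 0.3*v_1 + 0.6*v_2; v_3 is masked out because j = 3 > i = 2.
v = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0], [2.0, 2.0]]
w = [0.1, 0.3, 0.6]  # attention weights for positions 0..2
o_2 = [sum(w[j] * v[j][t] for j in range(3)) for t in range(2)]
print(o_2)  # ≈ [1.9, 1.2]
```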
When calculating the output for the token at position i = 5 in a sequence using a causal attention mechanism, the value vector from position j = 6 (v_6) is incorporated into the weighted sum.

Given the formula for the output of a causal attention mechanism for a specific query vector q_i: match each component of the formula to its correct description.