Learn Before
Visual Example of a Linear Relative Position Bias in Causal Attention
In causal self-attention, a linear relative position bias is applied to penalize attention to distant past tokens. The bias for a query at position i and a key at position j is calculated as -β ⋅ (i - j), where β is a positive scalar parameter. This bias is only applied to valid query-key pairs where j ≤ i, enforcing causality. For example, the set of computed query-key dot products for a sequence of length 7 (indexed 0-6) would form a lower-triangular structure: q0k0ᵀ; q1k0ᵀ, q1k1ᵀ; ...; q6k0ᵀ, ..., q6k6ᵀ. The bias added to each of these dot products would be zero for self-attention (e.g., q2k2ᵀ) and become increasingly negative for more distant pairs (e.g., the bias for q6k0ᵀ would be more negative than for q6k5ᵀ).
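Below is a minimal NumPy sketch of this construction (the slope value β = 0.1, the variable names, and the use of -inf masking are illustrative assumptions, not taken from any particular library): it builds the 7×7 bias matrix for the example above and reproduces the lower-triangular pattern.

```python
import numpy as np

# Minimal sketch of an ALiBi-style linear relative position bias.
# Assumed values: sequence length 7 (positions 0-6), slope beta = 0.1.
seq_len = 7
beta = 0.1

i = np.arange(seq_len)[:, None]  # query positions (rows)
j = np.arange(seq_len)[None, :]  # key positions (columns)

# bias[i, j] = -beta * (i - j) for j <= i; future positions (j > i)
# get -inf so the softmax assigns them zero attention weight.
bias = np.where(j <= i, -beta * (i - j), -np.inf)

# The rounded matrix shows zeros on the diagonal (e.g., q2k2ᵀ) and
# increasingly negative values toward the lower left: q6k0ᵀ gets -0.6,
# which is more negative than the -0.1 applied to q6k5ᵀ.
print(np.round(bias, 2))
```

In use, this matrix would simply be added to the query-key dot products before the softmax; the only parameter involved is the single scalar β.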
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Formula for Attention Score with ALiBi Bias
Linear Relative Position Bias Example
In a sequence processing model, a positional bias is calculated to penalize attention scores based on the distance between tokens. The formula used is Bias = -β ⋅ (i - j), where i is the query position, j is the key position, and β is a fixed scalar. If the query token is at position 5, the key token is at position 2, and β = 0.1, what is the calculated bias value?

Visual Example of a Linear Relative Position Bias in Causal Attention
True or False: According to the positional bias formula PE(i, j) = -β ⋅ (i - j), where i is the query position, j is the key position, and β is a positive scalar, the penalty applied to the attention score decreases as the distance between the query and key tokens increases.

Interpreting a Linear Positional Bias Value
Similarity of ALiBi Positional Biases to Length Features
Learn After
In a causal self-attention mechanism, a linear relative position bias is added to the attention scores. The bias for a query at position i attending to a key at position j is calculated as B = -β ⋅ (i - j) for j ≤ i, where β is a positive scalar. How would the attention behavior of a model using a large positive β value (e.g., β = 1.0) compare to a model using a small positive β value (e.g., β = 0.1)?

Calculating Linear Relative Position Bias
In a causal self-attention mechanism, a linear penalty is added to the query-key dot products based on their relative distance. The penalty for a query at position i and a key at position j is calculated as -β ⋅ (i - j), where j ≤ i and β is a positive constant. For a query at position 4 (i = 4), which of the following lists correctly represents the penalties applied to the keys at positions 0 through 4 (j = 0, 1, 2, 3, 4), respectively?