Interpretation of Positional Bias as a Distance Penalty
In self-attention with relative positional embeddings, the bias term PE(i, j) added to the query-key product can intuitively be interpreted as a distance penalty between positions i and j. To reflect that tokens further apart should generally have less influence on each other, the value of PE(i, j) decreases as the token at position j moves further away from the token at position i.
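The idea above can be sketched in a few lines of NumPy. This is a minimal illustration, not the note's exact method: it uses an ALiBi-style linear penalty -m·|i - j| as one concrete choice of decreasing bias, and the slope m and tensor sizes are illustrative assumptions.

```python
import numpy as np

def attention_weights(q, k, m=0.5):
    """q, k: arrays of shape (seq_len, d).
    Adds a distance-penalty bias of -m * |i - j| to each
    query-key score, then softmax-normalizes over the keys."""
    seq_len, d = q.shape
    scores = q @ k.T / np.sqrt(d)                 # standard scaled dot product
    pos = np.arange(seq_len)
    bias = -m * np.abs(pos[:, None] - pos[None, :])  # farther apart -> more negative
    scores = scores + bias
    # softmax over the key axis
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
k = rng.normal(size=(4, 8))
w = attention_weights(q, k)
```

With m > 0 the bias shifts probability mass toward nearby tokens before normalization, which is exactly the "distance penalty" reading: distant key positions receive increasingly negative offsets and thus smaller attention weights, all else being equal.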
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Interpretation of Positional Bias as a Distance Penalty
T5 Bias for Relative Positional Embedding
Shared Learnable Bias per Offset
Heuristic-Based Relative Positional Biases
Comparison of Learned vs. Heuristic-Based Relative Positional Biases
Kerple
FIRE
Relative Position Offset Calculation
A self-attention model incorporates positional awareness by adding a bias term directly to the query-key dot product for each pair of positions (i, j). This bias term's value depends on the relative distance between i and j. What is the primary implication of this approach compared to the alternative of adding positional vectors to the input token embeddings?
Incorporating Positional Bias into Attention Scores
In a self-attention mechanism, the score computed between a query at position i and a key at position j is modified by directly adding a bias term whose value depends only on the positions i and j. What is the primary function of this bias term within the attention calculation?
Formula for Causal Attention
In a sequence processing model, the unnormalized attention score between a query at position i and a key at position j is calculated using the formula: Score(i, j) = (q_i ⋅ k_j + PE(i, j)) / √d. What is the primary function of the PE(i, j) term in this calculation?
Analyzing Components of an Attention Score Formula
Diagnosing a Language Model's Performance Issue
Interpretation of Positional Bias as a Distance Penalty
Learn After
In a self-attention mechanism, a bias term is added to the score calculated between a query at position i and a key at position j. Consider a scenario where this bias term is designed to be a large negative value when the distance |i - j| is large, and it approaches zero as the distance gets smaller. How would this specific design influence the model's behavior?
Diagnosing Model Behavior via Positional Bias
Designing a Positional Bias for a Specific Task