Learn Before
Relative Positional Encoding as a Query-Key Bias
Rather than modifying the initial input token embeddings, an alternative self-attention architecture integrates positional awareness directly into the core interaction calculation. It achieves this by adding a relative positional bias term, represented as u(i, j), directly to the scaled query-key product, which structurally alters the attention score between position i and position j.
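A minimal NumPy sketch of this idea, assuming the bias is supplied as a precomputed matrix u whose entry u[i, j] depends on the pair of positions (the function and variable names here are illustrative, not taken from the course):

```python
import numpy as np

def attention_weights_with_bias(Q, K, u):
    """Scaled query-key scores plus a relative positional bias u[i, j].

    Q: (n, d) query vectors; K: (n, d) key vectors;
    u: (n, n) bias matrix whose entry u[i, j] depends on positions i and j.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)   # standard scaled dot-product term
    scores = scores + u             # positional bias added to every (i, j) pair
    # softmax over j turns the biased scores into attention weights alpha[i, j]
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return weights / weights.sum(axis=-1, keepdims=True)
```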
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Calculating Attention Weights (αi,j) in Transformers
Relative Positional Encoding as a Query-Key Bias
In a sequence processing model, an intermediate score is calculated to determine the relationship between two elements. This score is found by taking the dot product of a 'query' vector and a 'key' vector, and then scaling the result by dividing by the square root of the vectors' dimension. Assume no other adjustments are made to the score.
Given the following information:
- Query vector: [2.0, 0.5, 1.0, -1.5]
- Key vector: [1.0, 1.0, -0.5, 2.0]
- Vector dimension: 4
What is the calculated intermediate score?
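As a check on the arithmetic, a short sketch that computes the scaled dot-product score from the vectors in the question (this code is not part of the original question):

```python
import numpy as np

q = np.array([2.0, 0.5, 1.0, -1.5])   # query vector from the question
k = np.array([1.0, 1.0, -0.5, 2.0])   # key vector from the question
d = 4                                   # vector dimension

score = q @ k / np.sqrt(d)  # (2.0 + 0.5 - 0.5 - 3.0) / 2
print(score)                # -0.5
```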
In a transformer model designed for text generation, a masking mechanism is applied to the attention scores (βi,j) to prevent a token at position i from attending to future tokens (positions j > i). This is achieved by adding a large negative number (e.g., -∞) to the score before normalization. Consider the calculation of attention scores for a sequence of 4 tokens. Which of the following matrices correctly represents the application of this causal mask, where 'Score' indicates a calculated value and '-∞' indicates a masked value? (A sketch of this mask appears at the end of this Related list.)

Analyzing Training Instability in an Attention Mechanism
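For the causal-masking question above, a minimal sketch of how such a mask is typically applied to a 4-token score matrix, assuming entries with j > i are set to -∞ before the softmax (names here are illustrative):

```python
import numpy as np

n = 4
scores = np.random.randn(n, n)          # raw attention scores beta[i, j]

# Causal mask: positions j > i (strictly above the diagonal) are set to -inf
mask = np.triu(np.ones((n, n), dtype=bool), k=1)
masked_scores = np.where(mask, -np.inf, scores)

# After softmax, the masked positions receive zero attention weight
weights = np.exp(masked_scores - masked_scores.max(axis=-1, keepdims=True))
weights = weights / weights.sum(axis=-1, keepdims=True)
```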
Learn After
Interpretation of Positional Bias as a Distance Penalty
T5 Bias for Relative Positional Embedding
Shared Learnable Bias per Offset
Heuristic-Based Relative Positional Biases
Comparison of Learned vs. Heuristic-Based Relative Positional Biases
Kerple
FIRE
Relative Position Offset Calculation
A self-attention model incorporates positional awareness by adding a bias term directly to the query-key dot product for each pair of positions (i, j). This bias term's value depends on the relative distance between i and j. What is the primary implication of this approach compared to the alternative of adding positional vectors to the input token embeddings?

Incorporating Positional Bias into Attention Scores
In a self-attention mechanism, the score computed between a query at position i and a key at position j is modified by directly adding a bias term whose value depends only on the positions i and j. What is the primary function of this bias term within the attention calculation?
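One common way such a bias can be parameterized is with a shared value per clipped relative offset j - i, in the spirit of the offset-based and T5-style biases listed under Learn After. The sketch below assumes a hypothetical per-offset table; it is an illustration, not the course's implementation:

```python
import numpy as np

def relative_bias_matrix(n, bias_per_offset, max_offset):
    """Build u[i, j] from a table of per-offset biases.

    bias_per_offset: array of length 2*max_offset + 1, one value per
    relative offset o = j - i, with offsets clipped to [-max_offset, max_offset].
    """
    offsets = np.arange(n)[None, :] - np.arange(n)[:, None]   # offset j - i
    offsets = np.clip(offsets, -max_offset, max_offset)
    return bias_per_offset[offsets + max_offset]

# Example: 5 tokens, offsets clipped to +/-2, with a hypothetical learned table
table = np.array([-0.4, -0.1, 0.0, -0.1, -0.4])
u = relative_bias_matrix(5, table, max_offset=2)
```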