Short Answer

Incorporating Positional Bias into Attention Scores

The standard formula for a scaled dot-product attention score between a query vector q_i and a key vector k_j is (q_i ⋅ k_j) / sqrt(d). How would you modify this formula to include a relative positional bias term, PE(i, j), added directly to the raw score before the softmax normalization step?
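The modified score is simply score(i, j) = (q_i ⋅ k_j) / sqrt(d) + PE(i, j), with the softmax applied afterward. A minimal NumPy sketch of this (the offset-based bias table `b` below is one common choice, assumed here for illustration):

```python
import numpy as np

def attention_with_relative_bias(Q, K, V, bias):
    """Scaled dot-product attention with an additive relative positional bias.

    Q, K, V: (n, d) arrays; bias: (n, n) array where bias[i, j] = PE(i, j).
    The bias is added to the raw scores before the softmax normalization.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d) + bias            # (q_i . k_j)/sqrt(d) + PE(i, j)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys j
    return weights @ V

# Illustrative setup: a bias depending only on the offset j - i,
# looked up from a (here random, normally learned) table b.
n, d = 4, 8
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, n, d))
b = rng.normal(size=2 * n - 1)                      # one entry per possible offset
offsets = np.arange(n)[None, :] - np.arange(n)[:, None]
bias = b[offsets + n - 1]                           # PE(i, j) = b[j - i]
out = attention_with_relative_bias(Q, K, V, bias)
```

Because PE(i, j) enters before the softmax, it reweights which keys each query attends to rather than shifting the output values directly.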

Updated 2025-10-03
