Concept

Interpretation of Positional Bias as a Distance Penalty

In self-attention with relative positional embeddings, the bias term PE(i, j) added to the query-key product can be interpreted as a distance penalty between positions i and j. To reflect that tokens further apart should generally have less influence on each other, PE(i, j) decreases as position i moves further away from position j.
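The idea above can be sketched numerically. The snippet below assumes a simple linear penalty, PE(i, j) = -m · |i - j| (the form used by ALiBi); the slope m, sequence length, and head dimension are illustrative choices, not values from the text.

```python
import numpy as np

def distance_penalty_bias(n, slope=0.5):
    # Assumed linear penalty PE(i, j) = -slope * |i - j|:
    # zero on the diagonal, increasingly negative for distant pairs.
    idx = np.arange(n)
    return -slope * np.abs(idx[:, None] - idx[None, :])

def attention_weights(q, k, bias):
    # Bias is added to the scaled query-key product before the softmax.
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d) + bias
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    return w / w.sum(axis=-1, keepdims=True)

n, d = 6, 8
rng = np.random.default_rng(0)
q = rng.standard_normal((n, d))
k = rng.standard_normal((n, d))
bias = distance_penalty_bias(n)
w = attention_weights(q, k, bias)
```

All else being equal, the added bias shifts attention mass toward nearby tokens: a distant key must have a much larger query-key score to receive the same weight as a close one.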

Updated 2026-04-23

Tags

Ch.2 Generative Models - Foundations of Large Language Models
