Concept

Shared Learnable Bias per Offset

A basic form of relative positional bias assigns a single shared learnable parameter to each distinct query-key offset. Under this scheme, the bias for a query $\mathbf{q}_i$ and a key $\mathbf{k}_j$ depends only on their offset $i - j$. Consequently, all pairs with the same offset map to the same parameter $u_{i-j}$, which gives $\mathrm{PE}(i, j) = u_{i-j}$.
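The sketch below is a minimal, framework-free illustration in plain Python. It is not taken from the source. It assumes a maximum sequence length `max_len`, so offsets range over $-(\text{max\_len}-1), \dots, \text{max\_len}-1$ and the table holds $2\,\text{max\_len} - 1$ scalars. The helper names (`init_offset_biases`, `offset_index`, `relative_bias_matrix`, `offset_bias_grad`) are hypothetical. The sketch builds the matrix $B$ with $B_{ij} = \mathrm{PE}(i, j) = u_{i-j}$. It also shows what "shared" means during training: every pair with offset $k$ reads the same $u_k$, so the gradient for $u_k$ is the sum of the gradients of all those entries.

```python
import random


def init_offset_biases(max_len, seed=0):
    """One learnable scalar u_k per offset k = i - j, k in [-(max_len-1), max_len-1].

    Slot k + (max_len - 1) of the returned list stores u_k, so the table
    holds 2 * max_len - 1 parameters in total.
    """
    rng = random.Random(seed)
    # Small random initialization: an assumption for illustration only.
    return [rng.gauss(0.0, 0.02) for _ in range(2 * max_len - 1)]


def offset_index(i, j, max_len):
    """Map the offset i - j to its slot in the parameter table."""
    return (i - j) + (max_len - 1)


def relative_bias_matrix(u, seq_len, max_len):
    """Build B with B[i][j] = PE(i, j) = u_{i-j}."""
    if seq_len > max_len:
        raise ValueError("seq_len exceeds the offsets covered by the table")
    return [[u[offset_index(i, j, max_len)] for j in range(seq_len)]
            for i in range(seq_len)]


def offset_bias_grad(grad_b, max_len):
    """Backpropagate dL/dB into dL/du.

    Every pair (i, j) with the same offset reads the same u_{i-j}, so
    dL/du_k is the sum of dL/dB[i][j] over all pairs with i - j = k.
    """
    grad_u = [0.0] * (2 * max_len - 1)
    for i, row in enumerate(grad_b):
        for j, g in enumerate(row):
            grad_u[offset_index(i, j, max_len)] += g
    return grad_u


if __name__ == "__main__":
    u = init_offset_biases(max_len=4)
    b = relative_bias_matrix(u, seq_len=3, max_len=4)
    # Entries on the same diagonal (same offset i - j) share one parameter.
    assert b[0][0] == b[1][1] == b[2][2]
    assert b[1][0] == b[2][1]
```

In a typical attention layer, $B_{ij}$ would be added to the logit for $(\mathbf{q}_i, \mathbf{k}_j)$ before the softmax. That usage is an assumption here, since the text above only defines $\mathrm{PE}(i, j)$. A flat list indexed by a shifted offset keeps the lookup $O(1)$. The number of parameters grows linearly with the maximum sequence length rather than quadratically.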

Updated 2026-05-02

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences