Learn Before
Parameter Implications of Offset-Based Positional Bias
Consider a self-attention model that implements relative positional encoding by assigning a unique, learnable bias parameter to each possible offset between a query and a key. If this model is trained exclusively on sequences with a maximum length of 128 tokens, analyze how many distinct learnable bias parameters are required for this positional encoding scheme. Explain your reasoning.
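The count can be verified by enumeration: with 0-indexed positions 0..127, the offset between a key and a query spans −127..+127, giving 2·(128−1)+1 = 255 distinct values. A minimal sketch (assuming 0-indexed positions and the key-minus-query offset convention):

```python
# Count distinct relative offsets (key_pos - query_pos) for sequences
# of up to 128 tokens, assuming 0-indexed positions 0..127.
max_len = 128
offsets = {k - q for q in range(max_len) for k in range(max_len)}

# One learnable bias parameter per distinct offset.
num_params = len(offsets)
print(num_params)  # 255, i.e. offsets -127..+127 = 2*(max_len - 1) + 1
```

The same count falls out in closed form: for sequence length L, offsets range over −(L−1)..+(L−1), so the scheme needs 2L − 1 bias parameters.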
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Generalization Limit of Offset-Specific Biases
Calculating Positional Bias from Offset
In a self-attention mechanism that uses a shared, learnable parameter for each unique relative position offset, which of the following query-key pairs will share the exact same positional bias parameter as the pair with a query at position 8 and a key at position 3?
T5 Bias for Relative Positional Embedding
Parameter Implications of Offset-Based Positional Bias
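The matching rule behind the related question above (query at position 8, key at position 3) can be illustrated with a short check. The candidate pairs below are hypothetical, and computing the offset as key position minus query position is an assumption about the convention:

```python
# Two (query, key) pairs share the same positional bias parameter
# iff they have the same relative offset, taken here as key - query.
ref_q, ref_k = 8, 3
ref_offset = ref_k - ref_q  # 3 - 8 = -5

# Hypothetical candidate pairs for illustration.
candidates = [(10, 5), (5, 10), (12, 7), (6, 1)]
matches = [(q, k) for q, k in candidates if k - q == ref_offset]
print(matches)  # [(10, 5), (12, 7), (6, 1)] -- all have offset -5
```

Under the opposite convention (query minus key) the reference offset would be +5, but the set of matching pairs is unchanged, since equality of offsets is preserved under negation.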