1Cademy - Critique of a Relative Positional Bias Method

Learn Before

Generalization Limit of Offset-Specific Biases

Short Answer

A language model's architecture incorporates a mechanism where a unique, learnable value is assigned to every possible relative distance between any two tokens. For example, a specific value is learned for a distance of 5, another for a distance of 6, and so on, up to the maximum distance encountered in the training data. Evaluate the suitability of this design for a model intended to process documents of highly variable and potentially unbounded lengths. Justify your reasoning.
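To make the mechanism in the prompt concrete, here is a minimal sketch of a per-distance bias table. The class name `OffsetBiasTable`, its methods, the random initialization, and the use of absolute (unsigned) distance are all assumptions made for illustration; the prompt does not specify them, and a real model would train these values rather than sample them.

```python
import random


class OffsetBiasTable:
    """Hypothetical per-distance bias table (illustrative, not a real library API)."""

    def __init__(self, max_train_distance, seed=0):
        rng = random.Random(seed)
        self.max_train_distance = max_train_distance
        # One independent scalar per distance 0..max_train_distance.
        # Random values stand in for parameters that training would learn.
        self.bias = [rng.gauss(0.0, 0.02) for _ in range(max_train_distance + 1)]

    def lookup(self, i, j):
        """Return the bias for the token pair at positions i and j."""
        # Assumption: distance is unsigned; a signed offset would need two tables.
        distance = abs(i - j)
        if distance > self.max_train_distance:
            # No value was ever assigned to this distance.
            raise IndexError(
                f"distance {distance} exceeds max trained distance "
                f"{self.max_train_distance}"
            )
        return self.bias[distance]

    def num_parameters(self):
        """The table grows linearly with the largest distance it covers."""
        return len(self.bias)


if __name__ == "__main__":
    table = OffsetBiasTable(max_train_distance=512)
    print(table.num_parameters())  # one entry per distance, including 0
    print(table.lookup(10, 5))     # distance 5 has its own entry
    try:
        table.lookup(0, 600)       # distance 600 was never seen
    except IndexError as err:
        print("lookup failed:", err)
```

In this sketch a lookup for an unseen distance simply fails. Other treatments, such as clipping to the largest known distance or sharing one value across a range of distances, are different designs from the one the prompt describes.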

Updated 2025-10-03
