Learn Before
Generalization Limit of Offset-Specific Biases
A major disadvantage of assigning a unique learnable bias to every possible relative-position offset is that the model becomes rigidly tied to the distances it observed during training. When the architecture processes a sequence whose query-key offsets exceed the maximum distance encountered in training, no learned parameter exists for those larger offsets, so the model cannot generalize to longer inputs.
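A minimal Python sketch of this failure mode, under illustrative assumptions (the table size, names, and zero-initialized biases are hypothetical, not from the card): each relative offset in the training range gets its own slot in a bias table, and any offset outside that range has no entry to look up.

```python
# Assume training sequences never exceed max_train_len tokens, so the
# model learns one bias per relative offset in [-(L-1), +(L-1)].
max_train_len = 8
num_offsets = 2 * max_train_len - 1  # offsets -7 .. +7
bias_table = [0.0] * num_offsets     # one learnable bias per offset (zeros here)

def positional_bias(q_pos, k_pos):
    """Look up the learned bias for the offset between a query and a key."""
    offset = q_pos - k_pos
    idx = offset + (max_train_len - 1)  # shift offset to a 0-based index
    if not 0 <= idx < num_offsets:
        # An offset never seen in training has no learned parameter at all.
        raise IndexError(f"no learned bias for offset {offset}")
    return bias_table[idx]

positional_bias(5, 2)    # offset 3: covered by the training range
# positional_bias(10, 0) # offset 10: IndexError, never seen in training
```

Methods such as the T5 bias sidestep this by bucketing large offsets together rather than reserving one parameter per exact distance, so any offset maps to some learned bucket.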
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Calculating Positional Bias from Offset
In a self-attention mechanism that uses a shared, learnable parameter for each unique relative position offset, which of the following query-key pairs will share the exact same positional bias parameter as the pair with a query at position 8 and a key at position 3?
T5 Bias for Relative Positional Embedding
Parameter Implications of Offset-Based Positional Bias
Learn After
A language model is trained exclusively on text sequences with a maximum length of 1024 tokens. Its design includes a component where a unique, learnable numerical bias is assigned to every possible relative distance between token pairs (e.g., a specific bias for a distance of 1, another for a distance of 2, up to the maximum possible distance in the training data). What is the most likely outcome when this model is later tasked with processing a document of 1500 tokens?
Critique of a Relative Positional Bias Method
Diagnosing Model Generalization Failure