Learn Before
Concept

Generalization Limit of Offset-Specific Biases

A major disadvantage of allocating a unique learnable value to every possible sequence offset is that the model becomes rigidly tied to the distances it observed during training. If the architecture processes sequences where the offset iji - j is greater than the maximum distance encountered in the training phase, it lacks the appropriate learned variables for those extended distances, preventing effective generalization.

0

1

Updated 2026-04-23

Contributors are:

Who are from:

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences