Generalizable Positional Embeddings
To overcome the limitations of fixed-length training, an alternative approach is to develop generalizable positional embeddings. Suppose an embedding model is trained on sequences with a maximum length of $n_{\max}$. If the model can generalize, it can be applied to handle much longer sequences of length $n$ (where $n > n_{\max}$) during inference. This capability allows the model to extrapolate and effectively deal with new positions outside the range observed in the training data.
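As a concrete illustration, below is a minimal sketch of one classic generalizable scheme, the sinusoidal encoding of Vaswani et al. (2017). Because each embedding is a closed-form function of the position index rather than a learned table entry, it can be evaluated at any position, including positions beyond the training length. The function name `sinusoidal_positions` and the choice `d_model=512` are illustrative assumptions, not taken from the text.

```python
import numpy as np

def sinusoidal_positions(num_positions: int, d_model: int) -> np.ndarray:
    """Sinusoidal encodings for positions 0..num_positions-1 (d_model even)."""
    positions = np.arange(num_positions)[:, None]   # shape (num_positions, 1)
    dims = np.arange(0, d_model, 2)[None, :]        # shape (1, d_model // 2)
    angles = positions / np.power(10000.0, dims / d_model)

    pe = np.zeros((num_positions, d_model))
    pe[:, 0::2] = np.sin(angles)                    # even dimensions
    pe[:, 1::2] = np.cos(angles)                    # odd dimensions
    return pe

# Encodings are defined for any position index, so a model trained on
# length-1024 inputs can still be handed encodings for a 1500-token document:
pe_short = sinusoidal_positions(1024, d_model=512)
pe_long = sinusoidal_positions(1500, d_model=512)
assert np.allclose(pe_long[:1024], pe_short)        # identical on seen positions
```

Whether the model actually *uses* these extra positions well is a separate question; the point here is only that the encoding itself imposes no hard length limit.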
Tags
Foundations of Large Language Models
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Classification of Generalization Approaches for Positional Embeddings
Positional Encoding without Generalization
A team trains a language model using an architecture where a unique vector is learned for every possible token position. The entire training dataset consists of texts that are no longer than 1,024 tokens. After training, the model shows excellent performance on all evaluation texts that are 1,024 tokens or shorter. However, when deployed to process a new, 1,500-token document, the model's ability to understand relationships between words degrades dramatically, particularly for words appearing after the 1,024th position. Which of the following is the most direct cause of this performance drop? (A code sketch of this failure mode follows this list.)
Explaining Extrapolation Failure in Positional Embeddings
Evaluating a Flawed Generalization Strategy
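The scenario in the question above can be reproduced with a learned absolute embedding table, which stores one trained vector per position index and has nothing to return for positions it never saw. This is a hypothetical sketch: `position_table` and `embed_positions` are illustrative names, and random vectors stand in for trained ones.

```python
import numpy as np

rng = np.random.default_rng(0)
max_train_len, d_model = 1024, 512

# Learned absolute positional embeddings: one vector per position index.
# Only rows 0..1023 exist, because no longer sequence ever appeared in training.
position_table = rng.normal(size=(max_train_len, d_model))

def embed_positions(seq_len: int) -> np.ndarray:
    # Looks up one row per position; fails when seq_len > max_train_len,
    # since there is simply no learned vector for positions 1024 and beyond.
    return position_table[np.arange(seq_len)]

embed_positions(1024)       # fine: every position has a trained vector
try:
    embed_positions(1500)   # positions 1024..1499 have no entries in the table
except IndexError as err:
    print("Extrapolation failure:", err)
```

Implementations sometimes avoid the hard error by clipping or padding the table, but the underlying problem is the same: positions past the training maximum were never learned, which is exactly why such embeddings do not generalize.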