A team trains a language model using an architecture where a unique vector is learned for every possible token position. The entire training dataset consists of texts that are no longer than 1,024 tokens. After training, the model shows excellent performance on all evaluation texts that are 1,024 tokens or shorter. However, when deployed to process a new, 1,500-token document, the model's ability to understand relationships between words degrades dramatically, particularly for words appearing after the 1,024th position. What is the most direct cause of this performance drop?
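A minimal sketch of the failure mode, assuming the architecture described above (a learned absolute positional-embedding table). NumPy stands in for a real deep-learning framework, and the table values are random placeholders rather than trained parameters: positions up to 1,023 have a learned vector, while position 1,500 simply has no row in the table.

```python
import numpy as np

max_len, d_model = 1024, 16
# Learned absolute positional embeddings: one trained row per position
# seen during training. (Random values here; trained parameters in a real model.)
pos_table = np.random.randn(max_len, d_model)

def positional_embedding(position):
    """Look up the learned vector for a token position."""
    return pos_table[position]

# Any position covered by training data resolves to a learned vector:
vec = positional_embedding(1023)
assert vec.shape == (d_model,)

# A position beyond the table was never parameterized at all,
# so the lookup has nothing to return:
try:
    positional_embedding(1500)
except IndexError:
    print("no learned embedding exists for position 1500")
```

In frameworks that pad or wrap rather than raise an error, the symptom is subtler: positions past the table boundary receive untrained or reused vectors, which is exactly the degradation the question describes.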
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Classification of Generalization Approaches for Positional Embeddings
Positional Encoding without Generalization
Explaining Extrapolation Failure in Positional Embeddings
Evaluating a Flawed Generalization Strategy
Generalizable Positional Embeddings