1Cademy

Learn Before: Relative Positional Representations

Multiple Choice

A language model is trained exclusively on text sequences with a maximum length of 512 tokens. During evaluation, the model shows a significant drop in performance when processing documents that are 1000 tokens long. The engineers hypothesize the problem is related to how the model incorporates word order information. Which of the following changes to the model's architecture is most likely to resolve this specific issue?
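The prerequisite concept, relative positional representations, can be contrasted with absolute position embeddings in a small sketch. It assumes (the question does not say this) that the baseline model uses a learned absolute table with one row per position 0..511. The relative side uses clipped offsets in the style of Shaw et al. (2018), with a hypothetical clipping distance `MAX_DISTANCE = 16`. Random vectors stand in for learned weights, and all names are hypothetical. This is an illustration of the two indexing schemes, not a full attention layer.

```python
import random

MAX_TRAIN_LEN = 512   # longest sequence seen in training (from the question)
MAX_DISTANCE = 16     # hypothetical clipping distance k for relative offsets
DIM = 4               # tiny embedding size, for illustration only

random.seed(0)


def make_table(rows, dim):
    """Stand-in for a learned embedding table: one random vector per row."""
    return [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(rows)]


# Absolute scheme (assumed baseline): one row per position index 0..511.
absolute_table = make_table(MAX_TRAIN_LEN, DIM)


def absolute_position_embedding(pos):
    """Look up the embedding for an absolute position; fails past the table."""
    if pos >= len(absolute_table):
        raise IndexError(f"no learned embedding for position {pos}")
    return absolute_table[pos]


# Relative scheme: one row per clipped offset in [-k, k], i.e. 2k + 1 rows.
relative_table = make_table(2 * MAX_DISTANCE + 1, DIM)


def relative_index(query_pos, key_pos, k=MAX_DISTANCE):
    """Clip the offset key_pos - query_pos to [-k, k], then shift it to [0, 2k]."""
    offset = key_pos - query_pos
    clipped = max(-k, min(k, offset))
    return clipped + k


def relative_position_embedding(query_pos, key_pos):
    """Embedding for the (clipped) distance between a query and a key."""
    return relative_table[relative_index(query_pos, key_pos)]


if __name__ == "__main__":
    seq_len = 1000  # document length from the question
    try:
        absolute_position_embedding(seq_len - 1)
    except IndexError as err:
        print("absolute:", err)

    # Every (query, key) pair in a 1000-token document maps to one of 2k + 1 rows.
    used = {relative_index(i, j) for i in range(seq_len) for j in range(seq_len)}
    print("relative rows used:", len(used), "of", len(relative_table))
    print("offset 999 uses row", relative_index(0, seq_len - 1))
```

Running it reports that position 999 has no absolute embedding, while every query/key pair in the 1000-token document falls into one of the 33 clipped relative rows, all of which also occur in 512-token training sequences. The trade-off of clipping is that all offsets larger than k share a single embedding and become indistinguishable from one another.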

Updated 2025-09-28
