Learn Before
A language model is trained exclusively on text sequences with a maximum length of 512 tokens. During evaluation, the model shows a significant drop in performance when processing documents that are 1000 tokens long. The engineers hypothesize the problem is related to how the model incorporates word order information. Which of the following changes to the model's architecture is most likely to resolve this specific issue?
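The failure mode the question describes can be illustrated with a minimal sketch (hypothetical code, not part of the course materials): a learned absolute positional embedding is a fixed lookup table with one row per training position, so positions beyond the 512-token training length have no entry at all, whereas a sinusoidal encoding is a closed-form function of the position index and can be evaluated at any length.

```python
import math

MAX_TRAIN_LEN, D_MODEL = 512, 8  # toy sizes for illustration

# Learned absolute embeddings: a fixed table, one row per training position.
learned_table = [[0.0] * D_MODEL for _ in range(MAX_TRAIN_LEN)]

def learned_encoding(pos):
    # Positions >= MAX_TRAIN_LEN have no row to look up.
    if pos >= len(learned_table):
        raise IndexError(f"no embedding for position {pos}")
    return learned_table[pos]

def sinusoidal_encoding(pos):
    # Closed-form sin/cos encoding: defined for any position index.
    return [
        math.sin(pos / 10000 ** (i / D_MODEL)) if i % 2 == 0
        else math.cos(pos / 10000 ** ((i - 1) / D_MODEL))
        for i in range(D_MODEL)
    ]

print(len(sinusoidal_encoding(1000)))  # works fine at position 1000
try:
    learned_encoding(1000)
except IndexError as e:
    print("learned table fails:", e)
```

This is why swapping a learned absolute positional embedding for a functional scheme (sinusoidal, or a relative positional embedding as in the related card below) is the change most likely to help the model generalize past its training length.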
Tags
Data Science
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Positional Encoding for Machine Translation
Positional Invariance in Self-Attention
Mechanism of Relative Positional Embedding