Multiple Choice

A language model is trained exclusively on text sequences with a maximum length of 512 tokens. During evaluation, the model shows a significant drop in performance when processing documents that are 1000 tokens long. The engineers hypothesize the problem is related to how the model incorporates word order information. Which of the following changes to the model's architecture is most likely to resolve this specific issue?
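The failure mode the engineers suspect can be illustrated with a minimal sketch. It assumes, hypothetically, that the model stores one learned position vector per training position; the question does not say which positional scheme the model actually uses. The names `learned_table_lookup`, `relative_offset`, and `D_MODEL` are invented for illustration only.

```python
MAX_TRAIN_LEN = 512  # longest sequence seen in training (from the question)
D_MODEL = 8          # hypothetical tiny embedding width, for illustration only


def learned_table_lookup(position, table_size=MAX_TRAIN_LEN):
    """Mimic a learned absolute position table with one row per trained position."""
    if position >= table_size:
        # No row was ever trained for this index.
        raise IndexError(f"no learned embedding for position {position}")
    # Placeholder row; real values would come from training.
    return [0.0] * D_MODEL


def relative_offset(query_pos, key_pos, max_distance=MAX_TRAIN_LEN - 1):
    """Clipped signed distance between two tokens, independent of absolute index."""
    return max(-max_distance, min(max_distance, key_pos - query_pos))


# Position 999 of a 1000-token document has no entry in the 512-row table.
try:
    learned_table_lookup(999)
except IndexError as err:
    print("absolute table:", err)

# The distance between tokens 990 and 999 equals the distance between
# tokens 0 and 9, a value that does occur in 512-token training sequences.
print("relative offset:", relative_offset(990, 999))
```

Under these assumptions, an absolute lookup has nothing to return beyond index 511, whereas a distance-based quantity stays inside the range observed during training.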

Updated 2025-09-28

Tags

Data Science

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science