A language model is trained exclusively on texts with a maximum length of 512 tokens. When it is later used to process a 1000-token document, its performance is extremely poor. An investigation reveals that the model's internal representations for tokens at positions 513 and beyond are erratic and do not follow any discernible pattern. Which of the following is the most likely cause of this specific failure?
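The failure described above is characteristic of learned absolute position embeddings: the model only has embedding rows for positions it saw in training (0–511), so positions 513 and beyond either index out of bounds or hit untrained vectors. A minimal sketch contrasting this with a formula-based encoding (sinusoidal, which is defined for any position) — the table size and dimension here are illustrative assumptions:

```python
import math

MAX_TRAIN_LEN = 512  # training-time context limit, as stated in the question
DIM = 8              # toy embedding dimension (assumption for illustration)

# A learned absolute position-embedding table has rows only for positions
# seen in training; it is a stand-in here (zeros instead of trained weights).
learned_table = [[0.0] * DIM for _ in range(MAX_TRAIN_LEN)]

def learned_embedding(pos):
    # Positions >= MAX_TRAIN_LEN have no trained row at all: this raises
    # IndexError; in a real model the result would be an untrained or
    # wrapped-around vector, producing erratic representations.
    return learned_table[pos]

# Sinusoidal encodings, by contrast, are computed from a fixed formula
# defined for every position, so they extrapolate past the training length.
def sinusoidal_encoding(pos, dim=DIM):
    return [
        math.sin(pos / 10000 ** (i / dim)) if i % 2 == 0
        else math.cos(pos / 10000 ** ((i - 1) / dim))
        for i in range(dim)
    ]

print(len(sinusoidal_encoding(999)))  # defined even at position 999
try:
    learned_embedding(999)
except IndexError:
    print("no learned embedding for position 999")
```

This is why the breakdown begins precisely at position 513: the fixed-size learned table, not the attention mechanism itself, is the component with no defined behavior past the training length.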
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology