A research team is pre-training a new large language model with the goal of processing documents much longer than those typically seen in standard benchmarks. They observe that while the model performs well on shorter texts, its performance degrades sharply on sequences longer than the maximum context length used during pre-training. The model appears unable to capture relationships between tokens that are far apart in these extended contexts. Which of the following is the most probable cause of this issue?
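The failure mode described above is characteristic of learned absolute positional embeddings: the model stores one trained vector per position up to the pre-training maximum, so positions beyond that range simply have no learned representation. The sketch below illustrates this with a toy lookup table; `MAX_TRAIN_LEN`, `D_MODEL`, and `embed_positions` are hypothetical names chosen for the example, not from any specific model.

```python
import random

MAX_TRAIN_LEN = 512   # hypothetical pre-training context length
D_MODEL = 8           # toy embedding width

# Learned absolute positional embeddings form a fixed lookup table
# with one trained row per position 0 .. MAX_TRAIN_LEN - 1.
pos_table = [[random.gauss(0, 1) for _ in range(D_MODEL)]
             for _ in range(MAX_TRAIN_LEN)]

def embed_positions(seq_len):
    """Return positional embeddings for the first seq_len positions."""
    if seq_len > len(pos_table):
        # Positions past the end of the table were never seen during
        # training, so there is no embedding to retrieve for them.
        raise ValueError(f"position {seq_len - 1} is beyond the trained "
                         f"range [0, {len(pos_table) - 1}]")
    return pos_table[:seq_len]

embed_positions(512)      # fine: within the trained range
# embed_positions(1024)   # fails: positions 512..1023 were never learned
```

Relative schemes such as rotary position embeddings mitigate this by encoding positions as a function rather than a finite table, which is why they extrapolate (or can be interpolated) to longer contexts more gracefully.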
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Positional Encoding Strategy for a Long-Context Model
Comparing Positional Embedding Strategies for Long-Context Pre-training