Comparing Positional Embedding Strategies for Long-Context Pre-training
A research team is developing a new large language model intended to process entire books during its pre-training phase. They are debating between absolute positional embeddings, where each position index is assigned a unique, fixed vector, and relative positional embeddings, where the model encodes the offset (distance) between tokens rather than their absolute indices. Analyze the implications of choosing each embedding strategy for the model's ability to handle extremely long sequences. In your analysis, compare their effectiveness, scalability, and ability to generalize to sequences longer than those seen during training.
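To make the two options concrete, here is a minimal PyTorch sketch (illustrative only, not part of the question; all names and sizes are assumptions chosen for the example). A fixed sinusoidal table assigns one vector per absolute position, while a learned relative-position bias, roughly in the style of T5's relative attention bias, depends only on the clipped distance between query and key tokens.

import torch
import torch.nn as nn


def sinusoidal_absolute_embeddings(max_len: int, d_model: int) -> torch.Tensor:
    """Fixed absolute embeddings: one unique vector per position index."""
    position = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)   # (max_len, 1)
    div_term = torch.exp(
        torch.arange(0, d_model, 2, dtype=torch.float32)
        * (-torch.log(torch.tensor(10000.0)) / d_model)
    )                                                                    # (d_model/2,)
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe                                                            # (max_len, d_model)


class RelativePositionBias(nn.Module):
    """Learned bias indexed by the clipped offset j - i between query i and key j,
    added to attention logits; it depends only on distances, not absolute indices."""

    def __init__(self, num_heads: int, max_distance: int = 128):
        super().__init__()
        self.max_distance = max_distance
        # Offsets in [-max_distance, max_distance] map to 2*max_distance + 1 buckets.
        self.bias = nn.Embedding(2 * max_distance + 1, num_heads)

    def forward(self, q_len: int, k_len: int) -> torch.Tensor:
        offsets = torch.arange(k_len)[None, :] - torch.arange(q_len)[:, None]  # (q_len, k_len)
        offsets = offsets.clamp(-self.max_distance, self.max_distance) + self.max_distance
        return self.bias(offsets).permute(2, 0, 1)                            # (heads, q_len, k_len)


# Absolute: the table size is bounded by the max_len fixed before training.
abs_pe = sinusoidal_absolute_embeddings(max_len=512, d_model=64)
print(abs_pe.shape)                     # torch.Size([512, 64])

# Relative: the same module yields a bias for any sequence length,
# because only (clipped) token-to-token distances are embedded.
rel = RelativePositionBias(num_heads=8, max_distance=128)
print(rel(q_len=1024, k_len=1024).shape)  # torch.Size([8, 1024, 1024])

The contrast relevant to the question shows up in the last two calls: the absolute table cannot index positions beyond the length fixed at training time, whereas the relative bias can be evaluated at any sequence length, at the cost of clipping distances beyond max_distance.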
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Positional Encoding Strategy for a Long-Context Model
A research team is pre-training a new large language model with the goal of processing documents much longer than those typically seen in standard benchmarks. They observe that while the model performs well on shorter texts, its performance sharply degrades on sequences longer than the maximum length used during its initial training phase. The model seems unable to understand the relationships between tokens that are far apart in these extended contexts. Which of the following is the most probable cause of this issue?
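For intuition, a tiny illustrative sketch (all sizes hypothetical) of why a learned absolute position table tied to the pre-training context length breaks down on longer inputs:

import torch
import torch.nn as nn

# Hypothetical setup: a learned absolute position table sized to the
# pre-training context window (512 positions, 64-dim vectors).
max_trained_len = 512
pos_table = nn.Embedding(max_trained_len, 64)

# Within the trained range, every position has an embedding.
short_ids = torch.arange(256)
print(pos_table(short_ids).shape)  # torch.Size([256, 64])

# Beyond the trained range there is simply no row to look up,
# so positions 512..1023 cannot be represented at all.
long_ids = torch.arange(1024)
try:
    pos_table(long_ids)
except IndexError as err:
    print("lookup fails past position 511:", err)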