Role of Specific Positional Embeddings in Long-Context Pre-training
Specific positional embedding techniques, such as relative or rotary positional embeddings (RoPE), are a key enabler of the pre-training phase when adapting Large Language Models to long-context tasks. Because these schemes encode where tokens sit relative to one another, rather than tying each token to a fixed absolute position, a model pre-trained with them on large-scale data can generalize more gracefully to sequence lengths beyond those seen during training.
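As a concrete illustration, here is a minimal NumPy sketch of rotary positional embeddings. The function name rotary_embed and the shapes are illustrative assumptions, not the API of any particular library; production implementations apply this rotation to the query and key projections inside each attention head.

import numpy as np

def rotary_embed(x, base=10000.0):
    """Apply rotary positional embeddings (RoPE) to a (seq_len, dim) array.

    Each pair of feature dimensions (2i, 2i+1) is rotated by an angle
    proportional to the token's position. A later dot product between
    two rotated vectors depends only on the difference of their
    positions, so the encoding carries relative positional information
    and is not tied to a fixed maximum length.
    """
    seq_len, dim = x.shape
    assert dim % 2 == 0, "feature dimension must be even"
    # Per-pair rotation frequencies, following the RoFormer formulation.
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    # Rotation angle for every (position, frequency) pair: (seq_len, dim/2).
    angles = np.outer(np.arange(seq_len), inv_freq)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out

# Example: rotate query and key projections before computing attention.
q = rotary_embed(np.random.randn(128, 64))
k = rotary_embed(np.random.randn(128, 64))
scores = q @ k.T  # attention logits now encode relative token offsets

This relative-offset property is what later long-context extension methods build on: positions outside the original training range still produce meaningful rotations rather than unseen, untrained embedding vectors.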
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Evaluating a Model Adaptation Strategy
A research team aims to adapt a powerful, existing language model to summarize entire books, a task requiring the model to process very long sequences of text. They have access to a vast, diverse dataset of general web text and a smaller, curated dataset composed exclusively of full-length books. To achieve their goal efficiently, what is the most effective two-stage approach for the team to follow?
A machine learning engineer is adapting a pre-existing language model to effectively handle long documents. The process involves two distinct stages. Arrange the following stages in the correct chronological order.
Learn After
Positional Encoding Strategy for a Long-Context Model
A research team is pre-training a new large language model with the goal of processing documents much longer than those typically seen in standard benchmarks. They observe that while the model performs well on shorter texts, its performance sharply degrades on sequences longer than the maximum length used during its initial training phase. The model seems unable to understand the relationships between tokens that are far apart in these extended contexts. Which of the following is the most probable cause of this issue?
Comparing Positional Embedding Strategies for Long-Context Pre-training