Learn Before
Pre-training and Fine-tuning Strategy for Long-Context Adaptation
A widely used two-stage method for enabling Large Language Models (LLMs) to handle long contexts: first pre-train on large-scale, general-purpose data (typically dominated by shorter sequences), then fine-tune the resulting model on a smaller, targeted corpus of longer text sequences so it adapts to the extended context window.
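The two-stage schedule can be sketched as data preparation for each phase. The function below is a minimal, illustrative sketch (all names are hypothetical, not from a specific library): stage 1 chunks a general corpus into short pre-training sequences, and stage 2 chunks a long-document corpus into much longer fine-tuning sequences, reflecting how the context length is extended between phases.

```python
def chunk(tokens, seq_len):
    """Split a token stream into fixed-length training sequences."""
    return [tokens[i:i + seq_len]
            for i in range(0, len(tokens) - seq_len + 1, seq_len)]

def two_stage_schedule(general_corpus, long_corpus,
                       pretrain_len=2048, finetune_len=8192):
    # Stage 1: large-scale pre-training on short, general-domain sequences.
    stage1_batches = chunk(general_corpus, pretrain_len)
    # Stage 2: continued training (fine-tuning) on longer sequences,
    # usually accompanied by extending or rescaling positional embeddings.
    stage2_batches = chunk(long_corpus, finetune_len)
    return stage1_batches, stage2_batches
```

In practice, stage 2 is far cheaper than stage 1 because it reuses the pre-trained weights and only needs enough long-sequence data for the model to adapt to the new context length.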
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Popular Methods for Adapting Pre-trained LLMs to Long Sequences
Strengths and Limitations of Long-Sequence Models
Pre-training and Fine-tuning Strategy for Long-Context Adaptation
Length Extrapolation in LLMs
Fine-Tuning for Architectural Adaptation in LLMs
A startup with limited computational resources and a tight deadline needs to build a system that can summarize lengthy legal documents. They have access to a powerful, general-purpose language model that was pre-trained on a massive dataset but primarily on shorter texts. Given their constraints, which of the following strategies is the most logical and efficient for them to pursue?
The primary reason for adapting existing pre-trained language models for long sequences, rather than training new models from scratch, is that pre-trained models inherently possess superior architectural designs for handling extended contexts.
Evaluating Model Development Strategies for Long-Text Analysis
Scaling Up via Long Sequence Adaptation
Fine-Tuning Pre-trained LLMs with Advanced Positional Embeddings
Learn After
Role of Specific Positional Embeddings in Long-Context Pre-training
Evaluating a Model Adaptation Strategy
A research team aims to adapt a powerful, existing language model to summarize entire books, a task requiring the model to process very long sequences of text. They have access to a vast, diverse dataset of general web text and a smaller, curated dataset composed exclusively of full-length books. To achieve their goal efficiently, what is the most effective two-stage approach for the team to follow?
A machine learning engineer is adapting a pre-existing language model to effectively handle long documents. The process involves two distinct stages. Arrange the following stages in the correct chronological order.