Learn Before
Fine-tuning on Longer Sequences for Enhanced Length Extrapolation
A targeted and effective way to improve a pre-trained LLM's length extrapolation is to fine-tune it on sequences longer than those seen during pre-training. This directly exposes the model to position indices and relative distances beyond its original context window, letting it adapt its positional representations rather than relying on zero-shot extrapolation.
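As a concrete illustration, below is a minimal sketch of this recipe using the Hugging Face transformers and datasets libraries. The model choice (EleutherAI/pythia-160m), the corpus (wikitext-103), and the 8,192-token fine-tuning length are illustrative assumptions, not part of this card. The sketch also assumes a model with rotary position embeddings, which can mechanically accept positions beyond the pre-training length; a model with learned absolute position embeddings would first need its position-embedding table extended.

```python
# Minimal long-sequence fine-tuning sketch. Model, dataset, and lengths
# are illustrative assumptions, not prescribed by this card.
from itertools import chain

from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

LONG_LEN = 8192  # fine-tuning block length, beyond the model's 2,048-token pre-training context

model_name = "EleutherAI/pythia-160m"  # assumed RoPE-based causal LM
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # this tokenizer ships without a pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Any long-form corpus works; wikitext-103 is just a readily available stand-in.
raw = load_dataset("wikitext", "wikitext-103-raw-v1", split="train[:1%]")
tokenized = raw.map(
    lambda batch: tokenizer(batch["text"]),
    batched=True,
    remove_columns=raw.column_names,
)

def group_into_long_blocks(batch):
    # Concatenate documents and slice into LONG_LEN-token blocks so every
    # training example is longer than the original pre-training context.
    ids = list(chain.from_iterable(batch["input_ids"]))
    total = (len(ids) // LONG_LEN) * LONG_LEN
    return {"input_ids": [ids[i : i + LONG_LEN] for i in range(0, total, LONG_LEN)]}

long_blocks = tokenized.map(
    group_into_long_blocks, batched=True, remove_columns=tokenized.column_names
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="long-context-ft",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        learning_rate=2e-5,
    ),
    train_dataset=long_blocks,
    # mlm=False produces standard next-token (causal LM) labels.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

In practice this recipe is often combined with RoPE scaling (position interpolation) so the fine-tune converges faster, but plain fine-tuning on longer blocks, as sketched above, is the core idea.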
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Analyzing Model Performance on Long Documents
An AI development team trains a language model exclusively on documents with a maximum length of 4,096 tokens. After deployment, they are surprised to find that the model can coherently summarize documents up to 5,000 tokens long, but its performance degrades significantly on documents longer than 6,000 tokens. Which statement best analyzes this observation?
Explaining Unexpected Model Performance
Learn After
A research team has a language model that was pre-trained exclusively on text segments with a maximum length of 2,048 tokens. The team's goal is to adapt this model to accurately summarize legal documents that are frequently 5,000 tokens long, a task at which the model currently performs poorly. Given this specific goal, which of the following fine-tuning strategies is most likely to be effective?
Diagnosing Fine-Tuning Failure for Long Contexts
Designing a Fine-Tuning Strategy for Long-Context Tasks