A machine learning engineer is pre-training a large language model. They monitor the model's performance on a separate, unseen dataset after every 10,000 training steps. They observe the following trend:
- Steps 1-100,000: Performance steadily improves.
- Step 110,000: The model achieves its best performance so far.
- Steps 120,000-150,000: Performance consistently worsens with each measurement.
Based on this observation, what is the most appropriate immediate action to ensure the best possible model is obtained from this training run?
0
1
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A machine learning engineer is pre-training a large language model. They monitor the model's performance on a separate, unseen dataset after every 10,000 training steps. They observe the following trend:
- Steps 1-100,000: Performance steadily improves.
- Step 110,000: The model achieves its best performance so far.
- Steps 120,000-150,000: Performance consistently worsens with each measurement.
Based on this observation, what is the most appropriate immediate action to ensure the best possible model is obtained from this training run?
Analyzing a Language Model's Pre-training Log
Rationale for Early Stopping in Model Pre-training