Prioritizing Solutions for Training Instability
A machine learning team is training a new, very large model, but the process repeatedly fails due to numerical instability. They are considering two potential solutions to stabilize the training.
0
1
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Increasing Batch Size for Training Stability
Carefully Designed Setups for LLM Training
Prioritizing Solutions for Training Instability
A research team is training a very large language model using a standard, well-established architecture. During the process, they observe that the model's loss value periodically spikes to extreme levels, causing the training to fail. The team has confirmed that the model's fundamental design is not the source of the problem. What is the most effective area for the team to investigate next to resolve this instability?
Beyond Architecture: Stabilizing LLM Training