A research team is training a very large language model using a standard, well-established architecture. During the process, they observe that the model's loss value periodically spikes to extreme levels, causing the training to fail. The team has confirmed that the model's fundamental design is not the source of the problem. What is the most effective area for the team to investigate next to resolve this instability?
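As a hint at the kind of training-setup fix this question points toward, below is a minimal, framework-free sketch of two widely used stabilizers: global gradient-norm clipping and learning-rate warmup. The function names and the flat-list gradient representation are illustrative assumptions, not drawn from the source.

```python
import math

def clip_grad_norm(grads, max_norm):
    # Compute the global L2 norm across all gradient "tensors"
    # (represented here as plain lists of floats for simplicity).
    total_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    if total_norm > max_norm:
        # Rescale all gradients so the global norm equals max_norm,
        # which bounds the size of any single update step.
        scale = max_norm / (total_norm + 1e-6)
        grads = [[g * scale for g in grad] for grad in grads]
    return grads, total_norm

def lr_with_warmup(step, base_lr, warmup_steps):
    # Linearly ramp the learning rate from 0 to base_lr over the
    # first warmup_steps updates, then hold it at base_lr.
    return base_lr * min(1.0, step / warmup_steps)
```

Both interventions live entirely in the training setup rather than the architecture: clipping caps the damage a single bad batch can do to the parameters, and warmup avoids taking large steps while the optimizer state is still poorly calibrated.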
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Increasing Batch Size for Training Stability
Carefully Designed Setups for LLM Training
Prioritizing Solutions for Training Instability
Beyond Architecture: Stabilizing LLM Training