Learn Before
A team is training a large language model and observes that after several thousand steps, the training loss suddenly becomes 'NaN' (Not a Number), indicating a numerical instability issue. The model architecture itself is considered sound. Which of the following components of the training setup is the most direct and appropriate one to adjust first to address this specific type of instability?
0
1
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Iterative Nature of LLM Training Configuration
Critique of an LLM Training Configuration
A team is training a large language model and observes that after several thousand steps, the training loss suddenly becomes 'NaN' (Not a Number), indicating a numerical instability issue. The model architecture itself is considered sound. Which of the following components of the training setup is the most direct and appropriate one to adjust first to address this specific type of instability?
A team of engineers is training a new large language model and encounters several distinct challenges. Match each challenge with the training setup component that is most directly designed to address it.
Designing an Efficient and Stable LLM Training Regimen