Learn Before
Model Modification for Large-Scale Training
Adapting a language model's architecture is a key consideration in large-scale training: structural modifications are often required to keep optimization stable and computation efficient as models grow deeper and larger.
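To make the stability concern concrete, here is a minimal, framework-free sketch (all names here are illustrative, not from the source) of how activation magnitudes can blow up through a very deep stack of layers unless each layer's output is renormalized, which is one reason normalization layers are a standard architectural modification:

```python
import math

def layer(x, gain=1.1):
    """A toy layer that slightly amplifies its input (gain > 1)."""
    return [gain * v for v in x]

def rms_normalize(x, eps=1e-8):
    """Rescale a vector to unit RMS, as RMSNorm-style layers do."""
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    return [v / (rms + eps) for v in x]

def run_stack(depth, normalize):
    """Return the final RMS magnitude after `depth` layers."""
    x = [1.0, -1.0, 0.5, -0.5]
    for _ in range(depth):
        x = layer(x)
        if normalize:
            x = rms_normalize(x)
    return math.sqrt(sum(v * v for v in x) / len(x))

# Without normalization the magnitude grows like 1.1**depth
# (exponential in depth); with per-layer normalization it stays ~1.
print(run_stack(96, normalize=False))
print(run_stack(96, normalize=True))
```

The same exponential-growth argument applies in reverse to gradients flowing backward through the stack, which is why depth alone can make an unmodified architecture untrainable.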
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Data Quality as a Key Issue in LLM Training
Data Diversity as a Key Issue in LLM Training
Data Bias as a Key Issue in LLM Training
Privacy Concerns in LLM Data Collection
Architectural Modifications for Trainable LLMs
Distributed Training for LLMs
Evaluating a Large-Scale Model Training Plan
A team is developing a new large-scale language model and encounters several distinct challenges. Match each challenge with the primary technical area that needs to be addressed to solve it.
Prioritizing Challenges in Large-Scale Model Training
Data Preparation for Large-Scale LLM Training
Learn After
Diagnosing Instability in Large-Scale Model Training
A team is training an exceptionally deep transformer-based language model and observes that the training process is highly unstable, with loss values fluctuating wildly and sometimes resulting in non-numeric values (NaNs). This suggests that the gradients are either exploding or vanishing as they propagate through the numerous layers. Which of the following architectural modifications is most specifically designed to address this type of instability in very deep networks?
Prioritizing Architectural Modifications for Training Stability
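The modification most often pointed to for the exploding/vanishing-gradient instability described in the diagnosing-instability question above is pre-layer normalization: normalizing the input to each sublayer instead of its output, so the residual path remains a pure identity. A minimal, framework-free sketch (function names are illustrative assumptions, not from the source):

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def sublayer(x):
    """Stand-in for an attention or feed-forward transform."""
    return [2.0 * v + 0.1 for v in x]

def post_norm_block(x):
    # Original transformer ordering: normalize AFTER adding the
    # residual, so even the identity path passes through the norm.
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])

def pre_norm_block(x):
    # Pre-norm ordering: normalize only the sublayer's input; the
    # residual path stays an identity, which keeps gradient scale
    # well behaved even across hundreds of layers.
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]

x = [0.2, -0.4, 0.6, -0.8]
for _ in range(64):
    x = pre_norm_block(x)  # stays finite and grows only additively
```

Because the normalized sublayer input is bounded, each pre-norm block adds at most a bounded increment to the residual stream, so activations grow roughly linearly with depth rather than exponentially.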