Architectural Modifications for Trainable LLMs
Training LLMs at scale frequently runs into instability, most visibly as sudden loss spikes or divergence that can crash a run. To address this and other difficulties in large-scale training, significant modifications to the standard Transformer architecture are often required, for example placing layer normalization before each sub-layer (Pre-LN) rather than after it, adopting simplified normalization variants, and omitting bias terms in affine transformations. These architectural changes are a crucial factor in developing LLMs that remain stable and trainable as depth and parameter count grow.
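As a concrete illustration, below is a minimal sketch of a Pre-LN Transformer block in PyTorch that combines two such modifications: RMS normalization applied before each sub-layer, and bias-free linear projections. The class names, dimensions, and hyperparameters here are illustrative assumptions, not definitions from the course material.

```python
import torch
import torch.nn as nn


class RMSNorm(nn.Module):
    """Root-mean-square normalization: rescales activations without
    mean-centering and without a bias term, a simplification used by
    several modern LLMs for efficiency and training stability."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # scale only, no bias

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize by the root mean square over the feature dimension.
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)


class PreLNBlock(nn.Module):
    """Pre-LN Transformer block: normalization is applied *before* the
    attention and feed-forward sub-layers, so the residual stream keeps
    an unnormalized identity path. This layout is known to train more
    stably in deep stacks than the original Post-LN arrangement."""

    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.attn_norm = RMSNorm(d_model)
        self.ffn_norm = RMSNorm(d_model)
        # bias=False throughout: omitting bias terms in the affine
        # transformations, as some large models do, removes parameters
        # with little reported loss in quality at scale.
        self.attn = nn.MultiheadAttention(
            d_model, n_heads, bias=False, batch_first=True
        )
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model, bias=False),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model, bias=False),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.attn_norm(x)                        # normalize first ...
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out                             # ... then add residual
        x = x + self.ffn(self.ffn_norm(x))           # same pattern for FFN
        return x


if __name__ == "__main__":
    block = PreLNBlock()
    tokens = torch.randn(2, 16, 512)   # (batch, sequence length, d_model)
    print(block(tokens).shape)         # torch.Size([2, 16, 512])
```

The key design point is that each residual connection wraps an already-normalized sub-layer, so the residual stream carries an identity path from input to output; this is what helps keep gradient magnitudes well-behaved as the network gets deeper.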
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Data Quality as a Key Issue in LLM Training
Data Diversity as a Key Issue in LLM Training
Data Bias as a Key Issue in LLM Training
Privacy Concerns in LLM Data Collection
Architectural Modifications for Trainable LLMs
Model Modification for Large-Scale Training
Distributed Training for LLMs
Evaluating a Large-Scale Model Training Plan
A team is developing a new large-scale language model and encounters several distinct challenges. Match each challenge with the primary technical area that needs to be addressed to solve it.
Prioritizing Challenges in Large-Scale Model Training
Data Preparation for Large-Scale LLM Training
Learning Rate and Training Time Trade-off in LLMs
Multiple Approaches to Enhance LLM Training Stability
Evaluating a Training Strategy for a Large Model
A research team successfully trains a 1-billion-parameter language model. Encouraged by their results, they scale up the exact same architecture and training setup to a 100-billion-parameter version using a much larger dataset. Midway through the training process, the model's loss value suddenly becomes NaN (Not a Number), and the training crashes. This happens repeatedly despite restarting from previous checkpoints. Which of the following best explains this phenomenon?
A machine learning team is training a very large language model and encounters several issues. Match each observed issue with the most likely underlying factor related to training stability.
Considerations for Stabilizing Large-Scale Model Training
Factors Influencing LLM Training Optimization
Learn After
Evaluating a Training Strategy for a New Large Model
Layer Normalization in Transformers
A research team is training a very deep language model based on a standard network design. They observe that as they increase the model's depth, the training process frequently fails with loss values suddenly becoming invalid (NaN). This forces them to restart training repeatedly. Which of the following architectural changes is most specifically designed to mitigate this kind of deep-network training instability?
Rationale for Architectural Changes in Large-Scale Models
Connecting Model Scale and Architectural Design
Omission of Bias Terms in LLM Affine Transformations