Analyzing Competing Training Strategies
Two research teams are training identical large-scale models on the same massive dataset. Based on their training logs described below, analyze the most probable difference in their training configurations and explain the trade-off each team is experiencing.
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
An engineer is training a very large neural network for a natural language task. They observe that the training process is highly unstable, with the loss function values fluctuating wildly and sometimes becoming infinitely large, causing the training to crash. The project has a strict deadline and a fixed computational budget. Which of the following adjustments is the most logical first step to stabilize the training process, even though it introduces a significant trade-off?
Analyzing Learning Rate Choices