Learn Before
A machine learning team is training a large model on a distributed system with a mix of high-performance and older, slower processing units. To maximize hardware utilization and speed up training, they opt for an asynchronous update strategy where nodes do not wait for each other. What is the most significant risk the team must be prepared to manage with this approach?
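One commonly discussed concern with asynchronous updates is gradient staleness: a slow node may compute its gradient on parameters that faster nodes have already changed several times. The toy sketch below is only an illustration under stated assumptions, not a model of any real training system. It runs gradient descent on the hypothetical objective f(w) = 0.5 * w**2, and the function name `run_sgd`, the `staleness` parameter, and the learning rate are all invented for this example.

```python
def run_sgd(staleness, lr=0.9, steps=50, w0=1.0):
    """Gradient descent on f(w) = 0.5 * w**2 (a toy objective).

    Each applied gradient is computed on the parameter value from
    `staleness` updates ago, mimicking a slow worker whose view of
    the parameters is out of date. staleness=0 is the synchronous case.
    """
    history = [w0]  # history[t] is the parameter value after t updates
    for t in range(steps):
        # Parameters the (possibly slow) worker actually saw.
        stale_w = history[max(0, t - staleness)]
        grad = stale_w  # derivative of 0.5 * w**2 is w
        history.append(history[-1] - lr * grad)
    return history


sync_final = abs(run_sgd(staleness=0)[-1])   # fresh gradients
async_final = abs(run_sgd(staleness=3)[-1])  # gradients three updates old

print(f"distance from optimum, staleness 0: {sync_final:.3e}")
print(f"distance from optimum, staleness 3: {async_final:.3e}")
```

In this sketch the same learning rate that converges quickly with fresh gradients fails to converge when the gradients are three updates old, which is why asynchronous schemes are often paired with smaller learning rates or a bound on staleness.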
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Training Strategy Analysis
Evaluating Asynchronous Training Strategies