Learn Before
Evaluating a Training Strategy
A small startup is developing a model to classify customer reviews as positive or negative. Their dataset contains 50,000 reviews, and the model architecture is simple enough to be trained on a single high-end graphics card in under three hours. The lead engineer decides to spend a week configuring a system to train the model across ten separate machines, arguing it will accelerate the process. Evaluate the engineer's decision. Is this an effective use of resources? Justify your answer by explaining the primary trade-offs involved in using a multi-processor approach.
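One way to frame the trade-off is a break-even estimate: how many training runs must happen before the one-time setup cost is repaid by the per-run speedup? The sketch below is illustrative only. The three-hour run time and ten machines come from the scenario; treating "a week" as 40 working hours and assuming a 70% scaling efficiency (to account for communication and synchronization overhead) are assumptions, and `break_even_runs` is a hypothetical helper name.

```python
import math


def break_even_runs(setup_hours, single_run_hours, workers, efficiency):
    """Return how many training runs are needed to repay the setup time.

    efficiency is the assumed fraction of ideal linear speedup actually
    achieved (1.0 = perfect scaling; lower values model communication
    and synchronization overhead).
    """
    distributed_run_hours = single_run_hours / (workers * efficiency)
    saved_per_run = single_run_hours - distributed_run_hours
    if saved_per_run <= 0:
        # Overhead cancels the speedup, so the setup cost is never repaid.
        return float("inf")
    return setup_hours / saved_per_run


# Scenario values: 3-hour single-GPU run, ten machines.
# Assumed values: 40 hours of setup work, 70% scaling efficiency.
runs = break_even_runs(setup_hours=40, single_run_hours=3,
                       workers=10, efficiency=0.7)
print(f"Runs needed to break even: {math.ceil(runs)}")
```

Under these assumed numbers the setup only pays off after roughly sixteen full training runs, before counting the cost of the nine extra machines or the ongoing maintenance of the distributed system. Changing the assumptions shifts the figure, but the structure of the argument stays the same: a fixed engineering cost weighed against a small per-run saving.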
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Evaluating a Training Strategy
A research team is training a language model with hundreds of billions of parameters on a dataset that is several terabytes in size. They find that training on their most powerful single processing unit would take several years to complete. Which statement best analyzes the core motivation for implementing a distributed training strategy in this scenario?
Match each distributed training scenario with the primary challenge it is designed to address.
Motivation for Sequence Parallelism