Learn Before
A research team is training a language model with hundreds of billions of parameters on a dataset that is several terabytes in size. They find that training on their most powerful single processing unit would take several years to complete. Which statement best analyzes the core motivation for implementing a distributed training strategy in this scenario?
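The core motivation can be made concrete with a back-of-envelope estimate. The sketch below is purely illustrative: the 3-year single-device figure, the device counts, and the `scaling_efficiency` factor (modeling communication overhead such as gradient synchronization) are all assumptions, not measurements from the scenario.

```python
def distributed_time_days(single_device_days: float,
                          num_devices: int,
                          scaling_efficiency: float = 0.9) -> float:
    """Estimate wall-clock days when training work is split across devices.

    scaling_efficiency < 1 models the overhead of coordinating devices
    (e.g. synchronizing gradients), so speedup is sub-linear in practice.
    """
    return single_device_days / (num_devices * scaling_efficiency)

# Hypothetical baseline: ~3 years on a single device.
single = 3 * 365

for n in (8, 64, 1024):
    days = distributed_time_days(single, n)
    print(f"{n:>5} devices -> ~{days:.1f} days")
```

Even with imperfect scaling, spreading the workload across many devices turns a multi-year job into one measured in days, which is the practical motivation for distributed training here.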
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Evaluating a Training Strategy
Match each distributed training scenario with the primary challenge it is designed to address.
Motivation for Sequence Parallelism