Learn Before
A research team is developing a novel language model with several trillion parameters. During the initial training setup, they discover that the model is too large to fit into the memory of a single available accelerator (e.g., a GPU). Which parallelism strategy is specifically designed to address this fundamental constraint?
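For orientation, here is a minimal sketch of the strategy the question points at (model parallelism): the layer stack is partitioned across devices, so no single accelerator ever has to hold the full parameter set. The sketch below is illustrative only, assuming PyTorch and a machine with two CUDA devices; the class and attribute names are invented for this example.

```python
import torch
import torch.nn as nn

class TwoDeviceModel(nn.Module):
    """Toy model split across two accelerators: each half of the
    layer stack lives on its own device, so neither device has to
    hold all of the parameters (hypothetical example)."""

    def __init__(self, dim: int = 1024, depth: int = 8):
        super().__init__()
        half = depth // 2
        # First half of the layers lives on device 0 ...
        self.stage0 = nn.Sequential(
            *[nn.Linear(dim, dim) for _ in range(half)]
        ).to("cuda:0")
        # ... second half lives on device 1.
        self.stage1 = nn.Sequential(
            *[nn.Linear(dim, dim) for _ in range(depth - half)]
        ).to("cuda:1")

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.stage0(x.to("cuda:0"))
        # Only the activations cross the device boundary,
        # never the parameters themselves.
        return self.stage1(x.to("cuda:1"))

model = TwoDeviceModel()
out = model(torch.randn(4, 1024))  # output tensor lives on cuda:1
```

Splitting by layer like this is the simplest form of model parallelism; pipeline parallelism refines it by feeding micro-batches through the stages so the devices are not idle while waiting for one another.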
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Data Parallelism
Model Parallelism
Pipeline Parallelism
Match each parallelism strategy with the description that best defines its core mechanism for distributing the training workload.
Diagnosing Training Inefficiency
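To sharpen the contrast drawn by the matching card above: data parallelism replicates the model rather than partitioning it, so each device still holds every parameter, and it cannot by itself train a model that exceeds one accelerator's memory. A minimal two-device sketch, again assuming PyTorch and two CUDA devices, with the explicit gradient averaging standing in for the all-reduce a framework such as torch.nn.parallel.DistributedDataParallel would perform:

```python
import copy
import torch
import torch.nn as nn

# Data parallelism: every device holds a FULL replica of the model,
# and it is the batch, not the model, that gets partitioned.
model0 = nn.Linear(1024, 1024).to("cuda:0")
model1 = copy.deepcopy(model0).to("cuda:1")  # full copy, not a shard

batch = torch.randn(8, 1024)
loss0 = model0(batch[:4].to("cuda:0")).sum()  # first half of the batch
loss1 = model1(batch[4:].to("cuda:1")).sum()  # second half of the batch
loss0.backward()
loss1.backward()

# Average the replicas' gradients so both copies stay in sync
# (this is the step an all-reduce performs in practice).
for p0, p1 in zip(model0.parameters(), model1.parameters()):
    avg = (p0.grad + p1.grad.to("cuda:0")) / 2
    p0.grad.copy_(avg)
    p1.grad.copy_(avg.to("cuda:1"))
```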