Optimizing a Large Model Training Pipeline
A research team is training a very large sequential model on a cluster of four high-performance GPUs. Because the model is too large for a single device, they have partitioned it into four sequential segments, placing one segment on each GPU. During training, they monitor the system and notice that overall GPU utilization is consistently low, averaging around 25%. They observe that during both the forward and backward passes, only one GPU is actively computing at any given time while the other three are idle. Describe the primary cause of this inefficiency and propose a hybrid parallelism strategy that could significantly improve GPU utilization. Explain how your proposed strategy addresses the observed problem.
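A rough way to quantify the scenario (an illustrative sketch, not part of the original question): with p pipeline stages and m micro-batches per mini-batch, a synchronous GPipe-style schedule keeps each stage busy for m of the m + p - 1 time slots in each pass, so utilization is roughly m / (m + p - 1). With p = 4 and no micro-batching (m = 1), that is exactly 25%, matching the observation above. The short Python sketch below computes these figures; the function names and the micro-batch counts are assumptions chosen purely for illustration.

# Illustrative sketch (assumed names/numbers): utilization of naive
# 4-way model parallelism vs. a GPipe-style micro-batched pipeline.

def naive_utilization(num_stages: int) -> float:
    # One batch, no micro-batching: each stage computes while the
    # other (num_stages - 1) stages idle, in forward and backward alike.
    return 1.0 / num_stages

def pipelined_utilization(num_stages: int, num_microbatches: int) -> float:
    # Synchronous pipeline: each pass spans (num_microbatches +
    # num_stages - 1) slots, of which each stage is busy for
    # num_microbatches slots; the rest is the "pipeline bubble".
    return num_microbatches / (num_microbatches + num_stages - 1)

if __name__ == "__main__":
    stages = 4
    print(f"naive 4-way model parallelism: {naive_utilization(stages):.0%}")
    for m in (1, 4, 16, 64):
        print(f"{m:>3} micro-batches: {pipelined_utilization(stages, m):.0%}")

Running this shows utilization climbing from 25% (m = 1) toward 96% (m = 64), which is why splitting each mini-batch into many micro-batches, optionally combined with data parallelism across replicas of the pipeline, is the usual remedy for the bubble.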
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Related
Diagnosing and Improving Training Efficiency
A machine learning team is training a model whose layers are partitioned and distributed across eight specialized processing units because the full model is too large for a single unit. During training, they observe that at any given moment in the forward or backward pass, only one unit is actively computing its assigned layers while the other seven are idle, waiting for their turn. This sequential processing leads to poor overall hardware utilization. Which of the following strategies would most effectively address this specific inefficiency?