Short Answer

Optimizing a Large Model Training Pipeline

A research team is training a very large sequential model on a cluster of 4 high-performance GPUs. Due to the model's size, they have partitioned it into four sequential segments, placing one segment on each GPU. During training, they monitor the system and notice that the overall GPU utilization is consistently low, averaging around 25%. They observe that during both the forward and backward passes, only one GPU is actively computing at any given time while the other three are idle. Describe the primary cause of this inefficiency and propose a hybrid parallelism strategy that could significantly improve the GPU utilization. Explain how your proposed strategy addresses the observed problem.
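The ~25% figure is exactly what a pipeline-bubble model predicts for naive model parallelism. A common way to quantify this is the GPipe-style utilization formula: with p pipeline stages and m micro-batches per mini-batch, each stage is busy for m of the m + p - 1 stage-steps in a sweep. The sketch below (function name and numbers are illustrative, not from the exercise) shows that m = 1, the naive case, gives 1/4 = 25% on four GPUs, and that splitting the batch into micro-batches shrinks the bubble:

```python
def pipeline_utilization(num_stages: int, num_microbatches: int) -> float:
    """Fraction of time each stage is busy in a GPipe-style schedule.

    With p stages and m micro-batches, a sweep through the pipeline
    takes m + p - 1 stage-steps per stage, of which only m do useful
    work; the remaining p - 1 steps are the pipeline "bubble".
    """
    p, m = num_stages, num_microbatches
    return m / (m + p - 1)

# Naive model parallelism is the m = 1 case: one GPU busy at a time.
print(pipeline_utilization(4, 1))   # 0.25 -> the observed ~25% utilization
print(pipeline_utilization(4, 8))   # ~0.73: micro-batching shrinks the bubble
```

This is the quantitative core of the answer the exercise is looking for: keeping the model partition but feeding the pipeline with micro-batches (pipeline parallelism), optionally combined with data parallelism across replicas, lets several GPUs compute on different micro-batches simultaneously.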


Updated 2025-10-03


Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science