Concept

Trade-off of Micro-batch Size in Pipeline Parallelism

Micro-batching in pipeline parallelism splits each batch into many micro-batches so that workers spend less time idle, but there is a practical trade-off. Excessively small micro-batches can be detrimental: each one carries too little work to keep a GPU fully utilized, and the larger number of micro-batches raises the costs associated with task-switching. As a result, the overall throughput of the training system can drop.
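The trade-off can be illustrated with a toy cost model. The sketch below is not a measurement of any real system: it assumes a GPipe-style schedule in which a step takes (number of micro-batches + number of stages - 1) time slots, and it assumes each slot costs a fixed overhead plus a term proportional to micro-batch size. The function names and the constants `per_sample_time` and `per_microbatch_overhead` are hypothetical, chosen only to make the shape of the curve visible.

```python
def pipeline_step_time(batch_size, num_microbatches, num_stages,
                       per_sample_time=1.0, per_microbatch_overhead=4.0):
    """Toy estimate of the time for one training step (arbitrary units).

    Assumptions (illustrative only):
      - the step occupies (num_microbatches + num_stages - 1) time slots,
        where the extra (num_stages - 1) slots are pipeline idle time;
      - each slot costs a fixed overhead (task-switching, launch cost)
        plus compute time proportional to the micro-batch size.
    """
    microbatch_size = batch_size / num_microbatches
    slot_time = per_microbatch_overhead + per_sample_time * microbatch_size
    return (num_microbatches + num_stages - 1) * slot_time


def throughput(batch_size, num_microbatches, num_stages, **cost):
    """Samples processed per unit time under the toy model."""
    return batch_size / pipeline_step_time(
        batch_size, num_microbatches, num_stages, **cost)


def best_num_microbatches(batch_size, num_stages, candidates, **cost):
    """Return the candidate micro-batch count with the highest throughput."""
    return max(candidates,
               key=lambda m: throughput(batch_size, m, num_stages, **cost))


if __name__ == "__main__":
    batch_size, num_stages = 64, 4
    for m in [1, 2, 4, 8, 16, 32, 64]:
        print(m, round(throughput(batch_size, m, num_stages), 3))
```

Under these made-up constants, throughput rises as the batch is split into more micro-batches, peaks at 8 micro-batches among the candidates tried, and then falls as the fixed per-micro-batch overhead dominates. Real systems would need profiling to locate the peak, since GPU utilization does not scale linearly with micro-batch size.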

Updated 2026-04-21

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences