
Micro-batching in Pipeline Parallelism

The core mechanism of pipeline parallelism is to partition a data batch into several smaller "micro-batches", which are fed sequentially into the pipeline of workers. As soon as a worker finishes its computation for one micro-batch and forwards the result to the next worker, it immediately begins processing the next available micro-batch. This continuous flow keeps different pipeline stages active on different micro-batches at the same time, which raises device utilization compared with pushing the whole batch through one stage at a time.
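The schedule this paragraph describes can be sketched with a small simulation. This is a minimal illustration, not a real training loop: it assumes a GPipe-style forward-only schedule in which every stage takes exactly one time step per micro-batch, and the stage/micro-batch counts are arbitrary example values.

```python
def pipeline_schedule(num_stages: int, num_microbatches: int):
    """Return, for each time step, a dict mapping active stage -> micro-batch id.

    Assumes each stage takes exactly one time step per micro-batch, so
    stage s works on micro-batch m at time step t = s + m.
    """
    total_steps = num_stages + num_microbatches - 1
    schedule = []
    for t in range(total_steps):
        # Stage s is busy at step t iff 0 <= t - s < num_microbatches.
        active = {s: t - s for s in range(num_stages)
                  if 0 <= t - s < num_microbatches}
        schedule.append(active)
    return schedule


if __name__ == "__main__":
    num_stages, num_microbatches = 4, 8
    sched = pipeline_schedule(num_stages, num_microbatches)

    busy_slots = sum(len(step) for step in sched)       # stage-steps doing work
    total_slots = len(sched) * num_stages               # stage-steps available
    print(f"steps: {len(sched)}")                       # num_stages + num_microbatches - 1
    print(f"utilization: {busy_slots / total_slots:.2f}")
```

With 4 stages and 8 micro-batches the pipeline needs 4 + 8 - 1 = 11 steps instead of 4 x 8 = 32 strictly sequential stage-steps; the idle slots at the start and end are the pipeline "bubble", and their share, (num_stages - 1) / (num_stages + num_microbatches - 1), shrinks as more micro-batches are used.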

Updated 2026-04-21


Tags: Ch.2 Generative Models - Foundations of Large Language Models; Foundations of Large Language Models Course; Computing Sciences
