Pipeline Parallelism

Pipeline parallelism is a strategy designed to overcome the inefficiency of basic model parallelism, where hardware is underutilized because only one device is active at any given moment. This technique introduces computational overlap by dividing a data batch into smaller units called micro-batches. These micro-batches are fed into a pipeline of workers, allowing a worker to begin processing the next micro-batch as soon as it has passed the current one to the subsequent worker. This creates a continuous flow of computation, ensuring that different devices are working simultaneously on different stages of the process.
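The scheduling idea above can be sketched with a small simulation (not from the source; the function names and the 4-stage/8-micro-batch sizes are illustrative assumptions). In a GPipe-style forward schedule with S stages and M micro-batches, stage s works on micro-batch m at time step s + m, so the whole batch finishes in S + M - 1 steps instead of S * M, and the fraction of idle "bubble" time shrinks as M grows:

```python
def pipeline_schedule(num_stages: int, num_microbatches: int):
    """Map each time step to the (stage, micro_batch) pairs active then.

    Stage s processes micro-batch m at step s + m, so different stages
    overlap on different micro-batches instead of idling.
    """
    total_steps = num_stages + num_microbatches - 1
    schedule = {t: [] for t in range(total_steps)}
    for m in range(num_microbatches):
        for s in range(num_stages):
            schedule[s + m].append((s, m))
    return schedule


def utilization(num_stages: int, num_microbatches: int) -> float:
    """Fraction of stage-time slots doing useful work; the rest is bubble."""
    total_steps = num_stages + num_microbatches - 1
    useful_work = num_stages * num_microbatches
    return useful_work / (num_stages * total_steps)


if __name__ == "__main__":
    for t, active in pipeline_schedule(4, 8).items():
        print(f"t={t}: {active}")
    # With 1 micro-batch (naive model parallelism) only one of the 4
    # stages is ever busy; splitting into 8 micro-batches raises
    # utilization from 0.25 to 32/44 (about 0.73).
    print(utilization(4, 1))
    print(utilization(4, 8))
```

This is only the forward pass; real training schedules (e.g. GPipe or 1F1B) also interleave backward passes, but the same overlap principle applies.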

Updated 2026-05-02

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences
