1Cademy - Pipelined Engine Efficiency

Learn Before

Disaggregation of Prefilling and Decoding using Pipelined Engines

Short Answer

Pipelined Engine Efficiency

An LLM inference system uses two separate engines in a pipeline: Engine 1 for processing initial prompts (prefilling) and Engine 2 for generating subsequent tokens (decoding). When a continuous stream of request batches arrives, explain precisely when Engine 1 can start processing a new batch (e.g., Batch B) in relation to the processing of the previous batch (Batch A). Why is this timing crucial for maximizing hardware utilization?
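The timing relationship in the question can be explored with a small schedule simulation. This is only a sketch under simplifying assumptions that are not in the original: every batch takes the same fixed prefill and decode time, the handoff between engines is free, and Engine 1 never has to wait for Engine 2 to drain. The names `Slot`, `schedule`, `prefill_time`, and `decode_time` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    batch: str
    prefill_start: float
    prefill_end: float
    decode_start: float
    decode_end: float


def schedule(batches, prefill_time, decode_time):
    """Two-stage pipeline: each engine takes the next batch as soon as it is free.

    Assumptions (illustrative only): fixed per-batch durations, zero handoff
    cost, and no back-pressure from Engine 2 onto Engine 1.
    """
    slots = []
    engine1_free = 0.0  # time at which Engine 1 (prefill) is next idle
    engine2_free = 0.0  # time at which Engine 2 (decode) is next idle
    for name in batches:
        p_start = engine1_free
        p_end = p_start + prefill_time
        # Decoding needs both the finished prefill and an idle Engine 2.
        d_start = max(p_end, engine2_free)
        d_end = d_start + decode_time
        slots.append(Slot(name, p_start, p_end, d_start, d_end))
        engine1_free = p_end
        engine2_free = d_end
    return slots


timeline = schedule(["A", "B", "C"], prefill_time=2.0, decode_time=2.0)
for s in timeline:
    print(f"Batch {s.batch}: prefill {s.prefill_start}-{s.prefill_end}, "
          f"decode {s.decode_start}-{s.decode_end}")
```

In this model, Batch B's prefill starts at the moment Batch A's prefill ends, so it overlaps with Batch A's decoding on Engine 2. Changing the two durations so they differ shows how the slower stage leaves the other engine idle.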

Updated 2025-10-10
