Example

Example of Interleaving Prefilling and Decoding in Continuous Batching

Continuous batching shows its advantage when a new request arrives while an existing batch is in the middle of decoding. Suppose an initial batch of requests (x1, x2, x3) has completed prefilling and several decoding steps when a new request x4 arrives. In the next computational iteration, the system can run prefilling for x4 together with another decoding step for the ongoing requests x1, x2, and x3, instead of waiting for the current batch to finish. Interleaving prefilling for new requests with decoding for existing ones in the same iteration is a key feature of continuous batching: it keeps the hardware busy and improves system throughput.
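The scenario above can be sketched as a toy scheduler simulation. This is a minimal illustration, not a real inference engine: the `Request` class, the `run_continuous_batching` function, the field names, and the token counts are all hypothetical, and the sketch assumes each request spends exactly one iteration in prefilling and then produces one token per decoding iteration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical request record used only for this illustration.
    name: str
    max_new_tokens: int      # decoding steps needed after prefilling
    arrival_step: int = 0    # iteration at which the request arrives
    prefilled: bool = False
    generated: int = 0

    @property
    def done(self) -> bool:
        return self.prefilled and self.generated >= self.max_new_tokens


def run_continuous_batching(requests):
    """Simulate iteration-level scheduling and return a per-iteration log.

    Each log entry is (step, names_prefilled, names_decoded).
    """
    pending = sorted(requests, key=lambda r: r.arrival_step)
    active, log, step = [], [], 0
    while pending or active:
        # Admit every request that has arrived by this iteration.
        while pending and pending[0].arrival_step <= step:
            active.append(pending.pop(0))
        prefill, decode = [], []
        for r in active:
            if not r.prefilled:
                r.prefilled = True   # new request: process its prompt
                prefill.append(r.name)
            else:
                r.generated += 1     # ongoing request: one decoding step
                decode.append(r.name)
        log.append((step, prefill, decode))
        # Finished requests leave the batch immediately, freeing their slots.
        active = [r for r in active if not r.done]
        step += 1
    return log


# x1, x2, x3 start together; x4 arrives at iteration 3, mid-decoding.
log = run_continuous_batching([
    Request("x1", max_new_tokens=4),
    Request("x2", max_new_tokens=4),
    Request("x3", max_new_tokens=4),
    Request("x4", max_new_tokens=2, arrival_step=3),
])
for step, prefill, decode in log:
    print(f"iter {step}: prefill={prefill} decode={decode}")
```

In iteration 3 of this run, x4 is prefilled in the same iteration in which x1, x2, and x3 take another decoding step, which is the interleaving described above. A real system would additionally have to manage the KV cache and the compute budget of each iteration, which this sketch leaves out.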

Updated 2025-10-09

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences