Example

Narrative Example of Dynamic Batch Management in Continuous Batching

This scenario illustrates how continuous batching dynamically manages sequences during inference, in contrast with standard request-level batching, which fixes a batch of input sequences and processes them to completion. As illustrated, the system continuously accepts and adds new requests to the current batch as long as compute capacity is available. Initially, two user requests, $\mathbf{x}_1$ and $\mathbf{x}_2$, are grouped into a batch and sent to the inference engine. After two iterations, a new request, $\mathbf{x}_3$, is received and incorporated into the active batch. The engine processes this updated batch concurrently, advancing the decoding process for $\mathbf{x}_1$ and $\mathbf{x}_2$ while executing the prefilling phase for $\mathbf{x}_3$. When $\mathbf{x}_2$ completes its generation, two additional requests, $\mathbf{x}_4$ and $\mathbf{x}_5$, arrive. The scheduler removes the finished $\mathbf{x}_2$ and, based on available capacity, adds $\mathbf{x}_4$ to the batch, while $\mathbf{x}_5$ is queued until resources free up.
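The scheduling behavior described above can be sketched in code. The following is a minimal, illustrative simulation, not the book's implementation: the names `Request` and `continuous_batching` are assumptions, and each engine iteration is simplified to either one prefill step or one decoded token per sequence. Finished sequences are evicted at the start of each iteration, and queued requests are admitted up to a fixed capacity.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    """A toy request: prefill once, then decode target_len tokens."""
    name: str
    target_len: int
    generated: int = 0
    prefilled: bool = False

    def step(self):
        # One engine iteration: run prefill first, then decode one token.
        if not self.prefilled:
            self.prefilled = True
        else:
            self.generated += 1

    @property
    def done(self):
        return self.generated >= self.target_len


def continuous_batching(arrivals, capacity, max_iters=100):
    """arrivals maps an iteration index to the Requests arriving then.

    Returns a log of (iteration, [names in the active batch]).
    """
    queue, batch, log = deque(), [], []
    last_arrival = max(arrivals, default=0)
    for it in range(max_iters):
        queue.extend(arrivals.get(it, []))
        # Evict finished sequences, then admit queued ones up to capacity.
        batch = [r for r in batch if not r.done]
        while queue and len(batch) < capacity:
            batch.append(queue.popleft())
        if not batch and not queue and it > last_arrival:
            break
        for r in batch:
            r.step()
        log.append((it, [r.name for r in batch]))
    return log


# Reproduce the narrative: x1 and x2 arrive first, x3 joins two
# iterations later, and x4 and x5 arrive just as x2 finishes.
arrivals = {
    0: [Request("x1", 10), Request("x2", 4)],
    2: [Request("x3", 6)],
    5: [Request("x4", 3), Request("x5", 3)],
}
log = continuous_batching(arrivals, capacity=3)
```

With a capacity of three, the log shows the batch starting as {x1, x2}, growing to {x1, x2, x3}, then becoming {x1, x3, x4} once x2 finishes, with x5 held in the queue until a slot frees up.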

[Figure: dynamic batch management in continuous batching]

Updated 2026-05-06


Ch.5 Inference - Foundations of Large Language Models
