Continuous Batching for LLM Inference
Continuous batching is an iteration-based scheduling method, notably used in the Orca system, in which the composition of a request batch is adjusted dynamically between computational steps. New input sequences can be added and completed sequences removed at any iteration, without waiting for every request in the batch to finish; this distinguishes it from static batching, where the batch composition stays fixed until all of its requests complete.
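A minimal Python sketch of this scheduling loop is shown below. It is illustrative only, not Orca's actual implementation: the Request class and the step() function are hypothetical stand-ins for one model forward pass over whatever sequences are currently batched.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Request:
    prompt: list[int]                       # prompt token ids (prefill work)
    generated: list[int] = field(default_factory=list)
    max_new_tokens: int = 64

    def finished(self) -> bool:
        # Done once the generation budget is exhausted (EOS handling omitted).
        return len(self.generated) >= self.max_new_tokens


def step(batch: list[Request]) -> None:
    """Placeholder for one model iteration: prefill newly admitted requests,
    decode one token for the rest. Here it just appends a dummy token."""
    for req in batch:
        req.generated.append(0)


def continuous_batching(waiting: deque[Request], max_batch_size: int = 8) -> None:
    batch: list[Request] = []
    while waiting or batch:
        # Admit new requests between iterations, without waiting for the
        # current batch to drain; this is the key difference from static batching.
        while waiting and len(batch) < max_batch_size:
            batch.append(waiting.popleft())

        # Run exactly one computational step for the current batch composition.
        step(batch)

        # Retire finished sequences immediately so their slots free up.
        batch = [req for req in batch if not req.finished()]
```

In a real engine, the newly admitted requests would be prefilled in the same iteration in which ongoing requests decode their next token, which is what keeps the hardware utilized.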
References
Reference of Foundations of Large Language Models Course
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
In a common architecture for language model inference, the initial processing of a user's prompt (prefilling) and the subsequent token-by-token generation of the response (decoding) are treated as distinct computational stages, even though they execute on the same hardware. What is the primary analytical reason for this architectural separation?
Optimizing Inference Throughput
Trade-offs in a Staged Inference Architecture
Learn After
Iteration in Continuous Batching
General Process of Continuous Batching
Example of Interleaving Prefilling and Decoding in Continuous Batching
Overhead of Dynamic Batch Reorganization in Continuous Batching
Memory Fragmentation in LLM Inference
Prefilling-Prioritized Strategy in Continuous Batching
Simple Iteration-level Scheduling
Priority-Based Scheduling in LLM Inference
Custom Priority Policies in LLM Scheduling
Disaggregation of Prefilling and Decoding using Pipelined Engines
Comparison of Continuous (Prefilling-Prioritized) vs. Standard (Decoding-Prioritized) Batching
LLM Inference Scheduling Strategy
An LLM inference server is processing a batch of three long-running requests. In the middle of this process, after several computational steps have already been completed for the initial batch, a new, short request arrives. How would a system implementing continuous batching most likely handle this new request in the next computational step?
An LLM inference system is designed to maximize hardware utilization. Which of the following operational descriptions best illustrates the core principle of continuous batching, distinguishing it from a static batching approach?