Comparison of Continuous (Prefilling-Prioritized) vs. Standard (Decoding-Prioritized) Batching
Continuous and standard batching strategies differ fundamentally in their prioritization, which leads to distinct performance trade-offs. Continuous batching employs a prefilling-prioritized approach, where new requests are added to the batch as soon as computational resources become available. This method maximizes system throughput and hardware utilization but can increase the processing latency for requests already in the batch. Conversely, standard batching is decoding-prioritized, meaning it processes an entire batch to completion before handling new requests. This ensures lower latency for the active batch but results in reduced device utilization and overall system throughput.
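The scheduling difference can be illustrated with a toy sketch (illustrative only, not a real inference engine). Assume each request finishes at a known iteration step, one decoding iteration per time unit, and the batch always has a free slot; the function names below are hypothetical.

```python
def static_batching_start(finish_steps, arrival_step):
    # Decoding-prioritized: a new request waits until every request
    # in the current batch has finished decoding.
    return max(max(finish_steps), arrival_step)

def continuous_batching_start(finish_steps, arrival_step):
    # Prefilling-prioritized: the new request is admitted at the next
    # iteration boundary after it arrives, regardless of how long the
    # in-flight requests still need.
    return arrival_step + 1

# Three in-flight requests finish at steps 4, 7, and 12; a new request
# arrives at step 5.
print(static_batching_start([4, 7, 12], 5))      # 12
print(continuous_batching_start([4, 7, 12], 5))  # 6
```

Under static batching the newcomer idles until the slowest request completes; under continuous batching it is prefilled almost immediately, which is exactly why the latter raises throughput at the cost of slowing the requests already in the batch.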
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Iteration in Continuous Batching
General Process of Continuous Batching
Example of Interleaving Prefilling and Decoding in Continuous Batching
Overhead of Dynamic Batch Reorganization in Continuous Batching
Memory Fragmentation in LLM Inference
Prefilling-Prioritized Strategy in Continuous Batching
Simple Iteration-level Scheduling
Priority-Based Scheduling in LLM Inference
Custom Priority Policies in LLM Scheduling
Disaggregation of Prefilling and Decoding using Pipelined Engines
LLM Inference Scheduling Strategy
An LLM inference server is processing a batch of three long-running requests. In the middle of this process, after several computational steps have already been completed for the initial batch, a new, short request arrives. How would a system implementing continuous batching most likely handle this new request in the next computational step?
An LLM inference system is designed to maximize hardware utilization. Which of the following operational descriptions best illustrates the core principle of continuous batching, distinguishing it from a static batching approach?
Decoding-Prioritized Strategy in Standard Batching
An inference server processes user requests in groups. The server's scheduling policy dictates that it must wait for every single request within a group to finish generating its full response before it can begin processing the next group of requests. If a group contains three requests that take 4 seconds, 7 seconds, and 12 seconds to complete respectively, when will the server become available to start processing a new group?
Diagnosing Inference Server Performance Issues
Analyzing Static Batching Inefficiency
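The static-batching timing question above (requests taking 4, 7, and 12 seconds) reduces to simple arithmetic: since the server must wait for every request in the group, it frees up only when the slowest one finishes. A minimal check:

```python
durations = [4, 7, 12]          # seconds for each request in the group
available_at = max(durations)   # server is blocked until the slowest finishes
print(available_at)             # 12
```

So the server becomes available 12 seconds after the group starts, even though two of the three requests finished much earlier, which is the utilization gap continuous batching is designed to close.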
Learn After
Inference System Optimization
An AI development team is deploying two different services. Service X is a real-time conversational agent where minimizing the response time for each user's turn is the top priority. Service Y is an offline system that processes a massive queue of documents for analysis, where maximizing the total number of documents processed per day is the main goal. Considering the trade-offs between different batching methods, which approach is best suited for each service?
Match each batching strategy with its corresponding primary goal and performance trade-off.
Simultaneous vs. Sequential Phases in Continuous and Standard Batching