Comparison

Comparison of Continuous (Prefilling-Prioritized) vs. Standard (Decoding-Prioritized) Batching

Continuous and standard batching differ fundamentally in what they prioritize, which leads to distinct performance trade-offs. Continuous batching is prefilling-prioritized: new requests are added to the batch as soon as computational resources become available, rather than waiting for the current batch to finish. This maximizes system throughput and hardware utilization, but it can increase latency for requests already in the batch, because their decoding is delayed while the newly admitted requests are prefilled. Conversely, standard batching is decoding-prioritized: it processes an entire batch to completion before handling new requests. This keeps latency low for the active batch, but it reduces device utilization and overall system throughput, since capacity freed by sequences that finish early sits idle until the whole batch is done.
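The trade-off can be made concrete with a toy scheduler simulation. The sketch below is illustrative only and rests on simplifying assumptions that are not from the source: every iteration costs the same, prefilling any number of newly admitted requests takes exactly one iteration, and no decoding happens during a prefill iteration. The function names `standard_batching` and `continuous_batching` are hypothetical, not taken from any library.

```python
from collections import deque


def standard_batching(lengths, batch_size):
    """Decoding-prioritized: run each batch to completion before admitting more.

    `lengths` holds the number of decode iterations each request needs.
    Returns the total number of iterations used.
    """
    queue = deque(lengths)
    steps = 0
    while queue:
        batch = [queue.popleft() for _ in range(min(batch_size, len(queue)))]
        steps += 1           # assumption: one prefill iteration per batch
        steps += max(batch)  # decode until the longest sequence finishes
    return steps


def continuous_batching(lengths, batch_size):
    """Prefilling-prioritized: admit waiting requests as soon as a slot is free.

    Returns (total iterations, stalls), where `stalls` counts how many times
    an in-flight request had to skip decoding because of a prefill iteration.
    """
    queue = deque(lengths)
    active = []  # remaining decode iterations for each in-flight request
    steps = 0
    stalls = 0
    while queue or active:
        free = batch_size - len(active)
        if queue and free > 0:
            # Prefill takes priority: in-flight requests wait this iteration.
            stalls += len(active)
            for _ in range(min(free, len(queue))):
                active.append(queue.popleft())
        else:
            # Decode one token per in-flight request; drop finished requests.
            active = [r - 1 for r in active if r > 1]
        steps += 1
    return steps, stalls


if __name__ == "__main__":
    lengths = [8, 2, 2, 2, 8, 2, 2, 2]  # made-up decode lengths
    print("standard:  ", standard_batching(lengths, batch_size=4))
    print("continuous:", continuous_batching(lengths, batch_size=4))
```

With these made-up lengths and a batch size of 4, standard batching needs 18 iterations, while continuous batching finishes in 13 but makes in-flight requests wait through 3 stalled decode turns. Under the stated assumptions, this is the same pattern the comparison describes: higher throughput at the cost of added latency for requests already being decoded.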


Updated 2026-05-06


Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences
