Concept

Decoding-Prioritized Strategy in Standard Batching

The decoding-prioritized strategy is characteristic of standard (or static) batching, where the system must wait for every sequence in the current batch to complete generation before it can begin processing new requests. This approach ensures relatively low latency for the requests within the active batch. However, this focus on completing the current batch leads to lower overall device utilization and system throughput, as hardware can sit idle while waiting for the longest sequence to finish.

0

1

Updated 2025-10-07

Contributors are:

Who are from:

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences