Concept

Prefilling-Prioritized Strategy in Continuous Batching

The prefilling-prioritized strategy is a core characteristic of continuous batching: the scheduler adds new requests to the active batch as soon as the inference engine has free resources. Running the prefill for new requests as early as possible keeps the engine busy and is intended to maximize system throughput. The cost is higher latency for ongoing requests: prefilling a new, long input lengthens the processing time of the whole batch, so requests that are already decoding wait longer for their next token.
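The trade-off can be illustrated with a small toy simulation. This is only a sketch under stated assumptions, not the scheduler of any particular inference engine: the names `Request`, `run_iteration`, and `demo` are hypothetical, capacity is modeled as a simple limit on batch size, and the cost of an iteration is assumed to be one unit per token processed (prompt tokens for newly admitted requests plus one decoded token per ongoing request).

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    name: str
    prompt_len: int       # tokens processed during prefill
    max_new_tokens: int   # decode steps needed before the request finishes
    generated: int = 0    # decode steps completed so far


def run_iteration(waiting, running, max_batch_size):
    """Run one iteration; return (names admitted for prefill, toy cost)."""
    admitted = []
    # Prefilling-prioritized: admit waiting requests as soon as slots are free.
    while waiting and len(running) + len(admitted) < max_batch_size:
        admitted.append(waiting.popleft())

    # Every ongoing request decodes one token in this iteration.
    for req in running:
        req.generated += 1

    # Assumed cost model: one unit per token processed in the iteration.
    cost = sum(r.prompt_len for r in admitted) + len(running)

    # Retire finished requests, then add the newly prefilled ones to the batch.
    running[:] = [r for r in running if r.generated < r.max_new_tokens]
    running.extend(admitted)
    return [r.name for r in admitted], cost


def demo():
    """Request "a" is decoding when a long request "b" arrives."""
    waiting = deque([Request("a", prompt_len=8, max_new_tokens=3)])
    running = []
    log = [run_iteration(waiting, running, 2)]       # prefill "a"
    log.append(run_iteration(waiting, running, 2))   # "a" decodes alone
    waiting.append(Request("b", prompt_len=100, max_new_tokens=2))
    log.append(run_iteration(waiting, running, 2))   # prefill "b" next to "a"
    log.append(run_iteration(waiting, running, 2))   # "a" and "b" both decode
    return log


if __name__ == "__main__":
    for admitted, cost in demo():
        print(admitted, cost)
```

In this toy run, the iteration that admits "b" costs 101 units instead of 1, so "a" waits far longer for that token even though "b" starts as early as possible. A real engine would also account for memory limits and for the first token produced by prefill, both of which this sketch ignores.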

Updated 2026-05-06

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Related