Prefilling-Prioritized Strategy in Continuous Batching
The prefilling-prioritized strategy is a core characteristic of continuous batching: the scheduler admits new requests into the active batch as soon as the inference engine has resources available, and runs their prefilling at the next iteration boundary rather than waiting for in-flight requests to finish. Processing the prefilling of new requests as early as possible keeps the hardware busy and maximizes system throughput. The cost is increased latency for requests that are already decoding, because prefilling a new, long input lengthens that iteration for the entire batch and stalls ongoing token generation.
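The control flow can be made concrete with a minimal sketch, assuming a hypothetical Request dataclass and an engine exposing prefill and decode_step methods (real inference engines differ in detail): at each iteration boundary, waiting requests are admitted and prefilled before the batch takes its next decoding step.

from collections import deque
from dataclasses import dataclass, field


@dataclass
class Request:
    req_id: int
    prompt_tokens: list                     # input tokens to prefill
    max_new_tokens: int = 32
    generated: list = field(default_factory=list)
    prefilled: bool = False

    def finished(self) -> bool:
        return len(self.generated) >= self.max_new_tokens


class MockEngine:
    """Stand-in for a real inference backend (assumption, not a real API)."""

    def prefill(self, requests):
        # A real engine would run the prompts through the model and build
        # their KV caches here; long prompts make this step expensive.
        pass

    def decode_step(self, requests):
        # A real engine would generate one token per request from its KV cache.
        for r in requests:
            r.generated.append(0)           # dummy token


def scheduler_loop(engine, waiting, max_batch_size=8):
    """Prefilling-prioritized continuous batching loop."""
    active = []
    while waiting or active:
        # Admit new requests as soon as there is room in the batch.
        while waiting and len(active) < max_batch_size:
            active.append(waiting.popleft())

        # Prefilling is prioritized: prompt passes for newly admitted
        # requests run before the existing batch resumes decoding, so a
        # long prompt here delays every ongoing request in this iteration.
        new_requests = [r for r in active if not r.prefilled]
        if new_requests:
            engine.prefill(new_requests)
            for r in new_requests:
                r.prefilled = True

        # One decoding step for all in-flight requests.
        engine.decode_step(active)

        # Retire finished requests so their batch slots free up immediately.
        active = [r for r in active if not r.finished()]


# Toy usage: three short requests, each generating four tokens.
waiting = deque(
    Request(req_id=i, prompt_tokens=list(range(16)), max_new_tokens=4)
    for i in range(3)
)
scheduler_loop(MockEngine(), waiting)

In this sketch the latency trade-off shows up directly: if a request with a very long prompt is admitted, the prefill call in that iteration takes longer, and every ongoing request waits for it before receiving its next decoded token.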
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Iteration in Continuous Batching
General Process of Continuous Batching
Example of Interleaving Prefilling and Decoding in Continuous Batching
Overhead of Dynamic Batch Reorganization in Continuous Batching
Memory Fragmentation in LLM Inference
Simple Iteration-level Scheduling
Priority-Based Scheduling in LLM Inference
Custom Priority Policies in LLM Scheduling
Disaggregation of Prefilling and Decoding using Pipelined Engines
Comparison of Continuous (Prefilling-Prioritized) vs. Standard (Decoding-Prioritized) Batching
LLM Inference Scheduling Strategy
An LLM inference server is processing a batch of three long-running requests. In the middle of this process, after several computational steps have already been completed for the initial batch, a new, short request arrives. How would a system implementing continuous batching most likely handle this new request in the next computational step?
An LLM inference system is designed to maximize hardware utilization. Which of the following operational descriptions best illustrates the core principle of continuous batching, distinguishing it from a static batching approach?
Prefilling-Prioritized Strategy in Continuous Batching
Decoding-Prioritized Strategy in Standard Batching
Custom Priority Policies in LLM Scheduling
Inference Scheduling Trade-offs
An AI company operates a service that uses a large language model to summarize vast archives of legal documents. The primary business goal is to maximize the total number of documents summarized each day. The system receives a constant stream of new summarization requests. Given this primary goal, which scheduling approach for managing inference tasks would be most effective?
Optimizing a Hybrid LLM Service
Learn After
Throughput-Latency Trade-off in Prefilling-Prioritized Continuous Batching
An inference server is managing a batch of several short, ongoing requests that are in the process of generating output. A new request with a very long input sequence arrives. The system's scheduler immediately incorporates this new request into the active batch to begin processing it, aiming to keep the hardware as busy as possible. What is the most probable consequence for the initial short requests already in the batch?
LLM Inference Server Performance Analysis
Evaluating Scheduling Strategies for Real-Time Applications