Iteration-Based Scheduling in LLM Inference

Iteration-based scheduling is a strategy in which the scheduler interacts with the inference engine at every token prediction step, rather than waiting for an entire sequence to finish. This fine-grained control lets the scheduler adjust the active batch while generation is in progress. For example, if an urgent request arrives, the scheduler can insert it into the ongoing batch at the next iteration, so it starts processing without waiting for the sequences already in the batch to complete.
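The behavior described above can be illustrated with a small simulation. This is a minimal sketch, not a real inference engine: the `Request` class, the `run_iteration_scheduler` function, the `max_batch_size` parameter, and the rule that urgent requests go to the front of the waiting queue are all assumptions made for illustration. Each loop pass stands in for one token prediction step, and the scheduler revisits the batch on every pass.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical request; tokens_needed stands in for the output length."""
    name: str
    tokens_needed: int
    urgent: bool = False
    generated: int = 0


def run_iteration_scheduler(arrivals, max_batch_size=3):
    """Simulate scheduling at every decoding iteration.

    arrivals maps a step index to the requests that arrive at that step.
    Returns (trace, finished_at): the batch contents per step and the
    step at which each request produced its last token.
    """
    waiting = deque()
    active = []
    trace = []
    finished_at = {}
    last_arrival = max(arrivals, default=-1)
    step = 0
    while active or waiting or step <= last_arrival:
        # Scheduler hook: runs before every single token step.
        for req in arrivals.get(step, []):
            if req.urgent:
                waiting.appendleft(req)  # assumed policy: urgent goes first
            else:
                waiting.append(req)
        # Fill free batch slots without waiting for running sequences.
        while waiting and len(active) < max_batch_size:
            active.append(waiting.popleft())
        # One iteration: every active request generates one token.
        for req in active:
            req.generated += 1
        trace.append((step, [r.name for r in active]))
        # Retire finished sequences so their slots free up next step.
        for req in active:
            if req.generated >= req.tokens_needed:
                finished_at[req.name] = step
        active = [r for r in active if r.generated < r.tokens_needed]
        step += 1
    return trace, finished_at


arrivals = {
    0: [Request("A", tokens_needed=4), Request("B", tokens_needed=6)],
    2: [Request("U", tokens_needed=2, urgent=True)],
}
trace, finished_at = run_iteration_scheduler(arrivals)
for step, batch in trace:
    print(step, batch)
```

In this run the urgent request `U` arrives at step 2 and joins the batch at that same step, while `A` and `B` are still generating. A scheduler that only acts between whole batches would have held `U` back until the longest sequence finished.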


Updated 2026-05-05

Tags

Foundations of Large Language Models

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences