Request-Level Scheduling in LLM Inference

Request-level scheduling is the simplest strategy for managing tasks in LLM inference. Under this approach, the scheduler groups incoming requests into a complete batch and dispatches it to the inference engine. Once execution begins, the batch can be neither interrupted nor modified; the scheduler must wait until the entire batch finishes processing before it can dispatch the next one.
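The behavior described above can be sketched as a small simulation. This is a minimal illustration, not a real engine: the class and field names (`RequestLevelScheduler`, `output_len`, etc.) are hypothetical, and the "engine" is reduced to counting decode steps. The key property it demonstrates is that a batch runs to completion as a unit, so every request in it occupies the engine for as many steps as the longest request needs.

```python
from dataclasses import dataclass
from collections import deque

@dataclass
class Request:
    id: int
    output_len: int  # number of decode steps this request needs

class RequestLevelScheduler:
    """Static (request-level) batching: a batch runs to completion
    before the next batch can be formed or dispatched."""

    def __init__(self, batch_size: int):
        self.batch_size = batch_size
        self.queue: deque[Request] = deque()

    def submit(self, req: Request) -> None:
        # New arrivals only wait in the queue; they can never join
        # a batch that is already executing.
        self.queue.append(req)

    def run_next_batch(self) -> tuple[list[int], int]:
        # Form a batch from the head of the queue.
        n = min(self.batch_size, len(self.queue))
        batch = [self.queue.popleft() for _ in range(n)]
        # The batch cannot be modified mid-flight, so the engine runs
        # for as many decode steps as the *longest* request requires;
        # requests that finish early sit idle until the batch ends.
        steps = max(r.output_len for r in batch)
        return [r.id for r in batch], steps

scheduler = RequestLevelScheduler(batch_size=4)
for i, out_len in enumerate([4, 16, 4, 4]):
    scheduler.submit(Request(id=i, output_len=out_len))
ids, steps = scheduler.run_next_batch()
# All four requests hold the engine for 16 steps, although
# three of them only needed 4 — the cost of static batching.
```

This idle time for short requests is the main inefficiency that later, finer-grained scheduling strategies (such as iteration-level or continuous batching) are designed to eliminate.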


Updated 2026-05-05


Tags

Foundations of Large Language Models

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences