
Continuous Batching for LLM Inference

Continuous batching is an iteration-level scheduling method, notably used in the Orca system, in which the composition of a request batch is adjusted dynamically between decoding steps. At any iteration, newly arrived sequences can be added to the batch and completed sequences removed from it, without waiting for the rest of the batch to finish. This distinguishes it from static batching, where a fixed batch must run to completion before new requests can be served, leaving GPU capacity idle whenever short sequences finish early.
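The scheduling loop described above can be sketched as follows. This is a minimal illustration, not Orca's actual implementation: the `Request` class, the `tokens_needed` counter standing in for real token generation, and the fixed `max_batch_size` are all assumptions made for the example. The key property is that admission and eviction happen between every decode step, not between whole batches.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    rid: int             # request id
    tokens_needed: int   # decode steps until this request completes
    generated: int = 0   # decode steps taken so far


def continuous_batching(incoming, max_batch_size):
    """Iteration-level scheduling: after every decode step, finished
    requests leave the batch and queued requests join immediately."""
    queue = deque(incoming)
    batch = []
    trace = []  # batch composition at each iteration, for illustration
    while queue or batch:
        # Admit queued requests between iterations, up to capacity.
        while queue and len(batch) < max_batch_size:
            batch.append(queue.popleft())
        # Run one decode iteration for every request in the batch.
        for req in batch:
            req.generated += 1
        trace.append([r.rid for r in batch])
        # Evict completed requests without waiting for the others.
        batch = [r for r in batch if r.generated < r.tokens_needed]
    return trace


reqs = [Request(0, 2), Request(1, 4), Request(2, 1), Request(3, 3)]
print(continuous_batching(reqs, max_batch_size=2))
# → [[0, 1], [0, 1], [1, 2], [1, 3], [3], [3]]
```

In this toy run the four requests finish in 6 decode iterations. A static scheduler forming the batches {0, 1} then {2, 3} would need max(2, 4) + max(1, 3) = 7 iterations, since each batch must wait for its longest member.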


Updated 2026-05-02


Tags

Ch.5 Inference - Foundations of Large Language Models
