Activity (Process)

Adding New Requests in Continuous Batching

In continuous batching, the scheduler can add new user requests to the active batch between processing iterations, provided the inference engine has enough spare capacity to handle the additional workload. Because incoming requests do not have to wait for the entire current batch to complete, the system maintains high utilization.
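The admission step described above can be sketched as a small simulation. This is a minimal illustration, not any particular engine's scheduler: the names `Request`, `admit_new_requests`, `run_iteration`, and `simulate` are hypothetical, and "capacity" is assumed here to be a simple cap on the number of concurrent requests (`max_batch_size`). A real engine would more likely also account for memory.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    req_id: str
    tokens_left: int  # assumed stand-in for the remaining generation length


def admit_new_requests(active, waiting, max_batch_size):
    """Move waiting requests into the active batch while capacity remains."""
    admitted = []
    while waiting and len(active) < max_batch_size:
        req = waiting.popleft()
        active.append(req)
        admitted.append(req.req_id)
    return admitted


def run_iteration(active):
    """One processing iteration: each active request emits one token."""
    for req in active:
        req.tokens_left -= 1
    finished = [r for r in active if r.tokens_left <= 0]
    for req in finished:
        active.remove(req)  # finished requests free their slot immediately
    return [r.req_id for r in finished]


def simulate(arrivals, max_batch_size):
    """arrivals maps an iteration index to the requests arriving before it."""
    active, waiting = [], deque()
    log = []
    step = 0
    while active or waiting or any(t >= step for t in arrivals):
        waiting.extend(arrivals.get(step, []))
        # Admission happens between iterations, only if capacity allows.
        admitted = admit_new_requests(active, waiting, max_batch_size)
        finished = run_iteration(active)
        log.append((step, admitted, finished))
        step += 1
    return log


if __name__ == "__main__":
    arrivals = {
        0: [Request("a", 3), Request("b", 1)],
        1: [Request("c", 2)],
    }
    for step, admitted, finished in simulate(arrivals, max_batch_size=2):
        print(step, admitted, finished)
```

In this run, request `c` arrives while `a` is still generating and joins the batch at the next iteration, because `b` has already finished and freed a slot. With `max_batch_size=1`, the same request would instead stay in the waiting queue until capacity became available.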

Updated 2026-05-06

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences