Learn Before
Narrative Example of Dynamic Batch Management in Continuous Batching
This scenario illustrates how continuous batching dynamically manages sequences during inference, in contrast with standard request-level batching, which fixes a batch of input sequences and processes it to completion. The system continuously accepts new requests into the current batch as long as compute capacity is available. Initially, two user requests are grouped into a batch and sent to the inference engine. After two iterations, a third request is received and incorporated into the active batch. The engine processes this updated batch concurrently, advancing the decoding phase for the first two requests while executing the prefilling phase for the new one. When one of the original requests completes its generation, two additional requests arrive. The scheduler removes the finished request and, based on available capacity, adds one newcomer to the batch, while the other is queued until resources free up.
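The scheduling behavior described above can be sketched as a small simulation. This is a minimal illustration, not any engine's actual implementation: the request names (A through E), the capacity of three slots, and the `remaining_steps` counters are all hypothetical values chosen to mirror the narrative, and real schedulers would also account for KV-cache memory, prefill cost, and priorities.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Request:
    name: str
    remaining_steps: int  # decoding iterations left until generation finishes

def continuous_batching(arrivals, capacity):
    """Iteration-level scheduling: between model steps, finished requests
    are dropped and queued requests are admitted while capacity allows."""
    batch, queue, log = [], deque(), []
    step = 0
    while arrivals or batch or queue:
        # Requests arriving at this step join the wait queue.
        queue.extend(arrivals.pop(step, []))
        # Scheduler fills free batch slots from the queue before the iteration.
        while queue and len(batch) < capacity:
            batch.append(queue.popleft())
        log.append((step, [r.name for r in batch]))
        # One model iteration: every active request advances by one token.
        for r in batch:
            r.remaining_steps -= 1
        # Completed requests leave the batch, freeing capacity.
        batch = [r for r in batch if r.remaining_steps > 0]
        step += 1
    return log

# Hypothetical arrival pattern mirroring the narrative above.
arrivals = {
    0: [Request("A", 4), Request("B", 6)],   # initial batch of two requests
    2: [Request("C", 5)],                    # third request joins mid-run
    4: [Request("D", 3), Request("E", 3)],   # arrive as A finishes
}
log = continuous_batching(arrivals, capacity=3)
```

Stepping through the log shows the key transitions: the batch starts as `["A", "B"]`, grows to `["A", "B", "C"]` at iteration 2, and at iteration 4 the finished request A has been replaced by D while E waits in the queue until another slot frees up.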
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Initial Batch Creation in Continuous Batching
Scheduler-Driven Batch Adjustments Between Iterations in Continuous Batching
Termination Condition for Continuous Batching
An inference engine is processing a batch of user requests using an iteration-based scheduling method where the batch composition can be adjusted between computational steps. Midway through a single computational iteration, a new, high-priority request arrives. Based on the principles of this dynamic scheduling process, what is the most likely action the system will take?
An inference engine uses a dynamic, iteration-based scheduling method to handle user requests. Arrange the following actions into the correct logical sequence that describes the general process from start to finish.
Analysis of Inference Engine Halting
Learn After
A system is processing a batch of user requests. The current batch contains three active requests: Request A (long), Request B (short), and Request C (medium). During the current processing cycle, Request B completes. At the same moment, two new requests, Request D and Request E, arrive. The system determines it has enough available capacity to add exactly one of the new requests to the batch. Which of the following describes the most likely composition of the processing batch in the very next cycle?
A system is managing inference requests using a dynamic process where requests can be added or removed from a batch during processing. The following events occur over a period of time. Arrange them in the logical order that demonstrates how the system handles incoming and outgoing requests.
Analysis of a Dynamic Batching Scheduler's Decision