Example of a Request Completing in Continuous Batching (Iteration 5)
This diagram illustrates the fifth iteration of a continuous batching example. In this step, the scheduler directs the inference engine to perform a single decoding operation for all three active requests: x₁, x₂, and x₃. This step finishes generation for request x₂, indicated by its 'complete' status, and its full output y₂ is now available. This signals the scheduler that x₂ can be removed from the batch in the next iteration, freeing its slot for new work and demonstrating a key advantage of continuous batching: dynamic management of the batch composition.
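The scheduling step described above can be sketched in a few lines of Python. This is a minimal, simplified model (the `Request` class, token counts, and completion rule are illustrative assumptions, not the actual engine's API): one batched decode step appends a token to every active request, and any request that reaches its target length is reported back so the scheduler can evict it from the batch.

```python
from dataclasses import dataclass, field

@dataclass
class Request:
    name: str
    max_tokens: int                      # total tokens this request will generate
    tokens: list = field(default_factory=list)

    @property
    def complete(self) -> bool:
        return len(self.tokens) >= self.max_tokens

def decode_step(batch):
    """One batched decoding iteration: every active request emits one token."""
    for req in batch:
        req.tokens.append(f"t{len(req.tokens)}")
    # Report requests that just finished so the scheduler can evict them.
    return [r for r in batch if r.complete]

# Iteration 5 of the example: x2 needs only one more token; x1 and x3 need more.
x1 = Request("x1", max_tokens=7, tokens=["t0", "t1", "t2", "t3"])
x2 = Request("x2", max_tokens=5, tokens=["t0", "t1", "t2", "t3"])
x3 = Request("x3", max_tokens=6, tokens=["t0", "t1", "t2", "t3"])
batch = [x1, x2, x3]

finished = decode_step(batch)            # all three decode in parallel
for req in finished:                     # x2 is complete; free its batch slot
    batch.remove(req)

print([r.name for r in finished])        # ['x2']
print([r.name for r in batch])           # ['x1', 'x3']
```

After this step the batch holds only x₁ and x₃, and the freed slot can be given to a newly arriving request in the next iteration.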

Ch.5 Inference - Foundations of Large Language Models