Example of a Request Completing in Continuous Batching (Iteration 5)
This diagram illustrates the fifth iteration of a continuous batching example. In this step, the scheduler directs the inference engine to perform a single decoding operation for all three active requests: x₁, x₂, and x₃. This step finishes generation for request x₂, indicated by its 'complete' status, and its full output y₂ is now available. This signals the scheduler that x₂ can be removed from the batch in the next iteration, freeing its slot for new work and demonstrating a key advantage of continuous batching: dynamic management of the batch composition.
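The scheduling step described above can be sketched in a few lines of Python. This is a minimal, simplified model (the `Request` class, token counts, and completion rule are illustrative assumptions, not the actual engine's API): one batched decode step appends a token to every active request, and any request that reaches its target length is reported back so the scheduler can evict it from the batch.

```python
from dataclasses import dataclass, field

@dataclass
class Request:
    name: str
    max_tokens: int                      # total tokens this request will generate
    tokens: list = field(default_factory=list)

    @property
    def complete(self) -> bool:
        return len(self.tokens) >= self.max_tokens

def decode_step(batch):
    """One batched decoding iteration: every active request emits one token."""
    for req in batch:
        req.tokens.append(f"t{len(req.tokens)}")
    # Report requests that just finished so the scheduler can evict them.
    return [r for r in batch if r.complete]

# Iteration 5 of the example: x2 needs only one more token; x1 and x3 need more.
x1 = Request("x1", max_tokens=7, tokens=["t0", "t1", "t2", "t3"])
x2 = Request("x2", max_tokens=5, tokens=["t0", "t1", "t2", "t3"])
x3 = Request("x3", max_tokens=6, tokens=["t0", "t1", "t2", "t3"])
batch = [x1, x2, x3]

finished = decode_step(batch)            # all three decode in parallel
for req in finished:                     # x2 is complete; free its batch slot
    batch.remove(req)

print([r.name for r in finished])        # ['x2']
print([r.name for r in batch])           # ['x1', 'x3']
```

After this step the batch holds only x₁ and x₃, and the freed slot can be given to a newly arriving request in the next iteration.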

Ch.5 Inference - Foundations of Large Language Models