Example

Example of a Request Completing in Continuous Batching (Iteration 5)

This diagram illustrates the fifth iteration of a continuous batching example. In this iteration, the scheduler directs the inference engine to run one decoding step for all three active requests: x₁, x₂, and x₃. The step finishes generation for request x₂, which is marked 'complete', and its full output y₂ becomes available. The scheduler can therefore remove x₂ from the batch in the next iteration, freeing its resources. This illustrates a key advantage of continuous batching: the composition of the batch is managed dynamically from one iteration to the next.
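The iteration described above can be sketched as a small simulation. This is only an illustrative toy, not the engine or scheduler of any real system: the names `Request`, `decode_step`, `run_iteration`, and the `<eos>` marker are hypothetical, the "model" is a scripted token list, and the tokens each request has already produced before iteration 5 are assumed for the sake of the example.

```python
from dataclasses import dataclass, field

EOS = "<eos>"  # hypothetical end-of-sequence marker


@dataclass
class Request:
    name: str
    script: list  # toy stand-in for the model: tokens this request will emit
    output: list = field(default_factory=list)
    status: str = "active"


def decode_step(batch):
    """Toy engine: generate exactly one token for every request in the batch."""
    for req in batch:
        token = req.script[len(req.output)]
        req.output.append(token)
        if token == EOS:
            req.status = "complete"  # full output is now available


def run_iteration(batch):
    """Toy scheduler: drop requests completed earlier, then decode the rest."""
    batch = [r for r in batch if r.status != "complete"]
    decode_step(batch)
    return batch


# Assumed state just before iteration 5 (the outputs so far are made up).
x1 = Request("x1", ["a", "b", "c", "d", EOS], output=["a", "b"])
x2 = Request("x2", ["p", "q", EOS], output=["p", "q"])
x3 = Request("x3", ["u", "v", "w", EOS], output=["u"])

# Iteration 5: all three requests are decoded; x2 emits EOS and completes.
batch_after_5 = run_iteration([x1, x2, x3])

# Iteration 6: x2 is removed before decoding, so only x1 and x3 advance.
batch_after_6 = run_iteration(batch_after_5)
```

In this sketch the completed request stays in the returned batch until the start of the following iteration, which mirrors the description above: completion is detected in iteration 5, and removal happens in iteration 6. A real scheduler would typically also admit waiting requests into the freed slot at that point.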

[Figure: iteration 5 of the continuous batching example, in which request x₂ completes]

Updated 2025-10-09

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences