Example of the First Decoding Step in Continuous Batching (Iteration 2)

This diagram illustrates the second iteration in a continuous batching process, which follows the initial prefilling of requests x1 and x2. In this step, the scheduler directs the inference engine to perform a single decoding operation for the entire batch. This concurrently generates the first output token for both request x1 and request x2, demonstrating how the system transitions from the prefilling phase to the iterative decoding phase for a group of requests.
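The batched decode step described above can be sketched in a few lines of Python. This is a minimal illustration, not a real inference engine: the `Request` class, the `decode_step` function, and the stand-in "model" (which just derives a token id from the sequence length) are all hypothetical names invented for this sketch.

```python
from dataclasses import dataclass, field

@dataclass
class Request:
    rid: str
    prompt_tokens: list                              # filled during prefill
    output_tokens: list = field(default_factory=list)

def decode_step(batch):
    """One batched decoding iteration: generate exactly one new token
    for every request in the batch (a stand-in for the model's
    single forward pass over the whole batch)."""
    for req in batch:
        # Placeholder "model": next token id = current sequence length.
        next_tok = len(req.prompt_tokens) + len(req.output_tokens)
        req.output_tokens.append(next_tok)

# Iteration 1 (already done): prefill of x1 and x2 processed their prompts.
x1 = Request("x1", prompt_tokens=[101, 102, 103])
x2 = Request("x2", prompt_tokens=[201, 202])
batch = [x1, x2]

# Iteration 2: the scheduler issues a single decode call for the whole
# batch, producing the first output token of x1 and x2 concurrently.
decode_step(batch)
assert len(x1.output_tokens) == 1 and len(x2.output_tokens) == 1
```

The key point the sketch captures is that one scheduler-issued decode call advances every request in the batch by exactly one token, so newly admitted requests can join the batch at any iteration boundary.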

Updated 2025-10-09
