Example

Example of Reusing a Completed Slot in Continuous Batching (Iteration 6)

This diagram illustrates the sixth iteration of a continuous batching process, which begins after request x₂ has completed. New requests x₄ and x₅ arrive in the system. The scheduler adjusts the batch dynamically, assigning the resources freed by the completed request x₂ to the new request x₄. In a single computational step, the system performs the prefilling phase for x₄ while also executing one decoding step for each of the ongoing requests x₁ and x₃. This highlights a key efficiency of continuous batching: freed resources are reused immediately, so the processing of new and existing requests is interleaved rather than waiting for the whole batch to finish, which improves throughput.
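The scheduling decision in this iteration can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of any particular serving system: the batch capacity of three requests, the output lengths, and FIFO admission are all hypothetical, and `plan_iteration` and `execute_iteration` are illustrative names. A real engine would run the prefill and decode work together in one fused forward pass of the model.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    name: str
    max_new_tokens: int        # hypothetical output length for the example
    generated: int = 0         # tokens decoded so far
    prefilled: bool = False    # whether the prompt has been processed

    @property
    def done(self) -> bool:
        return self.generated >= self.max_new_tokens


def plan_iteration(running, waiting, max_batch):
    """Plan one continuous-batching iteration.

    Returns (requests to prefill, requests to decode) for this step.
    """
    # Evict requests that completed earlier, freeing their slots.
    running[:] = [r for r in running if not r.done]
    # Reuse freed slots immediately: admit waiting requests in FIFO order
    # (assumed policy) until the hypothetical capacity is reached.
    to_prefill = []
    while waiting and len(running) < max_batch:
        req = waiting.popleft()
        running.append(req)
        to_prefill.append(req)
    # Every already-prefilled request advances by exactly one decoding step.
    to_decode = [r for r in running if r.prefilled]
    return to_prefill, to_decode


def execute_iteration(to_prefill, to_decode):
    """Stand-in for the single forward pass over the mixed batch."""
    for req in to_prefill:
        req.prefilled = True   # prompt processed; decoding starts next iteration
    for req in to_decode:
        req.generated += 1     # one new token per ongoing request


# Hypothetical state at the start of iteration 6: x2 has just finished.
x1 = Request("x1", max_new_tokens=8, generated=5, prefilled=True)
x2 = Request("x2", max_new_tokens=5, generated=5, prefilled=True)
x3 = Request("x3", max_new_tokens=9, generated=3, prefilled=True)
running = [x1, x2, x3]
waiting = deque([Request("x4", 4), Request("x5", 6)])

prefill, decode = plan_iteration(running, waiting, max_batch=3)
execute_iteration(prefill, decode)

print("prefill:", [r.name for r in prefill])   # prints: prefill: ['x4']
print("decode:", [r.name for r in decode])     # prints: decode: ['x1', 'x3']
print("waiting:", [r.name for r in waiting])   # prints: waiting: ['x5']
```

With a capacity of three, the slot released by x₂ goes to x₄, which is prefilled in the same step in which x₁ and x₃ each decode one token; under this assumed capacity, x₅ stays in the queue until another slot frees up.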

Updated 2025-10-09

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences