Example

Example of Concurrent Prefilling and Decoding in Continuous Batching (Iteration 4)

This diagram illustrates the fourth iteration of a continuous batching process, at which a new request, x₃, arrives. The scheduler adds x₃ to the running batch, so a single computational step performs two distinct operations together: the prefilling phase for x₃, which processes all of its prompt tokens at once, and one decoding step each for the ongoing requests x₁ and x₂. Combining compute-intensive prefilling with memory-bound decoding in the same iteration is a core feature of continuous batching and improves system throughput.
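The iteration described above can be sketched as a toy scheduler step. This is a minimal illustration, not the implementation of any particular serving system: the `Request` class, the `run_iteration` function, and all prompt lengths and cache sizes are hypothetical values chosen for the example, and no real model is invoked. Each request is only classified as prefill or decode, and the number of tokens it contributes to the step is counted.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical bookkeeping for one request in the batch."""
    name: str
    prompt_len: int      # number of prompt tokens
    cached: int = 0      # tokens whose keys/values are already in the KV cache
    generated: int = 0   # output tokens produced so far


def run_iteration(batch):
    """Simulate one batched step; return {name: (phase, tokens_processed)}."""
    work = {}
    for req in batch:
        if req.cached == 0:
            # Prefill: the whole prompt is processed in this step,
            # which fills the KV cache and yields the first output token.
            phase, tokens = "prefill", req.prompt_len
        else:
            # Decode: only the most recent token is processed,
            # reusing the cached keys/values of everything before it.
            phase, tokens = "decode", 1
        req.cached += tokens
        req.generated += 1
        work[req.name] = (phase, tokens)
    return work


# Assumed state at the start of iteration 4: x1 and x2 are mid-generation,
# x3 has just arrived and has nothing in the KV cache yet.
x1 = Request("x1", prompt_len=5, cached=7, generated=3)
x2 = Request("x2", prompt_len=3, cached=4, generated=2)
x3 = Request("x3", prompt_len=6)

work = run_iteration([x1, x2, x3])
for name, (phase, tokens) in work.items():
    print(f"{name}: {phase:7s} tokens processed = {tokens}")
```

In this sketch the step handles 6 prompt tokens for x3 alongside 1 token each for x1 and x2, which shows why the prefill portion dominates the arithmetic of the iteration while the decode portion mostly reads from the cache.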

Image 0

Updated 2025-10-09

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences