Example of Concurrent Prefilling and Decoding in Continuous Batching (Iteration 4)
This diagram illustrates the fourth iteration of a continuous batching process, in which a new request, x₃, arrives. The scheduler adds this request to the running batch without waiting for x₁ and x₂ to finish. In a single computational step (one forward pass), the system therefore performs two distinct operations together: the prefilling phase for x₃, which processes all of its prompt tokens in parallel, and one decoding step for the ongoing requests x₁ and x₂, each of which generates a single new token. Interleaving compute-intensive prefilling with memory-bound decoding in this way keeps the hardware busier than running either phase alone, which is a core reason continuous batching improves system throughput.
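The iteration described above can be sketched as a small Python simulation. This is a minimal, hypothetical model, not an actual inference engine: the `Request` class, the `step` function, and the prompt and output lengths are assumptions chosen for illustration, and no model is run. Each call to `step` stands for one forward pass in which every in-flight request contributes one decode token and every new arrival contributes its whole prompt for prefilling.

```python
from dataclasses import dataclass


@dataclass
class Request:
    name: str
    prompt_len: int       # prompt tokens processed during prefill
    max_new_tokens: int   # request finishes after this many output tokens
    generated: int = 0    # output tokens produced so far


def step(running, arrivals):
    """Simulate one continuous-batching iteration (one forward pass).

    Returns (work, finished): work is a list of
    (phase, request name, tokens processed) items, and finished lists the
    requests that completed in this iteration.
    """
    work = []
    # Decode: each in-flight request contributes exactly one token.
    for req in running:
        work.append(("decode", req.name, 1))
        req.generated += 1
    # Prefill: each new arrival contributes its whole prompt.
    # Assumption: the prefill pass also samples the request's first output token.
    for req in arrivals:
        work.append(("prefill", req.name, req.prompt_len))
        req.generated += 1
        running.append(req)
    # Retire finished requests so their batch slots can be reused next iteration.
    finished = [r for r in running if r.generated >= r.max_new_tokens]
    running[:] = [r for r in running if r.generated < r.max_new_tokens]
    return work, finished


# Iteration 4 (hypothetical sizes): x1 and x2 are mid-generation, x3 arrives.
running = [Request("x1", prompt_len=5, max_new_tokens=8, generated=3),
           Request("x2", prompt_len=4, max_new_tokens=6, generated=2)]
work, finished = step(running, [Request("x3", prompt_len=7, max_new_tokens=4)])
for phase, name, n_tokens in work:
    print(f"{phase:8s}{name}: {n_tokens} token(s)")
print("tokens in this forward pass:", sum(n for _, _, n in work))
```

In this hypothetical iteration 4, the single forward pass covers nine tokens: one decode token each for x₁ and x₂ and seven prompt tokens for x₃. Keeping decode and prefill work in the same batch is the design choice the paragraph describes; a request whose `generated` count reaches `max_new_tokens` is retired at the end of `step`, freeing its slot for a later arrival.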

Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
An inference engine is processing a batch of two text generation requests, Request A and Request B, using a continuous batching strategy. So far, the engine has generated the first output token for each: 'The' for Request A, and 'Once' for Request B. Neither request is complete, and no new requests have arrived. What is the most likely immediate next action the engine will perform in a single computational step?
A continuous batching system receives two new text generation requests simultaneously. Arrange the following computational stages in the correct chronological order for processing these two requests, assuming no other requests arrive during this time.
Analyzing a System State in Continuous Batching
Learn After
Example of a Request Completing in Continuous Batching (Iteration 5)
An LLM inference system is actively generating tokens for two separate user requests that are already in progress. A third user submits a new request to the system. To maximize overall throughput by overlapping different types of computation, what actions will the system perform in the next single computational step?
LLM Inference Scheduling Decision
Efficiency of Concurrent LLM Operations