Example of the Second Decoding Step in Continuous Batching (Iteration 3)
This diagram illustrates the third iteration in the continuous batching example, continuing from the first decoding step. In this stage, the scheduler again directs the inference engine to perform a single decoding operation over the batch containing requests x1 and x2, generating the second output token for each request. This demonstrates the iterative nature of the decoding phase: each iteration produces exactly one new token per active request in the batch.
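The loop described above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration (the `Request` class, `decode_step` function, and placeholder tokens are invented for this sketch, not taken from any real inference engine): each call to `decode_step` represents one batched decoding iteration in which every active request gains one output token, and finished requests leave the batch, which is what makes the batching "continuous".

```python
from dataclasses import dataclass, field

@dataclass
class Request:
    """A generation request tracked by the scheduler (hypothetical structure)."""
    rid: str
    prompt: str
    tokens: list = field(default_factory=list)  # output tokens generated so far
    max_tokens: int = 4                         # stop after this many tokens

def decode_step(batch):
    """One batched decoding iteration: every active request produces exactly
    one new output token. A real engine would run a single forward pass over
    the whole batch; here a placeholder token stands in for the model output."""
    for req in batch:
        req.tokens.append(f"tok{len(req.tokens) + 1}")
    # Finished requests leave the batch, freeing slots for newly arriving
    # requests -- the defining behavior of continuous batching.
    return [r for r in batch if len(r.tokens) < r.max_tokens]

# Iterations 2 and 3 from the example: two decode steps for x1 and x2.
batch = [Request("x1", "..."), Request("x2", "...")]
batch = decode_step(batch)  # first decoding step: first output token each
batch = decode_step(batch)  # second decoding step (iteration 3): second token each
print([(r.rid, r.tokens) for r in batch])
```

After the second call, both requests hold exactly two output tokens, matching the state shown in the diagram for iteration 3.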
