Analyzing a System State in Continuous Batching
An LLM inference engine using a continuous batching strategy is processing two separate requests simultaneously. After a recent computational step, the state is as follows:
- Request 1 has generated the sequence: "The cat sat on"
- Request 2 has generated the sequence: "Once upon a"
In the immediately preceding step, the generated sequences were "The cat sat" and "Once upon", respectively. No new requests have arrived, and neither request has finished generating.
Based on this change in state, describe the single computational operation that was just performed and explain why this operation is a key feature of the iterative decoding phase for a batch.
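For reference, a minimal sketch of the operation being asked about, assuming a toy engine where each active request holds its token list. The name `batched_forward` is a hypothetical stand-in for the model's real forward pass, not an actual API:

```python
def decode_step(active_sequences, batched_forward):
    """Run a single iteration of batched decoding.

    One forward pass over the whole batch produces exactly one new
    token per unfinished request, which is appended to that
    request's sequence.
    """
    # Hypothetical batched forward pass: one token per active sequence.
    next_tokens = batched_forward(active_sequences)
    for seq, tok in zip(active_sequences, next_tokens):
        seq.append(tok)
    return active_sequences

# The state change in the question corresponds to one such call:
state = [["The", "cat", "sat"], ["Once", "upon"]]
fake_forward = lambda seqs: ["on", "a"]  # stand-in for the real model
decode_step(state, fake_forward)
# state is now [["The", "cat", "sat", "on"], ["Once", "upon", "a"]]
```

The point the sketch illustrates is that both requests advance by exactly one token in the same computational step, which is what makes the decoding phase iterative across the whole batch.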
Tags
Ch.5 Inference - Foundations of Large Language Models
Analysis in Bloom's Taxonomy
Related
- Example of Concurrent Prefilling and Decoding in Continuous Batching (Iteration 4): An inference engine is processing a batch of two text generation requests, Request A and Request B, using a continuous batching strategy. So far, the engine has generated the first output token for each: 'The' for Request A, and 'Once' for Request B. Neither request is complete, and no new requests have arrived. What is the most likely immediate next action the engine will perform in a single computational step?
- A continuous batching system receives two new text generation requests simultaneously. Arrange the following computational stages in the correct chronological order for processing these two requests, assuming no other requests arrive during this time.