A continuous batching system receives two new text generation requests simultaneously. Arrange the following computational stages in the correct chronological order for processing these two requests, assuming no other requests arrive during this time.
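As a rough illustration of the scenario, the sketch below simulates one plausible ordering under stated assumptions: both prompts are prefilled together in the first iteration (which also yields each request's first output token), every later iteration is a single batched decode step, and a request leaves the batch as soon as it finishes. The `Request` class, its fields, and the `max_new_tokens` stopping rule are hypothetical and are not taken from any particular inference engine.

```python
from dataclasses import dataclass


@dataclass
class Request:
    name: str
    prompt_len: int          # number of prompt tokens (hypothetical, unused by the toy scheduler)
    max_new_tokens: int      # assumed stop condition, standing in for an end-of-sequence check
    generated: int = 0       # output tokens produced so far
    prefilled: bool = False  # True once the prompt has been processed


def run_continuous_batching(requests):
    """Return a log of (iteration, stage, request names) tuples."""
    log = []
    iteration = 0
    active = list(requests)
    while active:
        iteration += 1
        waiting = [r for r in active if not r.prefilled]
        if waiting:
            # Prefill: process the full prompts and emit each request's first token.
            for r in waiting:
                r.prefilled = True
                r.generated = 1
            log.append((iteration, "prefill", [r.name for r in waiting]))
        else:
            # Decode: one new token per running request in a single batched step.
            for r in active:
                r.generated += 1
            log.append((iteration, "decode", [r.name for r in active]))
        # Finished requests leave the batch at the end of the iteration.
        active = [r for r in active if r.generated < r.max_new_tokens]
    return log


log = run_continuous_batching([Request("A", 5, 3), Request("B", 8, 2)])
for entry in log:
    print(entry)
```

In this toy run, Request B finishes one iteration before Request A, so the final decode step contains only A. A real engine would also admit newly arrived requests between iterations, which the question rules out here.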
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Comprehension in Revised Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Example of Concurrent Prefilling and Decoding in Continuous Batching (Iteration 4)
An inference engine is processing a batch of two text generation requests, Request A and Request B, using a continuous batching strategy. So far, the engine has generated the first output token for each: 'The' for Request A, and 'Once' for Request B. Neither request is complete, and no new requests have arrived. What is the most likely immediate next action the engine will perform in a single computational step?
Analyzing a System State in Continuous Batching