Learn Before
Example of Initial Batch Creation in Continuous Batching
In the initial step of continuous batching, the scheduler receives new requests, for example 'x1' and 'x2'. In the first iteration, the scheduler groups these requests into an initial batch and sends it to the inference engine, which begins the prefilling phase for both requests.
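The first iteration described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular inference engine is implemented: the `Request` and `Scheduler` classes, the `max_batch_size` limit, and the `prefill` stand-in are all hypothetical names introduced here, and the token IDs are made up. A real engine would typically bound the batch by memory or token budget rather than a simple request count.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    """A pending user request (hypothetical structure for illustration)."""
    request_id: str
    prompt_tokens: list


class Scheduler:
    """Toy scheduler: queues requests and forms the first batch."""

    def __init__(self, max_batch_size):
        # Assumption: capacity is modeled as a plain cap on request count.
        self.max_batch_size = max_batch_size
        self.waiting = deque()

    def add_request(self, request):
        self.waiting.append(request)

    def form_initial_batch(self):
        # Take waiting requests in arrival order until the cap is reached.
        batch = []
        while self.waiting and len(batch) < self.max_batch_size:
            batch.append(self.waiting.popleft())
        return batch


def prefill(batch):
    """Stand-in for the inference engine's prefilling phase.

    A real engine would run the model over each prompt and build its
    KV cache; here we only record how many prompt tokens were processed.
    """
    return {req.request_id: len(req.prompt_tokens) for req in batch}


# Iteration 1: the scheduler receives x1 and x2 and batches them together.
scheduler = Scheduler(max_batch_size=4)
scheduler.add_request(Request("x1", [101, 7, 42]))
scheduler.add_request(Request("x2", [101, 9]))

initial_batch = scheduler.form_initial_batch()
prefilled = prefill(initial_batch)
```

With capacity for both requests, `initial_batch` holds 'x1' and 'x2' in arrival order and the waiting queue is empty. If `max_batch_size` were smaller than the number of pending requests, the remainder would stay queued for a later iteration.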
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Example of Initial Batch Creation in Continuous Batching
Batching Sequences of Varying Lengths
Assembling an Initial Processing Batch
An inference engine employing a continuous batching strategy is initialized and presented with a queue of 10 pending user requests. In forming the very first batch to begin processing, which of the following is the most critical constraint determining how many of these requests can be grouped together?
When an inference engine using continuous batching forms its initial batch, it is required to include all user requests that are currently pending in the queue, regardless of system limitations.
Learn After
Example of the First Decoding Step in Continuous Batching (Iteration 2)
An inference server's scheduler receives two new, independent user requests at the same time. Assuming the system has the capacity to handle both, what is the most accurate description of the scheduler's immediate action and the primary goal of this initial processing step?
Initial Batch Formation
An inference scheduler receives two new, independent requests. Arrange the following events to accurately describe the initial processing step for these requests.