Learn Before
An LLM inference server uses an iteration-level scheduler to process two requests concurrently. Request A requires an initial computation (prefill) that is broken into two chunks. Request B is in the process of generating its first two tokens (decoding). To ensure both requests make progress without one blocking the other, the scheduler interleaves these tasks. Arrange the four computational tasks (Request A's two prefill chunks and Request B's two decode steps) into the most logical and efficient sequence of operations over four iterations.
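A minimal sketch of how an iteration-level scheduler could interleave the four tasks, one per iteration, so neither request blocks the other. The `Request` class and `round_robin_schedule` function are illustrative names invented for this sketch, not part of any real inference server.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Request:
    name: str
    tasks: deque  # remaining units of work for this request

def round_robin_schedule(requests):
    """Run one task per iteration, alternating between active requests
    so that neither request blocks the other."""
    queue = deque(r for r in requests if r.tasks)
    schedule = []
    while queue:
        req = queue.popleft()
        schedule.append((req.name, req.tasks.popleft()))
        if req.tasks:              # request still has work: requeue it
            queue.append(req)
    return schedule

# Request A: prefill split into two chunks; Request B: two decode steps.
req_a = Request("A", deque(["prefill chunk 1", "prefill chunk 2"]))
req_b = Request("B", deque(["decode token 1", "decode token 2"]))

for i, (name, task) in enumerate(round_robin_schedule([req_a, req_b]), start=1):
    print(f"iteration {i}: request {name} -> {task}")
# iteration 1: request A -> prefill chunk 1
# iteration 2: request B -> decode token 1
# iteration 3: request A -> prefill chunk 2
# iteration 4: request B -> decode token 2
```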
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Example of Chunked Prefilling in Iteration-Level Scheduling
An inference server for a large language model is handling two user requests at the same time. Request A requires a long, multi-step initial processing phase before it can generate its first word. Request B is already in its generation phase, producing one word at a time. The server employs a scheduling system that, in each computational cycle, assigns exactly one unit of work—either a single step of initial processing or the generation of a single word—to each active request. What is the most significant outcome of using this scheduling approach in this scenario?
Evaluating an LLM Inference Scheduling Strategy
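The related questions above both hinge on the same outcome: with iteration-level scheduling and chunked prefill, Request B keeps generating while Request A's long prefill is still in progress. The rough simulation below is a hypothetical illustration (the step counts and the one-unit-of-work-per-request-per-cycle cost model are assumptions, not taken from the cards); it contrasts that interleaved policy with a request-level policy that runs A's entire prefill before resuming B.

```python
A_PREFILL_STEPS = 4   # Request A: long, multi-step initial processing
B_WORDS_TO_EMIT = 4   # Request B: already generating, one word per unit of work

def simulate(interleaved: bool):
    """Return the cycle at which Request B emits each word."""
    emitted_at = []
    a_remaining, b_remaining, cycle = A_PREFILL_STEPS, B_WORDS_TO_EMIT, 0
    while a_remaining or b_remaining:
        cycle += 1
        if interleaved:
            # Iteration-level scheduling: each active request gets one
            # unit of work per cycle, so B is never starved by A's prefill.
            if a_remaining:
                a_remaining -= 1
            if b_remaining:
                b_remaining -= 1
                emitted_at.append(cycle)
        else:
            # Request-level scheduling: A's entire prefill runs first,
            # and B only resumes generating once A is done.
            if a_remaining:
                a_remaining -= 1
            elif b_remaining:
                b_remaining -= 1
                emitted_at.append(cycle)
    return emitted_at

print("interleaved:", simulate(True))    # [1, 2, 3, 4]  B keeps making progress
print("sequential: ", simulate(False))   # [5, 6, 7, 8]  B stalls behind A's prefill
```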