Sequence Ordering

An LLM inference server uses an iteration-level scheduler to process two requests concurrently. Request A requires an initial computation (prefill) that is split into two chunks. Request B is decoding, generating its first two tokens. To ensure both requests make progress without one blocking the other, the scheduler interleaves these tasks. Arrange the four computational tasks below into the most logical and efficient sequence of operations over four iterations (a minimal scheduling sketch follows the list):

- Prefill Request A, chunk 1
- Prefill Request A, chunk 2
- Decode Request B, token 1
- Decode Request B, token 2
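
To make the interleaving concrete, here is a minimal Python sketch of iteration-level scheduling under two assumptions that are not stated in the question itself: the scheduler runs exactly one task per iteration, and it round-robins between the two requests so neither starves. The task labels and the `iteration_level_schedule` function are illustrative, not from any real serving framework.

```python
from collections import deque

# Toy task queues, one per request. Labels are illustrative only.
queues = {
    "A": deque(["prefill A, chunk 1", "prefill A, chunk 2"]),
    "B": deque(["decode B, token 1", "decode B, token 2"]),
}

def iteration_level_schedule(queues):
    """Pick one task per iteration, round-robining across requests
    so neither request is blocked behind the other."""
    order = deque(queues)          # request ids in arrival order: A, B
    schedule = []
    while any(queues.values()):    # some request still has work left
        rid = order[0]
        order.rotate(-1)           # next iteration favors the other request
        if queues[rid]:
            schedule.append(queues[rid].popleft())
    return schedule

for i, task in enumerate(iteration_level_schedule(queues), start=1):
    print(f"Iteration {i}: {task}")
```

Under these assumptions the printed order alternates prefill and decode: A's first prefill chunk, B's first decode token, A's second prefill chunk, B's second decode token. Request B is never blocked behind Request A's full prefill, which is the point of chunking the prefill in the first place.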
