Learn Before
Multiple Choice

An inference server for a large language model is handling two user requests at the same time. Request A requires a long, multi-step initial processing phase before it can generate its first word. Request B is already in its generation phase, producing one word at a time. The server employs a scheduling system that, in each computational cycle, assigns exactly one unit of work—either a single step of initial processing or the generation of a single word—to each active request. What is the most significant outcome of using this scheduling approach in this scenario?
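The scheduling policy the question describes can be sketched as a small simulation. This is a hypothetical illustration, not server code: request A needs several prefill steps before its first word, request B is mid-generation, and each cycle hands exactly one unit of work to every active request. The step counts (5 and 3) are made-up values for the example.

```python
def simulate(prefill_steps_a, decode_steps_b):
    """Each cycle assigns one unit of work to every active request.

    Returns the cycle at which A finishes its initial processing and
    the list of cycles at which B generates a word.
    """
    a_prefill_done_cycle = None
    b_word_cycles = []
    a_remaining = prefill_steps_a
    b_remaining = decode_steps_b
    cycle = 0
    while a_remaining > 0 or b_remaining > 0:
        cycle += 1
        if a_remaining > 0:          # one step of A's initial processing
            a_remaining -= 1
            if a_remaining == 0:
                a_prefill_done_cycle = cycle
        if b_remaining > 0:          # one word generated for B
            b_remaining -= 1
            b_word_cycles.append(cycle)
    return a_prefill_done_cycle, b_word_cycles

done_a, words_b = simulate(prefill_steps_a=5, decode_steps_b=3)
# B produces a word every cycle (cycles 1, 2, 3) instead of stalling
# until A's 5-step initial phase completes.
```

Under this interleaving, request B's word-by-word output is never blocked behind request A's long initial phase, which is the behavior the question asks you to evaluate.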


Updated 2025-09-28

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Evaluation in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science