Learn Before
Multiple Choice

An inference server for a large language model is handling two user requests at the same time. Request A requires a long, multi-step initial processing phase before it can generate its first word. Request B is already in its generation phase, producing one word at a time. The server employs a scheduling system that, in each computational cycle, assigns exactly one unit of work—either a single step of initial processing or the generation of a single word—to each active request. What is the most significant outcome of using this scheduling approach in this scenario?
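The scheduling policy the question describes can be sketched as a small simulation. This is a hypothetical illustration, not server code: request A needs several prefill steps before its first word, request B is mid-generation, and each cycle hands exactly one unit of work to every active request. The step counts (5 and 3) are made-up values for the example.

```python
def simulate(prefill_steps_a, decode_steps_b):
    """Each cycle assigns one unit of work to every active request.

    Returns the cycle at which A finishes its initial processing and
    the list of cycles at which B generates a word.
    """
    a_prefill_done_cycle = None
    b_word_cycles = []
    a_remaining = prefill_steps_a
    b_remaining = decode_steps_b
    cycle = 0
    while a_remaining > 0 or b_remaining > 0:
        cycle += 1
        if a_remaining > 0:          # one step of A's initial processing
            a_remaining -= 1
            if a_remaining == 0:
                a_prefill_done_cycle = cycle
        if b_remaining > 0:          # one word generated for B
            b_remaining -= 1
            b_word_cycles.append(cycle)
    return a_prefill_done_cycle, b_word_cycles

done_a, words_b = simulate(prefill_steps_a=5, decode_steps_b=3)
# B produces a word every cycle (cycles 1, 2, 3) instead of stalling
# until A's 5-step initial phase completes.
```

Under this interleaving, request B's word-by-word output is never blocked behind request A's long initial phase, which is the behavior the question asks you to evaluate.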


Updated 2025-09-28

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Evaluation in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science