Learn Before
An operations team monitors an LLM inference system and notices that the hardware responsible for model execution is consistently underutilized, even when there is a continuous stream of user requests waiting to be processed. This leads to lower-than-expected overall system throughput. In a standard workflow where requests are grouped into batches by a scheduler before being processed, what is the most probable explanation for this specific performance issue?
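A minimal Python sketch of the scenario described above, assuming a static-batching policy in which the scheduler dispatches only full batches. The batch size, arrival rate, and per-step compute time below are illustrative assumptions, not measurements from a real system. While the scheduler blocks to fill a batch, the accelerator sits idle even though requests keep queuing, which reproduces the underutilization the question describes.

```python
import queue
import threading
import time

# Assumed, illustrative parameters -- not taken from any real deployment.
BATCH_SIZE = 8           # scheduler dispatches only full batches (assumed policy)
ARRIVAL_INTERVAL = 0.05  # seconds between request arrivals
STEP_TIME = 0.02         # accelerator time to execute one batch

request_queue: queue.Queue[int] = queue.Queue()
busy_time = 0.0  # total time the "accelerator" spends computing


def producer(n_requests: int) -> None:
    """Simulate a continuous stream of incoming user requests."""
    for i in range(n_requests):
        request_queue.put(i)
        time.sleep(ARRIVAL_INTERVAL)


def static_batching_scheduler(n_requests: int) -> None:
    """Dispatch to the accelerator only once BATCH_SIZE requests have accumulated."""
    global busy_time
    served = 0
    while served < n_requests:
        # Blocks here until a full batch is available -- the accelerator idles.
        batch = [request_queue.get() for _ in range(BATCH_SIZE)]
        start = time.perf_counter()
        time.sleep(STEP_TIME)  # stand-in for one model execution step
        busy_time += time.perf_counter() - start
        served += len(batch)


if __name__ == "__main__":
    N = 64
    t0 = time.perf_counter()
    threading.Thread(target=producer, args=(N,), daemon=True).start()
    static_batching_scheduler(N)
    wall = time.perf_counter() - t0
    # Utilization stays low because most wall-clock time is spent
    # waiting for batches to fill, not computing.
    print(f"accelerator utilization: {busy_time / wall:.0%}")
```

Running this sketch typically reports single-digit utilization despite a steady request stream; reducing the batch-fill wait (for example, via a dispatch timeout or a continuous-batching scheduler) would raise it.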
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Arrange the following stages of a typical request processing workflow in a Large Language Model (LLM) inference system into the correct chronological order, from the initial arrival of a request to the final output.
Diagram of the LLM Inference Workflow
LLM Inference Scheduling Strategy