Multiple Choice

An operations team monitoring an LLM inference system notices that the accelerators responsible for model execution are consistently underutilized, even though a continuous stream of user requests is waiting to be processed, and overall system throughput is lower than expected as a result. Given a standard workflow in which a scheduler groups requests into batches before execution, what is the most probable explanation for this performance issue?
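One way to reason about the scenario: if the scheduler uses static (request-level) batching, the batch occupies the hardware until its longest request finishes, so slots whose requests complete early sit idle and queued requests cannot join until the next batch forms. The minimal Python sketch below simulates that effect; the batch size, output lengths, and batch count are all assumed, illustrative values, not measurements from any real system.

```python
# Minimal sketch of static (request-level) batching, with hypothetical numbers:
# the whole batch holds the hardware until its LONGEST request finishes, so
# slots that finish early do no useful work while requests wait in the queue.
import random

random.seed(0)

BATCH_SIZE = 8     # requests processed together (assumed value)
NUM_BATCHES = 100  # batches to simulate (assumed value)

busy_steps = 0     # slot-steps that actually produced a token
total_steps = 0    # slot-steps the hardware was occupied by the batch

for _ in range(NUM_BATCHES):
    # Each request needs a different number of decode steps (output lengths vary).
    lengths = [random.randint(10, 200) for _ in range(BATCH_SIZE)]
    batch_duration = max(lengths)               # batch runs until the longest request ends
    busy_steps += sum(lengths)                  # useful work actually performed
    total_steps += BATCH_SIZE * batch_duration  # capacity reserved for the batch

print(f"Effective utilization: {busy_steps / total_steps:.1%}")
# Typically prints well under 100% even though requests were always waiting:
# early-finishing slots idle until the longest request in the batch completes.
```

Continuous (in-flight) batching mitigates this by admitting queued requests into slots as soon as earlier requests finish, rather than waiting for the whole batch to drain.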


Updated 2025-09-26


Tags

Ch.5 Inference - Foundations of Large Language Models

Analysis in Bloom's Taxonomy