Multiple Choice

An LLM serving system is processing numerous concurrent requests of varying lengths. As requests are completed, their associated memory is freed. After running for some time, the system's overall throughput decreases, and it frequently fails to start processing new, long sequences, even though monitoring tools show that a significant percentage of total memory is free. Based on this scenario, what is the most accurate evaluation of the underlying problem?
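The scenario can be reproduced with a toy model. The sketch below assumes that each request's memory must occupy one contiguous range of slots and that the allocator uses first-fit placement. Both are assumptions made for illustration, as are the slot count, the request lengths, and the helper names `allocate`, `release`, and `free_runs`. It compares the total free memory that a monitoring tool would report with the largest contiguous free run, which is what a new long sequence needs.

```python
# Toy model (hypothetical): memory is a list of slots, None means free.
# Assumption: each request needs ONE contiguous range of slots.

def allocate(slots, request_id, length):
    """First-fit: claim the first free run of at least `length` slots."""
    run_start, run_len = 0, 0
    for i, owner in enumerate(slots):
        if owner is None:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len == length:
                slots[run_start:run_start + length] = [request_id] * length
                return True
        else:
            run_len = 0
    return False  # no contiguous run is long enough


def release(slots, request_id):
    """Free every slot owned by a completed request."""
    for i, owner in enumerate(slots):
        if owner == request_id:
            slots[i] = None


def free_runs(slots):
    """Lengths of each maximal run of consecutive free slots."""
    runs, current = [], 0
    for owner in slots:
        if owner is None:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


slots = [None] * 32

# Fill memory with requests of varying lengths (5, 3, 5, 3, ...).
for request_id in range(8):
    allocate(slots, request_id, 5 if request_id % 2 == 0 else 3)

# The length-5 requests complete; the length-3 requests keep running.
for request_id in (0, 2, 4, 6):
    release(slots, request_id)

total_free = slots.count(None)          # what a monitoring tool reports
largest_run = max(free_runs(slots))     # what a new sequence can actually use
started = allocate(slots, "long", 8)    # new long sequence of 8 slots

print(f"free: {total_free}/{len(slots)} slots, largest run: {largest_run}")
print(f"long request started: {started}")
```

In this toy run, 20 of 32 slots are free, yet no free run is longer than 5 slots, so the 8-slot request cannot start. Real serving systems differ in how they lay out memory, so treat this as an illustration of the symptom rather than a model of any particular system.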

Updated 2025-09-28

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Computing Sciences

Foundations of Large Language Models Course

Evaluation in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science