Multiple Choice

An LLM inference server is processing a batch of three requests (A, B, C) and has just completed their initial, compute-intensive processing stage (the prefill phase). At this moment, a new request (D) arrives. To maximize hardware utilization and overall system throughput, what is the most efficient action for the server to take in the very next iteration?
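For context, here is a minimal Python sketch of the iteration-level (continuous) batching idea this question probes: rather than waiting for the current batch to drain, the scheduler admits a newly arrived request into the very next model iteration, so its prefill runs alongside the decode steps of the in-flight requests. All names here (`Request`, `ContinuousBatchScheduler`, `step`) are hypothetical illustrations, not the API of any particular serving framework.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical per-request state; not from any real framework."""
    rid: str
    max_new_tokens: int     # decode budget
    generated: int = 0      # decode tokens produced so far
    prefilled: bool = False # has the compute-bound prefill run yet?


class ContinuousBatchScheduler:
    """Admits newly arrived requests at every iteration instead of
    waiting for the current batch to finish (static batching)."""

    def __init__(self) -> None:
        self.running: list[Request] = []
        self.waiting: deque[Request] = deque()

    def submit(self, req: Request) -> None:
        self.waiting.append(req)

    def step(self) -> None:
        # 1. Admit waiting requests into the running batch immediately,
        #    so a new arrival (D) shares the very next iteration with
        #    the decode steps of A, B, C.
        while self.waiting:
            self.running.append(self.waiting.popleft())

        # 2. One model iteration, simulated with counters: prefill for
        #    newly admitted requests, one decode token for the rest.
        for req in self.running:
            if not req.prefilled:
                req.prefilled = True   # compute-bound prefill
            else:
                req.generated += 1     # memory-bound decode

        # 3. Retire finished requests, freeing batch slots.
        self.running = [r for r in self.running
                        if r.generated < r.max_new_tokens]


sched = ContinuousBatchScheduler()
for rid in ("A", "B", "C"):
    sched.submit(Request(rid, max_new_tokens=4))
sched.step()                           # iteration 1: prefill A, B, C
sched.submit(Request("D", max_new_tokens=4))
sched.step()                           # iteration 2: D prefills while A-C decode
```

The second `step()` call mirrors the question's scenario: because decode is memory-bound, folding D's prefill into the same iteration keeps the hardware busy instead of leaving D queued until A, B, and C finish.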


Tags: Ch.5 Inference - Foundations of Large Language Models
