An LLM inference server is processing a batch of three requests (A, B, C) and has just completed their initial, compute-intensive processing stage. At this moment, a new request (D) arrives. To maximize hardware utilization and overall system throughput, what is the most efficient action for the server to take in the very next iteration?
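The most efficient action here is generally continuous batching: rather than waiting for A, B, and C to finish generating, the server admits D immediately so that D's compute-heavy prefill runs in the same iteration as the others' lightweight decode steps. The sketch below illustrates this scheduling decision under simplified assumptions; the `Request` class and `schedule_next_iteration` function are hypothetical and not taken from any particular serving framework.

```python
# Minimal sketch of one continuous-batching scheduling step.
# All names here are illustrative, not from a real serving framework.

from dataclasses import dataclass, field


@dataclass
class Request:
    name: str
    prompt_tokens: list[int]
    generated_tokens: list[int] = field(default_factory=list)
    prefill_done: bool = False
    finished: bool = False


def schedule_next_iteration(running: list[Request], waiting: list[Request]) -> list[Request]:
    """Build the batch for the very next model iteration.

    Instead of waiting for the running requests to finish generating,
    the scheduler admits newly arrived requests right away, so their
    compute-heavy prefill runs alongside the others' lightweight
    decode steps and the accelerator stays busy.
    """
    batch = [r for r in running if not r.finished]  # decode work for A, B, C
    while waiting:                                  # admit new arrivals immediately
        batch.append(waiting.pop(0))                # D's prefill joins the same batch
    return batch


# Example: A, B, C have completed prefill; D has just arrived.
running = [Request(n, prompt_tokens=[1, 2, 3], prefill_done=True) for n in "ABC"]
waiting = [Request("D", prompt_tokens=[4, 5, 6])]

batch = schedule_next_iteration(running, waiting)
print([r.name for r in batch])  # ['A', 'B', 'C', 'D']
```

In this example, the next iteration's batch contains A, B, and C (each producing one decode token) plus D (running its prefill), which keeps the hardware saturated instead of leaving it underutilized during a memory-bound decode-only iteration.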
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
An LLM inference server that dynamically manages its workload is processing several requests. The following list describes the key events in this process. Arrange these events in the correct chronological order to reflect the most efficient operational flow.
Diagnosing LLM Inference Server Inefficiency