An LLM inference system is actively generating tokens for two separate user requests that are already in progress. A third user submits a new request to the system. To maximize overall throughput by overlapping different types of computation, what actions will the system perform in the next single computational step?
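The scenario describes continuous batching: in one step the scheduler can run the new request's prefill alongside the decode steps of the two in-flight requests, all in a single batch. A minimal sketch of that scheduling decision, using hypothetical names (`Request`, `step`) that are not from any real inference library:

```python
from dataclasses import dataclass, field

@dataclass
class Request:
    rid: str
    prompt_len: int
    generated: list = field(default_factory=list)
    prefilled: bool = False  # True once the prompt's KV cache has been built

def step(running, waiting):
    """One scheduler iteration: admit waiting requests, then run a mixed batch."""
    batch = []
    # Admit new arrivals: their first pass is a prefill over the whole prompt.
    while waiting:
        req = waiting.pop(0)
        running.append(req)
        batch.append((req, "prefill"))
    # Requests already in progress each decode one more token.
    for req in running:
        if req.prefilled:
            batch.append((req, "decode"))
    # Stand-in for a single fused forward pass over the whole batch.
    for req, phase in batch:
        if phase == "prefill":
            req.prefilled = True
        req.generated.append(f"tok{len(req.generated)}")
    return [(r.rid, phase) for r, phase in batch]

# Two in-flight requests (A, B) plus one newly submitted request (C).
running = [Request("A", 10, generated=["tok0"], prefilled=True),
           Request("B", 7, generated=["tok0"], prefilled=True)]
waiting = [Request("C", 12)]
print(step(running, waiting))
# → [('C', 'prefill'), ('A', 'decode'), ('B', 'decode')]
```

The single step thus overlaps the compute-bound prefill of request C with the memory-bound decode of requests A and B, which is exactly how continuous batching raises overall throughput.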
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Example of a Request Completing in Continuous Batching (Iteration 5)
LLM Inference Scheduling Decision
Efficiency of Concurrent LLM Operations