Learn Before
An engineering team is optimizing a system that serves a large language model to multiple users. To maximize the number of requests processed per hour, they decide to group incoming requests into large batches before sending them to the hardware for processing. This approach significantly increases the system's overall processing capacity. For which of the following applications would this optimization strategy be most detrimental to the user experience?
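The throughput-latency trade-off behind this question can be sketched with a toy model. The numbers and cost formula below are illustrative assumptions, not measurements from a real serving system: a batch of `b` requests is assumed to take `overhead + per_req * b` seconds on the accelerator, and with requests arriving at `rate` per second, a request waits on average `(b - 1) / (2 * rate)` seconds for its batch to fill.

```python
def throughput(b, overhead=0.5, per_req=0.05):
    """Requests completed per second at batch size b (toy cost model)."""
    return b / (overhead + per_req * b)

def avg_latency(b, rate=10.0, overhead=0.5, per_req=0.05):
    """Mean end-to-end latency: batch-fill wait plus batch execution time."""
    return (b - 1) / (2 * rate) + (overhead + per_req * b)

for b in (1, 8, 64):
    print(f"batch={b:3d}  throughput={throughput(b):6.1f} req/s  "
          f"latency={avg_latency(b):5.2f} s")
```

Under these assumptions, larger batches raise throughput but also raise per-request latency, which is why large-batch serving hurts latency-sensitive applications most.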
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Impact of Batch Size on the Throughput-Latency Trade-off
Optimizing LLM Serving for Different Applications
The Core Trade-off in LLM Serving