Latency in Batched vs. Single Sequence Processing
Imagine two separate requests are sent to a large language model. Request A contains only a single, short sentence to be completed. Request B is a batch containing two items: the same short sentence from Request A, and a much longer paragraph that also needs to be completed. Explain why the user who sent Request A will receive their completed sentence back faster than the user who sent Request B, even though the same short sentence was processed in both cases.
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Computing Sciences
Foundations of Large Language Models Course
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A development team is using a large language model for two different tasks. Task A requires generating a response to a user's query as quickly as possible to maintain a conversational flow. Task B involves processing a large collection of documents where the total time to complete all documents is the main concern, but the time for any single document is less critical. To achieve the fastest possible response time for an individual query in Task A, which processing approach should be used and why?
Latency in Batched vs. Single Sequence Processing
When a system processes a single input sequence at a time, the latency for that request is minimized: no decoding steps are shared with other work, and the result is returned the moment the sequence finishes. In a static batch, by contrast, all sequences decode in lockstep, one token per step, and the batch's results are typically returned only once the longest sequence has finished generating. The short sentence in Request B therefore completes its tokens early but sits waiting for the long paragraph, which is why the user who sent Request A receives the same completion back sooner.
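A minimal sketch of this effect, assuming a constant per-token decoding step time and simple static batching (where the batch returns only when its longest sequence finishes); the step time and token counts below are hypothetical:

```python
# Toy latency model for static batching: every sequence in a batch decodes
# in lockstep, one token per step, and results are returned only when the
# longest sequence in the batch has finished.

STEP_MS = 20  # assumed (hypothetical) time per decoding step, in ms


def single_request_latency(tokens_to_generate: int) -> int:
    """Latency when a sequence is decoded on its own."""
    return tokens_to_generate * STEP_MS


def static_batch_latency(batch_token_counts: list[int]) -> int:
    """Latency for every request in a static batch: gated by the longest."""
    return max(batch_token_counts) * STEP_MS


short, long_para = 10, 200  # tokens for the short sentence vs. the paragraph

# Request A: the short sentence decoded alone.
latency_a = single_request_latency(short)               # 10 * 20 = 200 ms

# Request B: the same sentence batched with the long paragraph.
latency_b = static_batch_latency([short, long_para])    # 200 * 20 = 4000 ms

print(latency_a, latency_b)
```

Even though the sentence itself needs the same 10 decoding steps in both cases, the batched copy inherits the paragraph's 200-step completion time.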