Short Answer

Latency in Batched vs. Single Sequence Processing

Imagine two separate requests are sent to a large language model. Request A contains only a single, short sentence to be completed. Request B is a batch containing two items: the same short sentence from Request A, and a much longer paragraph that also needs to be completed. Explain why the user who sent Request A will receive their completed sentence back faster than the user who sent Request B, even though the same short sentence was processed in both cases.
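A toy latency model can make the scenario concrete. The sketch below rests on several assumptions that are not part of the question. It assumes static batching, where a batched request is returned only once every item in it has finished. It assumes that all prompts are padded to the longest one. It uses a made-up cost model: a fixed overhead per forward pass plus one unit per token processed. The function `static_batch_latency` and its cost parameters are hypothetical, and the model ignores the way attention cost grows with context length.

```python
def static_batch_latency(prompt_lens, output_lens, step_overhead=10, per_token=1):
    """Toy latency (abstract cost units) for one statically batched request.

    Hypothetical cost model: every forward pass costs a fixed overhead plus
    `per_token` for each token position processed across the batch.
    """
    batch_size = len(prompt_lens)
    # Prefill: all prompts are padded to the longest one and processed together.
    padded_prompt = max(prompt_lens)
    prefill = step_overhead + per_token * batch_size * padded_prompt
    # Decode: the batch keeps stepping until the longest output is complete,
    # and each step processes one token for every sequence in the batch.
    decode_steps = max(output_lens)
    decode = decode_steps * (step_overhead + per_token * batch_size)
    return prefill + decode


# Request A: only the short sentence.
latency_a = static_batch_latency(prompt_lens=[8], output_lens=[4])
# Request B: the same short sentence batched with a long paragraph.
latency_b = static_batch_latency(prompt_lens=[8, 100], output_lens=[4, 50])
print(latency_a, latency_b)  # prints: 62 810
```

In Request B, the short sentence's own tokens are done after 4 decode steps. Under static batching, though, the request is returned only after all 50 steps needed by the long paragraph, and every step pays for the padded, larger batch. Serving systems that use continuous batching can return finished sequences early, which this sketch does not model.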


Updated 2025-10-07

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Computing Sciences

Foundations of Large Language Models Course

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science