Example

Example of Minimal Latency with a Single Sequence

The simplest case for understanding latency is processing a single input sequence. With a batch size of one, the result is available as soon as generation finishes. The request incurs no additional waiting time or computational overhead from other sequences, so this is the lowest latency a request can achieve.
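The point can be made concrete with a toy latency model. This is a minimal sketch under stated assumptions, not a description of any real inference engine: the function `request_latencies` and its parameters `time_per_step` and `overhead_per_extra_seq` are hypothetical names introduced here, and the model assumes static batching, where results are returned only once every sequence in the batch has finished.

```python
def request_latencies(output_lengths, time_per_step=1.0, overhead_per_extra_seq=0.0):
    """Return the latency seen by each sequence in one statically batched run.

    Illustrative assumptions (not taken from a real system):
    - every sequence in the batch is returned when the longest one finishes;
    - each additional sequence slows every decoding step by a fixed
      fraction `overhead_per_extra_seq`.
    """
    if not output_lengths:
        return []
    batch_size = len(output_lengths)
    # Cost of one decoding step for the whole batch.
    step_time = time_per_step * (1.0 + overhead_per_extra_seq * (batch_size - 1))
    # The batch runs until its longest sequence is complete.
    batch_steps = max(output_lengths)
    return [batch_steps * step_time for _ in output_lengths]


# Batch size of one: latency equals the sequence's own generation time.
single = request_latencies([20])

# Same 20-token request batched with longer sequences: it now waits for them.
batched = request_latencies([20, 50, 35])

print(single)
print(batched)
```

Under these assumptions, the 20-token request finishes after 20 time units when run alone, but after 50 when batched with a 50-token sequence, since it must wait for the longest sequence in its batch.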

Updated 2025-10-09
