Concept

Latency Variability as a Drawback of Continuous Batching

Prioritizing prefilling in continuous batching keeps hardware utilization high, but it comes with a critical trade-off: token generation latency becomes highly variable. While a newly arrived request is being prefilled, decoding for the sequences already in the batch is paused, and that pause grows with the length of the incoming prompt. The inconsistency is most pronounced in systems that serve a mixed workload of long and short input sequences, where a short request can be stalled behind the prefill of a long one.
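The effect can be illustrated with a toy simulation. This is a minimal sketch rather than a model of any real serving system: the cost constants, the `Request` class, and the `simulate` and `inter_token_gaps` helpers are all hypothetical, and it assumes that prefill cost scales linearly with prompt length while one decode iteration has a fixed cost for the whole batch.

```python
from dataclasses import dataclass, field

# Assumed, illustrative costs (not measurements from any real system).
PREFILL_MS_PER_TOKEN = 0.5   # prefill time per prompt token
DECODE_MS_PER_STEP = 20.0    # time for one decode iteration over the whole batch


@dataclass
class Request:
    name: str
    arrival_ms: float
    prompt_len: int
    max_new_tokens: int
    token_times: list = field(default_factory=list)  # time each output token is emitted


def simulate(requests):
    """Run a prefill-prioritized scheduler on a simulated clock."""
    waiting = sorted(requests, key=lambda r: r.arrival_ms)
    running = []
    now = 0.0
    while waiting or running:
        if waiting and waiting[0].arrival_ms <= now:
            # Prefill takes priority: every running sequence stalls meanwhile.
            req = waiting.pop(0)
            now += PREFILL_MS_PER_TOKEN * req.prompt_len
            req.token_times.append(now)  # prefill produces the first token
            if len(req.token_times) < req.max_new_tokens:
                running.append(req)
        elif running:
            # One decode iteration emits one token per running sequence.
            now += DECODE_MS_PER_STEP
            for req in running:
                req.token_times.append(now)
            running = [r for r in running if len(r.token_times) < r.max_new_tokens]
        else:
            # Nothing to do yet: jump ahead to the next arrival.
            now = waiting[0].arrival_ms
    return requests


def inter_token_gaps(req):
    """Time between consecutive output tokens of one request."""
    t = req.token_times
    return [later - earlier for earlier, later in zip(t, t[1:])]


short = Request("short", arrival_ms=0.0, prompt_len=20, max_new_tokens=6)
long_ = Request("long", arrival_ms=50.0, prompt_len=4000, max_new_tokens=2)
simulate([short, long_])
print(inter_token_gaps(short))  # [20.0, 20.0, 2020.0, 20.0, 20.0]
```

Under these assumed costs, the short request emits tokens every 20 ms until the long prompt arrives. Its next token then waits for the full 2000 ms prefill plus one decode step, which is the kind of latency spike described above.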


Updated 2025-10-06


Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences