Example of Decoder Idle Time in Standard Prefilling
This diagram illustrates a key inefficiency of standard 'prefill in one go' batching. Sequence 2, with a short prompt, completes its prefill (P₂₁) and first decoding step (D₂₁) in Iteration 1. It then sits idle throughout Iteration 2, because it must wait for the much longer prefill of Sequence 1 (P₁₁) to finish. Only once that prefill completes can both sequences decode in parallel, from Iteration 3 onwards. The idle period shows how a single long prefill task can block decoding for other sequences in the batch, leaving hardware underutilized and inflating latency for the sequences that are already ready to decode.
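The minimal Python sketch below (not from the course material) models the timeline described above. Every number and name in it is an illustrative assumption: prefill cost is taken as proportional to prompt length, each decode step costs one time unit, and the iteration order follows the diagram (the short sequence prefills and decodes first, then the long prefill runs while the short sequence waits).

```python
# Minimal sketch of the 'prefill in one go' schedule from the diagram.
# Prompt lengths, rates, and step counts below are illustrative assumptions.

def build_schedule(long_prompt: int, short_prompt: int,
                   decode_steps: int, prefill_rate: float = 1.0):
    """Return (timeline, idle_time) for the two-sequence scenario in the diagram.

    Assumed model: Iteration 1 prefills the short prompt and runs its first
    decode step; Iteration 2 runs the long prompt's prefill, during which the
    short sequence is idle; from Iteration 3 both sequences decode in parallel.
    """
    timeline = []
    # Iteration 1: the short sequence prefills (P21) and decodes once (D21).
    it1 = short_prompt / prefill_rate + 1
    timeline.append(("iter 1", {"seq1": "waiting", "seq2": "prefill + decode"}, it1))
    # Iteration 2: the long sequence prefills (P11); the short sequence sits idle.
    it2 = long_prompt / prefill_rate
    timeline.append(("iter 2", {"seq1": "prefill", "seq2": "IDLE"}, it2))
    # Iterations 3+: both sequences decode in parallel, one token per iteration.
    timeline.append(("iter 3+", {"seq1": "decode", "seq2": "decode"}, decode_steps))
    idle_seq2 = it2  # Sequence 2 is blocked for the entire long prefill.
    return timeline, idle_seq2


# Hypothetical sizes: a 4096-token document prompt vs. a 64-token chat prompt.
schedule, idle = build_schedule(long_prompt=4096, short_prompt=64, decode_steps=128)
for name, state, duration in schedule:
    print(f"{name:7s} {state} ({duration:.0f} time units)")
print(f"Sequence 2 idle time: {idle:.0f} time units")
```

In this simplified model, the short sequence's idle time equals the full duration of the long prefill; techniques such as chunked prefilling (listed under Related) shrink that gap by interleaving decode steps between prefill chunks.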
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Latency Variability as a Drawback of Continuous Batching
Chunked Prefilling
An inference server for a large language model uses a continuous batching scheduler designed to maximize hardware utilization by immediately adding new requests to the processing queue. System administrators notice that while the overall token generation rate is high, users submitting short, conversational queries experience significant and unpredictable delays. These delays are most pronounced when the server is simultaneously handling requests to summarize long documents. What is the most likely cause of the high latency for the short queries?
LLM Inference Performance Analysis
Analyzing Performance Trade-offs in LLM Serving
Learn After
A language model processes a batch containing two sequences: Sequence A with a long prompt and Sequence B with a short prompt. The system is configured to complete the prompt-processing (prefill) phase for every sequence in the batch before starting the parallel token-generation (decode) phase for any of them. Which statement best analyzes the primary source of computational inefficiency in this scenario?
Analyzing Hardware Utilization in Batched Inference
Explaining Inefficiency in Batched Processing