Example

Example of Decoder Idle Time in Standard Prefilling

This diagram illustrates a key inefficiency in standard 'prefill in one go' batching. Sequence 2, with a short prompt, completes its prefill (P₂₁) and first decoding step (D₂₁) in Iteration 1. It then enters a prolonged 'Idle Time' during Iteration 2, as it must wait for the much longer prefilling of Sequence 1 (P₁₁) to complete. Only after this long prefill finishes can both sequences proceed with decoding in parallel from Iteration 3 onwards. This idle period demonstrates how long prefill tasks can block shorter decoding tasks, leading to underutilization of hardware and increased latency for some sequences in the batch.
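The idle period described above can be made concrete with a small timeline model. This is only a sketch under stated assumptions: the cost model (prefill time proportional to prompt length, a fixed cost per decoding step), the function names `standard_prefill_timeline` and `idle_time`, and the choice to show Sequence 1 as not yet processed in Iteration 1 are all hypothetical and are not taken from the diagram.

```python
from dataclasses import dataclass


@dataclass
class Step:
    iteration: int
    seq1: str        # what Sequence 1 does in this iteration
    seq2: str        # what Sequence 2 does in this iteration
    duration: float  # wall-clock cost of the iteration (arbitrary units)


def standard_prefill_timeline(long_prompt_len, short_prompt_len,
                              decode_steps=2,
                              prefill_cost_per_token=1.0,
                              decode_cost=1.0):
    """Build the iteration timeline for 'prefill in one go' batching.

    Assumption: prefill cost grows linearly with prompt length and each
    decoding step has a fixed cost. Real systems behave differently.
    """
    timeline = []
    # Iteration 1: Sequence 2 runs its short prefill and first decoding step.
    # Assumption: Sequence 1 is not processed yet ("-").
    timeline.append(Step(1, "-", "P21+D21",
                         short_prompt_len * prefill_cost_per_token + decode_cost))
    # Iteration 2: Sequence 1's long prompt is prefilled in one go,
    # so Sequence 2 cannot decode and sits idle for the whole iteration.
    timeline.append(Step(2, "P11", "idle",
                         long_prompt_len * prefill_cost_per_token))
    # Iteration 3 onwards: both sequences decode in parallel.
    for k in range(decode_steps):
        timeline.append(Step(3 + k, "decode", "decode", decode_cost))
    return timeline


def idle_time(timeline):
    """Total time Sequence 2 spends waiting on another sequence's prefill."""
    return sum(step.duration for step in timeline if step.seq2 == "idle")


if __name__ == "__main__":
    steps = standard_prefill_timeline(long_prompt_len=512, short_prompt_len=16)
    for step in steps:
        print(step)
    print("Sequence 2 idle time:", idle_time(steps))
```

Under this toy model, Sequence 2's idle time equals the full prefill cost of Sequence 1, so it grows with the length of the longer prompt.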

Updated 2025-10-10

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences