Example of Chunked Prefilling in Iteration-Level Scheduling
In an iteration-level scheduling system, chunked prefilling can efficiently process a batch containing multiple sequences by overlapping prefilling and decoding steps. For instance, consider a batch with two sequences. Standard scheduling treats the entire prefilling of the first sequence as a single iteration, forcing the second sequence's decoding step (e.g., D₁) to wait until the entire prefill is complete. In contrast, chunked prefilling divides the first sequence's prefilling into smaller steps, such as chunks P₁, P₂, and P₃. Because each chunk corresponds to one iteration, decoding steps for the second sequence can execute concurrently with these prefilling chunks (e.g., D₁ can run in the same iteration as P₁). This significantly reduces decoder idle time and allows output tokens to be generated earlier.
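To make the interleaving concrete, here is a minimal Python sketch of the idea (the Request class, the step labels, and the run_iterations helper are illustrative assumptions, not part of any real inference engine): in each iteration the scheduler takes exactly one unit of work, a prefill chunk or a decode step, from every active request, so Sequence B's decode steps land in the same iterations as Sequence A's prefill chunks.

```python
from dataclasses import dataclass
from collections import deque

@dataclass
class Request:
    """A request holds pending prefill chunks and pending decode steps."""
    name: str
    prefill_chunks: deque  # e.g., deque(["P1", "P2", "P3"])
    decode_steps: deque    # e.g., deque(["D1", "D2", "D3"])

    def next_step(self):
        # Prefill chunks must finish before this request can decode.
        if self.prefill_chunks:
            return self.prefill_chunks.popleft()
        if self.decode_steps:
            return self.decode_steps.popleft()
        return None

def run_iterations(requests):
    """Iteration-level scheduling: every iteration batches one unit of
    work from each active request, overlapping prefill and decode."""
    iteration = 0
    while any(r.prefill_chunks or r.decode_steps for r in requests):
        batch = [f"{r.name}:{step}" for r in requests
                 if (step := r.next_step()) is not None]
        iteration += 1
        print(f"iteration {iteration}: {batch}")

# Sequence A: chunked prefill (P1-P3), then its first decode step.
# Sequence B: already decoding (D1-D3).
seq_a = Request("A", deque(["P1", "P2", "P3"]), deque(["D1"]))
seq_b = Request("B", deque(), deque(["D1", "D2", "D3"]))
run_iterations([seq_a, seq_b])
# iteration 1: ['A:P1', 'B:D1']  <- B decodes while A prefills
# iteration 2: ['A:P2', 'B:D2']
# iteration 3: ['A:P3', 'B:D3']
# iteration 4: ['A:D1']
```

With standard scheduling, B's three decode steps would all wait behind A's full prefill; here they complete by iteration 3, which is exactly the reduction in decoder idle time described above.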
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Improved Throughput and Reduced Latency with Chunked Prefilling
Comparison of Processing in Chunked vs. Standard Prefilling
Balancing Throughput and Latency via Chunk Size in Chunked Prefilling
Increased Scheduling Complexity in Chunked Prefilling
Example of Chunked Prefilling in Iteration-Level Scheduling
An LLM inference server handles a mix of long document summarization requests and short, interactive chat queries. Operators observe that chat queries experience high latency whenever a long document's initial processing pass is running. To mitigate this, they implement a system that breaks the initial input of long documents into smaller segments, processing each segment in a separate forward pass to incrementally build the necessary cache. Which statement best evaluates the primary trade-off of this change?
Optimizing Inference Scheduling
An LLM inference system is using a method to process a long input sequence that has been divided into several segments or 'chunks'. Arrange the following steps in the correct chronological order to describe how the system incrementally builds the Key-Value (KV) cache for the entire input before starting to generate a response.
Example of Chunked Prefilling in Iteration-Level Scheduling
An inference server for a large language model is handling two user requests at the same time. Request A requires a long, multi-step initial processing phase before it can generate its first word. Request B is already in its generation phase, producing one word at a time. The server employs a scheduling system that, in each computational cycle, assigns exactly one unit of work—either a single step of initial processing or the generation of a single word—to each active request. What is the most significant outcome of using this scheduling approach in this scenario?
An LLM inference server uses an iteration-level scheduler to process two requests concurrently. Request A requires an initial computation (prefill) that is broken into two chunks. Request B is in the process of generating its first two tokens (decoding). To ensure both requests make progress without one blocking the other, the scheduler interleaves these tasks. Arrange the four computational tasks below into the most logical and efficient sequence of operations over four iterations.
Evaluating an LLM Inference Scheduling Strategy
Learn After
A large language model inference system is processing two user requests concurrently. Request 1 has a very long prompt that requires significant initial (prefill) computation. Request 2 is already in the process of generating a response, producing one token at a time. The system's scheduler operates by breaking the initial computation for Request 1 into three smaller chunks. It processes the first chunk of Request 1, then generates one token for Request 2, then processes the second chunk of Request 1, then generates another token for Request 2, and so on. What is the primary advantage of this interleaved processing strategy?
An LLM inference system is handling two sequences simultaneously using iteration-level scheduling with chunked prefilling. Sequence A has a long prompt that is broken into three prefill chunks (P₁, P₂, P₃). Sequence B is already in the middle of generating its response, requiring individual decode steps (D₁, D₂, D₃). Arrange the following computational steps into the most efficient order that demonstrates this scheduling strategy, ensuring that neither sequence is unnecessarily blocked.
Scheduling Strategy Evaluation for Hardware Upgrade