An LLM inference system is handling two sequences simultaneously using iteration-level scheduling with chunked prefilling. Sequence A has a long prompt that is broken into three prefill chunks (P₁, P₂, P₃). Sequence B is already in the middle of generating its response, requiring individual decode steps (D₁, D₂, D₃). Arrange the following computational steps into the most efficient order that demonstrates this scheduling strategy, ensuring that neither sequence is unnecessarily blocked.
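A minimal sketch of the intended interleaving (the queue names and chunk/step labels are illustrative, not taken from any particular serving framework): each scheduler iteration admits the next prefill chunk of Sequence A alongside the next decode step of Sequence B, so B's generation is never stalled behind A's long prompt.

```python
from collections import deque

# Hypothetical sketch of iteration-level scheduling with chunked prefill.
# Sequence A's long prompt is split into three prefill chunks; Sequence B
# is mid-generation and needs one decode step per iteration.
prefill_chunks = deque(["P1", "P2", "P3"])  # Sequence A
decode_steps = deque(["D1", "D2", "D3"])    # Sequence B

schedule = []
while prefill_chunks or decode_steps:
    # One scheduler iteration: batch the next prefill chunk (if any)
    # together with the next decode step (if any), so neither sequence
    # is unnecessarily blocked.
    if prefill_chunks:
        schedule.append(prefill_chunks.popleft())
    if decode_steps:
        schedule.append(decode_steps.popleft())

print(schedule)  # ['P1', 'D1', 'P2', 'D2', 'P3', 'D3']
```

The resulting order, P₁ → D₁ → P₂ → D₂ → P₃ → D₃, is the schedule the question is after: Sequence B emits a token between every pair of prefill chunks.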
Related
A large language model inference system is processing two user requests concurrently. Request 1 has a very long initial prompt whose prefill requires significant computation. Request 2 is already mid-generation, producing one token at a time. The system's scheduler breaks Request 1's prefill into three smaller chunks: it processes the first chunk of Request 1, then generates one token for Request 2, then processes the second chunk, then generates another token, and so on. What is the primary advantage of this interleaved processing strategy?
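For intuition about that advantage, here is a back-of-the-envelope comparison under assumed (not measured) timings, showing how chunking bounds the delay Request 2 sees before its next token:

```python
# Illustrative timings (assumptions, not measurements): each prefill
# chunk takes 30 ms and each decode step takes 5 ms.
CHUNK_MS, DECODE_MS = 30.0, 5.0

# Blocking schedule: all three prefill chunks for Request 1 run before
# any decode step for Request 2.
blocking_wait = 3 * CHUNK_MS     # 90 ms until Request 2's next token

# Interleaved (chunked prefill): Request 2 decodes after every chunk,
# so its worst-case wait between tokens is one chunk, not the whole prompt.
interleaved_wait = CHUNK_MS      # 30 ms until Request 2's next token

print(f"next-token delay: {blocking_wait:.0f} ms (blocking) "
      f"vs {interleaved_wait:.0f} ms (interleaved)")
```

The key benefit is bounded inter-token latency for the decoding request: it waits at most one chunk's worth of compute rather than the entire prompt's prefill.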
Scheduling Strategy Evaluation for Hardware Upgrade