Sequence Ordering

An LLM inference system is serving two sequences simultaneously using iteration-level scheduling with chunked prefill. Sequence A has a long prompt that has been split into three prefill chunks (P₁, P₂, P₃). Sequence B is already mid-generation, requiring one decode step per iteration (D₁, D₂, D₃). Arrange these six computational steps into the most efficient order under this scheduling strategy, ensuring that neither sequence is unnecessarily blocked.
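To make the interleaving concrete, here is a minimal Python sketch of iteration-level scheduling with chunked prefill, assuming each iteration admits at most one prefill chunk per sequence alongside the other sequences' decode steps. The `Sequence` class and `schedule` function are illustrative names for this sketch, not the API of any real serving framework.

```python
from dataclasses import dataclass, field

@dataclass
class Sequence:
    name: str
    prefill_chunks: list = field(default_factory=list)  # pending prompt chunks
    decode_steps: list = field(default_factory=list)    # pending decode tokens

def schedule(sequences):
    """Yield one batch per iteration. Each iteration admits at most one
    prefill chunk per sequence alongside other sequences' decode steps,
    so decoding is never blocked behind a long prompt."""
    iteration = 0
    while any(s.prefill_chunks or s.decode_steps for s in sequences):
        batch = []
        for s in sequences:
            if s.prefill_chunks:
                # Chunked prefill: process one chunk now, defer the rest.
                batch.append(s.prefill_chunks.pop(0))
            elif s.decode_steps:
                batch.append(s.decode_steps.pop(0))
        iteration += 1
        yield iteration, batch

seq_a = Sequence("A", prefill_chunks=["P1", "P2", "P3"])
seq_b = Sequence("B", decode_steps=["D1", "D2", "D3"])

for i, batch in schedule([seq_a, seq_b]):
    print(f"iteration {i}: {batch}")
# iteration 1: ['P1', 'D1']
# iteration 2: ['P2', 'D2']
# iteration 3: ['P3', 'D3']
```

Under this toy schedule the batches come out as {P₁, D₁}, {P₂, D₂}, {P₃, D₃}: B emits a token every iteration while A's prompt makes steady progress, which is the property the question asks you to demonstrate.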

Updated 2025-10-04

Tags: Ch.5 Inference - Foundations of Large Language Models; Comprehension in Revised Bloom's Taxonomy