Multiple Choice

A language model serves a batch of two sequences: Sequence A with a long prompt and Sequence B with a short prompt. The system is configured to complete the prompt-processing (prefill) phase for every sequence in the batch before starting the token-generation (decode) phase for the whole batch. Which statement best identifies the primary source of computational inefficiency in this scheduling scheme?
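The inefficiency the question targets can be made concrete with a small back-of-the-envelope simulation. The sketch below assumes a static, batch-synchronous scheduler in which each batch step processes one token position per sequence, so the prefill phase lasts as long as the longest prompt and shorter sequences sit idle (padded) for the difference. All sequence lengths and the `static_batch_steps` helper are hypothetical, chosen only to illustrate the utilization gap:

```python
# Sketch of static (batch-synchronous) scheduling: every sequence must
# finish prefill before any sequence may begin decoding.

def static_batch_steps(prompt_lens, decode_lens):
    # Prefill lasts as long as the longest prompt; shorter prompts
    # occupy padded, idle slots for the remaining steps.
    prefill = max(prompt_lens)
    # Decode proceeds in lockstep across the batch.
    decode = max(decode_lens)
    total = prefill + decode
    # Useful token positions vs. batch slots actually occupied.
    useful = sum(prompt_lens) + sum(decode_lens)
    occupied = total * len(prompt_lens)
    return total, useful / occupied

# Assumed numbers: Sequence A has a 1000-token prompt, Sequence B a
# 50-token prompt, and each generates 100 tokens.
total, utilization = static_batch_steps([1000, 50], [100, 100])
print(total)                    # 1100 batch steps
print(round(utilization, 3))   # ~0.568 slot utilization
```

Under these assumed lengths, Sequence B spends 950 of the 1000 prefill steps idle, waiting on Sequence A, which is exactly the head-of-line blocking that continuous-batching schedulers are designed to avoid.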

Updated 2025-10-02

Tags: Ch.5 Inference - Foundations of Large Language Models, Foundations of Large Language Models, Foundations of Large Language Models Course, Computing Sciences, Analysis in Bloom's Taxonomy, Cognitive Psychology, Psychology, Social Science, Empirical Science, Science