Multiple Choice

A language model serves a batch of two sequences: Sequence A with a long prompt and Sequence B with a short prompt. The system is configured to complete the prompt-processing (prefill) phase for every sequence in the batch before starting the token-generation (decode) phase for the whole batch. Which statement best identifies the primary source of computational inefficiency in this scheduling scheme?
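The inefficiency the question targets can be made concrete with a small back-of-the-envelope simulation. The sketch below assumes a static, batch-synchronous scheduler in which each batch step processes one token position per sequence, so the prefill phase lasts as long as the longest prompt and shorter sequences sit idle (padded) for the difference. All sequence lengths and the `static_batch_steps` helper are hypothetical, chosen only to illustrate the utilization gap:

```python
# Sketch of static (batch-synchronous) scheduling: every sequence must
# finish prefill before any sequence may begin decoding.

def static_batch_steps(prompt_lens, decode_lens):
    # Prefill lasts as long as the longest prompt; shorter prompts
    # occupy padded, idle slots for the remaining steps.
    prefill = max(prompt_lens)
    # Decode proceeds in lockstep across the batch.
    decode = max(decode_lens)
    total = prefill + decode
    # Useful token positions vs. batch slots actually occupied.
    useful = sum(prompt_lens) + sum(decode_lens)
    occupied = total * len(prompt_lens)
    return total, useful / occupied

# Assumed numbers: Sequence A has a 1000-token prompt, Sequence B a
# 50-token prompt, and each generates 100 tokens.
total, utilization = static_batch_steps([1000, 50], [100, 100])
print(total)                    # 1100 batch steps
print(round(utilization, 3))   # ~0.568 slot utilization
```

Under these assumed lengths, Sequence B spends 950 of the 1000 prefill steps idle, waiting on Sequence A, which is exactly the head-of-line blocking that continuous-batching schedulers are designed to avoid.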

Updated 2025-10-02

Tags: Ch.5 Inference - Foundations of Large Language Models, Foundations of Large Language Models, Foundations of Large Language Models Course, Computing Sciences, Analysis in Bloom's Taxonomy, Cognitive Psychology, Psychology, Social Science, Empirical Science, Science