Multiple Choice

In a common architecture for language model inference, the initial processing of a user's prompt (prefilling) and the subsequent token-by-token generation of the response (decoding) are treated as distinct computational stages, even though they execute on the same hardware. What is the primary analytical reason for this architectural separation?
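The distinction the question points at can be made concrete with a toy sketch. Below is an illustrative (not authoritative) NumPy model of the two stages, assuming a single-head attention layer with a KV cache; all names (`Wq`, `Wk`, `Wv`, `attention`) are hypothetical. Prefill fills the cache for every prompt token in one batched matrix multiply, while decode must extend the cache one token at a time:

```python
import numpy as np

def attention(q, K, V):
    # Scaled dot-product attention for a single query vector.
    scores = K @ q / np.sqrt(q.shape[0])
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

d = 4                                   # toy model dimension (assumed)
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

prompt = rng.standard_normal((5, d))    # 5 prompt-token embeddings

# Prefill: all prompt tokens go through one batched matmul, so the KV
# cache fills in parallel -- large matrices, typically compute-bound.
K_cache = prompt @ Wk
V_cache = prompt @ Wv

# Decode: each token depends on the previous one, so generation is
# strictly sequential -- tiny per-step matmuls, yet the whole cache is
# re-read each step, which tends to make this stage memory-bound.
x = prompt[-1]
for _ in range(3):
    q = x @ Wq
    x = attention(q, K_cache, V_cache)  # stand-in for the next token
    K_cache = np.vstack([K_cache, x @ Wk])
    V_cache = np.vstack([V_cache, x @ Wv])
```

The asymmetry visible here (one wide parallel pass versus a sequential loop over a growing cache) is what motivates analyzing the two stages separately, since their bottlenecks differ even on identical hardware.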


Updated 2025-09-28


Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science