In a common architecture for language model inference, the initial processing of a user's prompt (prefilling) and the subsequent token-by-token generation of the response (decoding) are treated as distinct computational stages, even though they execute on the same hardware. What is the primary analytical reason for this architectural separation?
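To make the asymmetry behind the question concrete, here is a minimal sketch (a toy stand-in, not any real inference engine): prefill runs one parallel forward pass over every prompt position at once, while decoding runs many tiny sequential passes of a single position each, re-reading the growing KV cache every step. The function names and the list-based "KV cache" are illustrative assumptions.

```python
# Toy sketch of the two-stage inference pattern the question refers to.
# No real model: the point is only the shape of the computation.

def prefill(prompt_tokens):
    """Process the whole prompt in one forward pass.
    All positions are computed in parallel, so arithmetic intensity is
    high (compute-bound on real hardware). Returns the KV cache."""
    kv_cache = list(prompt_tokens)  # stand-in for cached keys/values
    return kv_cache

def decode_step(kv_cache, token):
    """Generate one token: a forward pass over a single new position
    that must read the entire KV cache (memory-bandwidth-bound on
    real hardware)."""
    kv_cache.append(token)
    return token + 1  # toy "next token" rule for illustration

def generate(prompt_tokens, steps):
    kv_cache = prefill(prompt_tokens)   # one big parallel pass
    token = prompt_tokens[-1]
    out = []
    for _ in range(steps):              # many small sequential passes
        token = decode_step(kv_cache, token)
        out.append(token)
    return out

print(generate([1, 2, 3], 4))  # → [4, 5, 6, 7]
```

The sketch shows why the stages are analyzed separately: the same hardware sees one large batched matrix computation during prefill, then a long sequence of small cache-bound steps during decoding, so their performance bottlenecks differ.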
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Continuous Batching for LLM Inference
Optimizing Inference Throughput
Trade-offs in a Staged Inference Architecture