1Cademy

Learn Before

Disaggregation of Prefilling and Decoding using Pipelined Engines

Multiple Choice

An LLM inference system is designed with two specialized hardware engines operating in a pipeline. Engine A processes the initial prompts for a batch of user requests to generate their internal state. This state is then passed to Engine B, which handles the step-by-step generation of the response tokens for that same batch. As soon as Engine A finishes with the first batch, it immediately begins processing the initial prompts for a second, new batch of requests while Engine B is still generating
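The overlap described in the scenario can be sketched as a small timeline simulation. This is only an illustration under stated assumptions: the function name `pipeline_schedule`, the `Slot` record, and the durations (2.0 time units per prefill, 5.0 per decode) are hypothetical and do not come from the question. It also assumes that Engine A can hand off a finished state and start the next batch even when Engine B is still busy.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    """Start and end times of both stages for one batch (hypothetical record)."""
    batch: int
    prefill_start: float
    prefill_end: float
    decode_start: float
    decode_end: float


def pipeline_schedule(num_batches: int, prefill_time: float, decode_time: float) -> list[Slot]:
    """Simulate a two-stage pipeline: Engine A prefills, Engine B decodes.

    Assumption: Engine A never waits for Engine B; finished states are
    buffered until Engine B is free.
    """
    schedule = []
    engine_a_free = 0.0  # time at which Engine A can accept the next batch
    engine_b_free = 0.0  # time at which Engine B can accept the next batch
    for batch in range(num_batches):
        # Engine A starts the next prefill as soon as it finished the previous one.
        prefill_start = engine_a_free
        prefill_end = prefill_start + prefill_time
        engine_a_free = prefill_end

        # Engine B needs both the prefilled state and to be idle itself.
        decode_start = max(prefill_end, engine_b_free)
        decode_end = decode_start + decode_time
        engine_b_free = decode_end

        schedule.append(Slot(batch, prefill_start, prefill_end, decode_start, decode_end))
    return schedule


# Hypothetical durations chosen only for illustration.
schedule = pipeline_schedule(num_batches=3, prefill_time=2.0, decode_time=5.0)
for slot in schedule:
    print(slot)
```

In this sketch the second batch's prefill begins at the moment the first batch's prefill ends, which is while Engine B is still decoding the first batch. With these assumed durations, the slower stage (decoding) sets the pace once the pipeline is full.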

Updated 2025-09-26
