1Cademy - Disaggregation of Prefilling and Decoding using Pipelined Engines

Learn Before

Continuous Batching for LLM Inference
Prefilling Phase in Transformer Inference
Decoding Phase in Transformer Inference

Activity (Process)

Disaggregation of Prefilling and Decoding using Pipelined Engines

This strategy, known as the disaggregation of prefilling and decoding, implements continuous batching with two specialized hardware engines, one per phase. A dedicated 'Engine 1' performs prefilling for a batch of requests. When prefilling finishes, the resulting Key-Value (KV) cache is transferred to a separate 'Engine 2', which performs decoding. The primary benefit of this pipeline is that Engine 1 can begin prefilling the next batch while Engine 2 is still decoding the first. Overlapping the two phases keeps both engines busy instead of leaving one idle while the other works, which improves hardware utilization and overall throughput.
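
The handoff between the two engines can be sketched as a producer-consumer pipeline. The example below is a minimal illustration, not a real inference stack: the two engines are simulated with Python threads, the KV cache is a plain list of prompt tokens, and "decoding" simply appends placeholder tokens. All names (`prefill_engine`, `decode_engine`, `run_pipeline`, `max_new_tokens`) are hypothetical and chosen for this sketch only.

```python
import queue
import threading


def prefill_engine(batches, kv_queue):
    """Hypothetical Engine 1: builds a stand-in KV cache for each batch."""
    for batch_id, prompts in enumerate(batches):
        # Stand-in for real prefilling: one cache entry per prompt token.
        kv_caches = [prompt.split() for prompt in prompts]
        # Hand the cache to Engine 2, then move on to the next batch.
        kv_queue.put((batch_id, kv_caches))
    kv_queue.put(None)  # sentinel: no more batches


def decode_engine(kv_queue, results, max_new_tokens):
    """Hypothetical Engine 2: extends each cached sequence token by token."""
    while True:
        item = kv_queue.get()
        if item is None:
            break
        batch_id, kv_caches = item
        outputs = []
        for cache in kv_caches:
            generated = []
            for _ in range(max_new_tokens):
                # Stand-in for one decoding step that reads and grows the cache.
                token = f"tok{len(cache)}"
                cache.append(token)
                generated.append(token)
            outputs.append(generated)
        results.append((batch_id, outputs))


def run_pipeline(batches, max_new_tokens=3):
    """Run both engines concurrently, connected by a bounded queue."""
    # maxsize=1 models a limited buffer for KV caches awaiting decoding.
    kv_queue = queue.Queue(maxsize=1)
    results = []
    engine1 = threading.Thread(target=prefill_engine, args=(batches, kv_queue))
    engine2 = threading.Thread(
        target=decode_engine, args=(kv_queue, results, max_new_tokens)
    )
    engine1.start()
    engine2.start()
    engine1.join()
    engine2.join()
    return results
```

Calling `run_pipeline([["a b", "c d e"], ["f"]], max_new_tokens=2)` returns one entry per batch, in order. While the decode thread is working on batch 0, the prefill thread is already free to prepare batch 1, which is the overlap described above. In a real deployment the queue would be replaced by a transfer of the KV cache between devices, and the cost of that transfer is part of the design trade-off.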

Updated 2026-05-02
