Concept

Aggregated Architecture for Prefilling and Decoding

An inference architecture in which the prefilling and decoding phases are treated as separate stages of computation but run on the same hardware. Because both phases share the same devices, a scheduler must decide how to interleave prompt processing with token generation. This makes the aggregated architecture a common foundation for advanced batching techniques that improve upon simpler strategies.
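The idea can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not a real serving engine. The names `Request`, `run_aggregated`, `DEVICE`, and `max_batch` are hypothetical, and the scheduling rule (admit a waiting request whenever a batch slot is free, otherwise decode one token for every running request) is just one simple policy chosen for illustration. What it shows is that prefill and decode are distinct steps, yet every step is issued to the same device.

```python
from collections import deque
from dataclasses import dataclass

DEVICE = "gpu0"  # hypothetical label: the single device shared by both phases


@dataclass
class Request:
    rid: str
    prompt_len: int       # number of prompt tokens processed during prefill
    max_new_tokens: int   # assumed >= 1 in this sketch
    generated: int = 0


def run_aggregated(requests, max_batch=2):
    """Simulate prefill and decode as separate stages on one device.

    Returns a log of (device, stage, request_ids) tuples, one per step.
    Simplification: prefill only admits the request; in real systems it
    also yields the first output token.
    """
    waiting = deque(requests)
    running = []
    log = []
    while waiting or running:
        # Prefill stage: admit one waiting request if a batch slot is free.
        if waiting and len(running) < max_batch:
            req = waiting.popleft()
            running.append(req)
            log.append((DEVICE, "prefill", [req.rid]))
            continue
        # Decode stage: generate one token for every running request.
        for req in running:
            req.generated += 1
        log.append((DEVICE, "decode", [r.rid for r in running]))
        # Retire requests that have reached their token budget.
        running = [r for r in running if r.generated < r.max_new_tokens]
    return log


if __name__ == "__main__":
    demo = [Request("a", 4, 2), Request("b", 3, 1), Request("c", 5, 1)]
    for step in run_aggregated(demo):
        print(step)
```

In this sketch, decoding pauses whenever a new request is prefilled, because the two stages compete for the same device. That contention is the kind of trade-off that more advanced batching strategies try to manage.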

Updated 2026-05-05

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences