Learn Before
  • Diagram of the Decoding Phase

Decoding Phase as a Memory-Bound Process

The decoding phase in Transformer models is memory-bound because each generation step must read the entire Key-Value (KV) cache from memory to compute attention over all previous tokens. Since the cache grows by one entry per generated token, the memory traffic per decoding step increases linearly with the output length, so per-token latency rises as the sequence grows.
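The linear growth of memory traffic can be sketched with a small back-of-the-envelope calculation. The model dimensions below (layer count, KV heads, head dimension, fp16 storage) are illustrative assumptions, not taken from any specific model:

```python
# Sketch: why decoding is memory-bound. At every step, attention reads
# the whole KV cache, whose size grows linearly with sequence length.
# All model dimensions below are hypothetical, for illustration only.

BYTES_PER_VALUE = 2      # fp16 storage (assumed)
NUM_LAYERS = 32          # assumed layer count
NUM_KV_HEADS = 32        # assumed number of KV heads
HEAD_DIM = 128           # assumed per-head dimension

def kv_cache_bytes(seq_len: int) -> int:
    """Bytes of K and V cached for a sequence of seq_len tokens."""
    # Factor of 2: one K vector and one V vector per token, per layer.
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE
    return seq_len * per_token

# Memory read per decoding step at different positions in the output:
for step in (1, 512, 4096):
    mib = kv_cache_bytes(step) / 2**20
    print(f"step {step:>5}: ~{mib:.1f} MiB read from the KV cache")
```

Under these assumptions the cache read grows from about 0.5 MiB at the first step to roughly 2 GiB by token 4096, while the arithmetic per step stays nearly constant, which is why memory bandwidth, not compute, dominates decoding latency.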

Tags

  • Ch.5 Inference - Foundations of Large Language Models
  • Foundations of Large Language Models
  • Foundations of Large Language Models Course
  • Computing Sciences

Related
  • Diagram of the N-th Step in Transformer Decoding

  • A large language model has processed an initial prompt and has just generated the fifth token of its output. As it prepares to generate the sixth token, which of the following statements most accurately describes the function of the self-attention mechanism in this specific step?

  • A large language model is generating a response one token at a time after processing the initial prompt. Arrange the following actions in the correct sequence to describe how a single new token is generated.

  • Q, K, and V Composition in Transformer Decoding

  • Analyzing a Flawed Decoding Step

Learn After
  • A developer is profiling a Transformer-based language model during the generation of a very long text summary. They notice that the latency to produce each new token is not constant; instead, it steadily increases as the summary grows in length. What is the primary reason for this observed slowdown?

  • Optimizing Chatbot Latency

  • Computational Bottleneck in Token Generation