Concept

Decoding Phase as a Memory-Bound Process

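The decoding phase in Transformer models is considered a memory-bound operation. Its speed is limited by memory bandwidth rather than by arithmetic throughput, because each decoding step must read the Key-Value (KV) cache while performing relatively little computation on the data it loads. This bottleneck worsens as the output sequence grows. Each new token attends to the keys and values of every preceding token, so the KV cache, and the amount of data read per step, grows linearly with sequence length. Each additional token is therefore more expensive to generate than the last.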
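To make this growth concrete, the following Python sketch estimates how much KV cache data a single decoding step reads and how much attention arithmetic it performs on that data. It assumes a hypothetical model shape (`ModelConfig` with 32 layers, 32 KV heads of dimension 128, and 2-byte fp16 elements) and standard multi-head attention. It counts only the attention read of the cache and ignores model weights and all other layers. Every name below is illustrative and does not come from any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    """Hypothetical decoder-only model shape (illustrative, not a specific model)."""
    num_layers: int
    num_kv_heads: int
    head_dim: int
    bytes_per_elem: int = 2  # 2 bytes per element, e.g. fp16 or bf16


def kv_cache_bytes(cfg: ModelConfig, seq_len: int) -> int:
    """Bytes needed to store keys and values for seq_len tokens across all layers."""
    # The leading 2 accounts for one key tensor and one value tensor per layer.
    return 2 * cfg.num_layers * cfg.num_kv_heads * cfg.head_dim * seq_len * cfg.bytes_per_elem


def attention_flops(cfg: ModelConfig, seq_len: int) -> int:
    """Approximate FLOPs to attend one new query token over seq_len cached positions.

    Assumes query heads == KV heads. Per head, q @ K^T and weights @ V each cost
    about 2 * head_dim * seq_len FLOPs (one multiply and one add per element).
    """
    return 4 * cfg.num_layers * cfg.num_kv_heads * cfg.head_dim * seq_len


def arithmetic_intensity(cfg: ModelConfig, seq_len: int) -> float:
    """FLOPs performed per byte of KV cache read during one decoding step."""
    return attention_flops(cfg, seq_len) / kv_cache_bytes(cfg, seq_len)


def decode_traffic(cfg: ModelConfig, prompt_len: int, new_tokens: int) -> tuple[list[int], int]:
    """KV cache bytes read at each decoding step, plus the total over the generation."""
    per_step = []
    for t in range(new_tokens):
        # Before step t the cache holds prompt_len + t entries; the new token's
        # key/value is appended, and attention then reads prompt_len + t + 1 entries.
        per_step.append(kv_cache_bytes(cfg, prompt_len + t + 1))
    return per_step, sum(per_step)


if __name__ == "__main__":
    cfg = ModelConfig(num_layers=32, num_kv_heads=32, head_dim=128)
    print("KV bytes per token:", kv_cache_bytes(cfg, 1))
    print("KV cache at 4096 tokens (GiB):", kv_cache_bytes(cfg, 4096) / 2**30)
    print("FLOPs per byte:", arithmetic_intensity(cfg, 4096))
    per_step, total = decode_traffic(cfg, prompt_len=1000, new_tokens=3)
    print("Bytes read per step:", per_step)
```

Under these assumptions each cached token costs 2 × 32 × 32 × 128 × 2 bytes = 0.5 MiB, so a 4,096-token context occupies 2 GiB, and the bytes read per step grow with every generated token. The arithmetic intensity works out to 2 / bytes_per_elem FLOPs per byte (1 FLOP per byte in fp16) regardless of sequence length. That is far below what modern accelerators need to keep their arithmetic units busy, so the step time is governed by how fast the cache can be read.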


Updated 2026-05-03


Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences