Learn Before
Global Nature of Standard Transformer LLMs
Large language models built on the standard Transformer architecture are global models: during inference they must retain the complete left-context, i.e., the entire history of previously generated tokens, in order to predict the next token. This storage is handled by a Key-Value (KV) cache, which keeps the key and value representations of every past token, so the caching cost grows linearly with the length of the generated sequence.
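A minimal sketch of this behavior, assuming a toy hidden size and random vectors in place of learned token representations and projection matrices (the names K_cache, V_cache, and the attention helper are illustrative, not from any specific library): each generation step appends one key/value row to the cache, and the new token attends over the entire cached left-context, so memory use grows with every token produced.

```python
import numpy as np

def attention(q, K, V):
    # Scaled dot-product attention of a single query over all cached keys/values.
    scores = q @ K.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

d_model = 8          # toy hidden size, for illustration only
rng = np.random.default_rng(0)

# The KV cache: one growing array of key rows and value rows, one row per past token.
K_cache = np.empty((0, d_model))
V_cache = np.empty((0, d_model))

for step in range(5):                      # generate 5 tokens autoregressively
    x = rng.normal(size=d_model)           # stand-in for the current token's hidden state
    q, k, v = x, x, x                      # a real model applies learned W_q, W_k, W_v projections

    # Append this token's key/value: the cache now holds the entire left-context.
    K_cache = np.vstack([K_cache, k])
    V_cache = np.vstack([V_cache, v])

    out = attention(q, K_cache, V_cache)   # the new token attends to all past tokens
    print(f"step {step}: cache holds {len(K_cache)} tokens, "
          f"{K_cache.nbytes + V_cache.nbytes} bytes")
```

Running the loop prints a cache size that increases by one token (and a fixed number of bytes) per step, which is exactly the progressively growing caching cost described above.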
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Training Decoder-Only Language Models with Cross-Entropy Loss
Output Probability Calculation in Transformer Language Models
Global Nature of Standard Transformer LLMs
Processing Flow of Autoregressive Generation in a Decoder-Only Transformer
Initial Input Representation for Transformer Layers
Greedy Decoding in Language Models
Structure of a Transformer Block
A generative language model is designed to produce text by predicting the next token based solely on the sequence of tokens that came before it. If you were to adapt a standard Transformer decoder block for this specific auto-regressive task, which of its sub-layers would you remove, and why is this modification functionally necessary?
A language model is constructed using a stack of modified Transformer decoder blocks. Each block contains a self-attention sub-layer and a feed-forward network sub-layer, but lacks the sub-layer that would process information from a separate, secondary input sequence. This model is capable of performing a machine translation task, such as translating a German sentence into English, without any further architectural changes.
Function of Self-Attention in Auto-regressive Generation
Neural Network-Based Next-Token Probability Distribution
Learn After
Key-Value (KV) Cache in Transformer Inference
A language model using a standard Transformer architecture is generating a long sequence of text one token at a time. How does the computational effort required to generate the 500th token compare to the effort required for the 10th token?
Diagnosing Memory Issues in a Language Model
Difficulty of Training Transformers on Long Sequences
Evaluating Context Handling in Language Models
Explicit Context Encoding via Additional Memory Models