Global Tokens in Attention

Global tokens are a widely used technique in attention mechanisms for combining local and long-range context. The approach designates a few tokens at the beginning of a sequence as 'global': they attend to every position, and every position can attend to them. Often paired with sparse attention patterns, this serves as a form of global memory. It helps stabilize the model by smoothing the softmax distribution of attention weights, but it also introduces a trade-off: because the global memory has a fixed size, it can lose information, creating a tension between representational capacity and computational cost.
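The mechanics are easy to see in a small sketch. The NumPy example below is illustrative, not from the source: the function names (`global_local_mask`, `masked_attention`), the choice of a local sliding window as the sparse pattern, and parameters such as `num_global` and `window` are all assumptions made for the demo.

```python
import numpy as np

def global_local_mask(seq_len, num_global, window):
    """Boolean mask: entry (i, j) is True where query i may attend to key j.
    The first num_global tokens attend everywhere and are attended to by
    every token; all other tokens see only a local window of neighbors."""
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for i in range(seq_len):  # local sliding window around each position
        mask[i, max(0, i - window):min(seq_len, i + window + 1)] = True
    mask[:num_global, :] = True  # global tokens read every position
    mask[:, :num_global] = True  # every position reads the global tokens
    return mask

def masked_attention(Q, K, V, mask):
    """Scaled dot-product attention; disallowed links are set to -inf
    before the softmax, so they receive zero weight."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
n, d = 16, 8
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
out = masked_attention(Q, K, V, global_local_mask(n, num_global=2, window=2))
print(out.shape)  # (16, 8)
```

Because every row of the mask includes the global columns, each softmax row always has a few live entries; that guaranteed attention mass is what smooths the distribution, while the fixed `num_global` caps how much context the global memory can carry.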
