Global Tokens for Attention

A widely used technique for combining local and long-range context is to designate the first few tokens of a sequence as 'global tokens'. These tokens attend to, and are attended to by, every other token during the attention calculation, effectively serving as a form of global memory. This method is frequently paired with sparse attention models, where most tokens otherwise attend only to a small local window.
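Below is a minimal NumPy sketch of this masking pattern, combining a sliding local window with a few global tokens. The function names (global_local_mask, masked_attention) and the parameters num_global and window are illustrative assumptions, not taken from any particular library.

```python
import numpy as np

def global_local_mask(seq_len: int, num_global: int, window: int) -> np.ndarray:
    """Boolean mask: entry (i, j) is True if query i may attend to key j.

    - every token attends to its local neighborhood (|i - j| <= window),
    - the first `num_global` tokens attend to all positions,
    - all positions attend to the global tokens.
    """
    i = np.arange(seq_len)[:, None]   # query positions, shape (n, 1)
    j = np.arange(seq_len)[None, :]   # key positions, shape (1, n)

    local = np.abs(i - j) <= window   # sliding-window attention
    global_rows = i < num_global      # global tokens see every key
    global_cols = j < num_global      # every token sees the global keys

    return local | global_rows | global_cols

def masked_attention(q, k, v, mask):
    """Standard scaled dot-product attention with disallowed pairs masked out."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)          # block masked positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ v

# Example: 10 tokens, 2 global tokens, local window of 1.
rng = np.random.default_rng(0)
n, d = 10, 8
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
mask = global_local_mask(seq_len=n, num_global=2, window=1)
out = masked_attention(q, k, v, mask)
print(mask.astype(int))  # first two rows/columns are dense; the rest is banded
```

Printing the mask makes the structure visible: the first two rows and columns are fully populated (the global memory), while the remaining entries form a narrow band around the diagonal, which is what keeps the overall attention cost sparse.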
